I'm developing a site that is heavily dynamic and uses a MySQL database constantly. My question is - should I worry about the load on the database?
For example, a part of the site has a live chat which uses AJAX to contact the database every second for each user. Depending on how many users are connected, that's a lot of queries!
Is this something a MySQL database can handle, or am I pushing it? Thanks.
You are pushing it. Depending on your server and the number of online users, MySQL will only be able to keep up to a point.
MySQL and other database management systems are data storage systems, and here you are not actually storing the data! You are just relaying data between clients through MySQL, and that is not efficient.
To speed things up, you can use MySQL MEMORY tables for instant messages and keep offline messages in another MyISAM or InnoDB table (which will actually store the data).
But the best way to build a chat infrastructure is to have a backend application that keeps all the messages in memory and, after some limit, flushes the messages that were not received to MySQL as offline messages. This is very much like using MySQL MEMORY tables, but you have more control over the data. The problem is that you need to implement efficient data structures with good memory management, which is a hard task if you are not building a commercial product, and unnecessary if you are not planning to sell the chat system, so I recommend using MySQL MEMORY tables as described.
Update
MySQL MEMORY tables are volatile (they are emptied on a service/server restart), so don't use them for long-term storage; use them only to hold instant messages for a short time.
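To make that concrete, here is a minimal sketch of the MEMORY-table approach written as a Python maintenance script; the table layout, column names, and connection details are assumptions, not part of your setup:

# Minimal sketch (assumed schema): a MEMORY table for live messages plus an
# InnoDB archive table for offline/older messages.
import mysql.connector  # pip install mysql-connector-python

cnx = mysql.connector.connect(host="localhost", user="chat", password="secret", database="chat")
cur = cnx.cursor()

# Volatile table: rows vanish on a server restart, so keep only live messages here.
cur.execute("""
    CREATE TABLE IF NOT EXISTS live_messages (
        id INT AUTO_INCREMENT PRIMARY KEY,
        from_user INT NOT NULL,
        to_user INT NOT NULL,
        body VARCHAR(1024) NOT NULL,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    ) ENGINE=MEMORY
""")

# Durable table for messages the recipient has not fetched yet (offline messages).
cur.execute("CREATE TABLE IF NOT EXISTS offline_messages LIKE live_messages")
cur.execute("ALTER TABLE offline_messages ENGINE=InnoDB")

# Run this part periodically: move messages older than some limit into durable storage.
cur.execute("SELECT NOW() - INTERVAL 5 MINUTE")
cutoff = cur.fetchone()[0]
cur.execute("""
    INSERT INTO offline_messages (from_user, to_user, body, created_at)
    SELECT from_user, to_user, body, created_at FROM live_messages WHERE created_at < %s
""", (cutoff,))
cur.execute("DELETE FROM live_messages WHERE created_at < %s", (cutoff,))
cnx.commit()
cur.close()
cnx.close()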
I've been developing a high-traffic ad-serving platform for some years now, using a master-master MariaDB cluster with HAProxy in front for balancing relational data queries (read queries go to all of the servers, but writes go to only one, to prevent the servers from going out of sync). By relational data I mean things like campaign settings, user details, and payments. I'm also using Redis for caching some of the less dynamic MySQL data, but I believe there are a lot of opportunities to make better use of it, since as soon as traffic increases, I frequently hit bottlenecks like:
too many connections to MySQL
deadlocks (possibly because writes start going to multiple servers when the main one gets overloaded).
My goal is to move as many of the writes as possible away from MySQL and into Redis, but I'm having a hard time filtering MySQL data based on the counts/budgets stored in Redis, especially in places where a traditional JOIN would be used.
A simplified example of such a MySQL query, which would get the campaign with the highest bid within the user's budget:
SELECT campaigns.id, campaigns.url FROM campaigns
JOIN users ON campaigns.user_id = users.id
ORDER BY LEAST(users.credits, campaigns.bid) DESC
LIMIT 1;
After a click is delivered to that campaign, a budget reduction is immediately needed. Of course, reducing the credits in MySQL is trivial, but as soon as a user starts sending multiple clicks per second, the problems start appearing (mainly deadlocks in a cluster or reaching the maximum number of connections).
Applying the credit reduction in Redis would be preferred, but I have trouble connecting the dots between a bunch of credit records in Redis and filtering and sorting MySQL records based on them.
What would be a good approach to this problem that lets me touch MySQL as little as possible? Or maybe there is a completely different approach I need to take for this to happen.
Any advice or links will be much appreciated.
I would not recommend moving all write requests to Redis, especially for data that needs strong consistency (like payments).
Redis is an in-memory database and does not have the ACID transaction guarantees that MySQL has. Your data still has some chance of being lost after a write to Redis, even with AOF enabled, which can leave your data inconsistent.
For your case, I think you can integrate a message queue (Kafka, RabbitMQ) to avoid the connection issues and deadlocks:
When a transaction occurs, serialize the request with the data to write and send it to the message queue.
A consumer listens on the MQ at a fixed consume rate (based on your needs) and writes the data into MySQL sequentially (and updates Redis if you need the cache).
On the client side, you can have a thread that polls for the result in a loop until the write has finished. This makes the asynchronous write behave like a synchronous one.
This way you avoid resource contention (like deadlocks) and also smooth out the write rate thanks to the fixed consume rate.
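A rough sketch of that write path, using a Redis list as the queue for brevity (Kafka or RabbitMQ would play the same role); the event fields, key names, and schema are assumptions:

# Sketch only: click/transaction events are queued and a single consumer applies
# them to MySQL sequentially at a fixed rate. All names and fields are assumed.
import json
import time
import redis                 # pip install redis
import mysql.connector       # pip install mysql-connector-python

r = redis.Redis()

def enqueue_click(campaign_id, user_id, cost):
    # Producer side: called from the request handler instead of writing to MySQL directly.
    event = {"campaign_id": campaign_id, "user_id": user_id, "cost": cost}
    r.lpush("click_events", json.dumps(event))

def consume_clicks(rate_per_sec=200):
    # Single consumer: drains the queue and writes sequentially, so MySQL never
    # sees competing writes on the same rows, and only one connection is used.
    cnx = mysql.connector.connect(host="localhost", user="ads", password="secret", database="ads")
    cur = cnx.cursor()
    while True:
        item = r.brpop("click_events", timeout=1)
        if item is None:
            continue
        event = json.loads(item[1])
        cur.execute("UPDATE users SET credits = credits - %s WHERE id = %s",
                    (event["cost"], event["user_id"]))
        cnx.commit()
        # Keep the Redis copy of the budget in step so reads can skip MySQL.
        r.decrby(f"user:{event['user_id']}:credits", event["cost"])
        time.sleep(1.0 / rate_per_sec)

The client-side polling mentioned above would then check Redis (for example a per-request status key that the consumer sets) rather than holding a MySQL connection open.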
For a project we are working with several external partners. For the project we need access to their MySQL database. The problem is, they can't give us that. Their database is hosted in a managed environment where they don't have many configuration options, and they don't want to give us access to all of their data. So the solution they came up with is the FEDERATED storage engine.
We now have one table for each table of their database. The problem is that the amount of data we get is huge and will only increase in the future. That means a lot of inserts are performed on our database. The optimal solution for us would be to intercept all incoming MySQL traffic, process it, and then store it in bulk. We also thought about using something like Redis to store the data.
Additionally, we plan to get more data from different partners, who will potentially provide the data in different ways. Using Redis would allow us to have all our data in one place.
Copying the data to Redis after it's stored in the MySQL database is not an option: we just can't handle that many inserts, and we need the data as fast as possible.
TL;DR
Is there a way to pretend to be a MySQL server so we can directly process the data received via the FEDERATED storage engine?
We also thought about using the BLACKHOLE engine in combination with binary logging on our side, so incoming data would only be written to the binary log and wouldn't be stored in the database. But then performance would still be limited by disk I/O.
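For reference, what we have in mind for the BLACKHOLE variant is roughly the following (table name, columns, and credentials are placeholders):

# Rough sketch of the BLACKHOLE idea: inserts store no row data locally, but with
# binary logging enabled the incoming INSERTs still end up in the binary log,
# which a separate process could then consume. Names are placeholders.
import mysql.connector  # pip install mysql-connector-python

cnx = mysql.connector.connect(host="localhost", user="ingest", password="secret", database="ingest")
cur = cnx.cursor()

# Same column layout as the partner's FEDERATED table, but ENGINE=BLACKHOLE:
# nothing is stored in the table itself.
cur.execute("""
    CREATE TABLE IF NOT EXISTS partner_events (
        id BIGINT NOT NULL,
        payload VARCHAR(2048),
        created_at DATETIME
    ) ENGINE=BLACKHOLE
""")
cnx.commit()
# The data is then only available from the binary log, which is exactly the
# disk I/O limitation mentioned above.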
I want to set up a TeamSpeak 3 server. I can choose between SQLite and MySQL as the database. Well, I usually tend to "do not use SQLite in production". But on the other hand, it's a TeamSpeak server. Well okay, just let me google this... I found this:
Speed
SQLite3 is much faster than MySQL database. It's because file database is always faster than unix socket. When I requested edit of channel it took about 0.5-1 sec on MySQL database (127.0.0.1) and almost instantly (0.1 sec) on SQLite 3. [...]
http://forum.teamspeak.com/showthread.php/77126-SQLite-vs-MySQL-Answer-is-here
I don't want to start a SQLite vs MySQL debate. I just want to ask: Is his argument even valid? I can't imagine it's true what he says. But unfortunately I'm not expert enough to answer this question myself.
Maybe the TeamSpeak devs have some major differences in their db architecture between SQLite and MySQL which explain a huge difference in speed (I can't imagine this).
At First, Access Time Will Appear Faster in SQLite
The access time for SQLite will appear faster at first, but this is with a small number of users online. SQLite uses a very simplistic access algorithm; it's fast, but it does not handle concurrency well.
As the database starts to grow and the amount of simultaneous access increases, it will start to suffer. The way database servers handle multiple requests is completely different: far more complex and optimized for high concurrency. For example, SQLite will lock the whole database if an update is going on, and queue the requests.
RDBMSs Do a Lot of Extra Work That Makes Them More Scalable
MySQL, for example, even with a single user, will create an access queue, lock tables partially instead of allowing only one execution at a time, and perform other fairly complex tasks in order to make sure the database remains accessible to any other simultaneous access.
This makes a single user connection slower, but it pays off in the future, when hundreds of users are online, and in that case the simple
"LOCK THE WHOLE DATABASE AND EXECUTE A SINGLE QUERY EACH TIME"
procedure of SQLite will hog the server.
SQLite is made for simplicity and self-contained database applications.
If you expect to have 10 simultaneous connections writing to the database at a time, SQLite may perform well, but you won't want a 100-user application that constantly writes and reads data to the database using SQLite. It wasn't designed for such a scenario, and it will thrash resources.
Considering your TeamSpeak scenario, you are likely to be OK with SQLite. It is even OK for some businesses; some websites need databases that are read-only except when adding new content.
For this kind of use, SQLite is a cheap, easy-to-implement, self-contained solution that will get the job done.
The relevant difference is that SQLite uses a much simpler locking algorithm (a simple global database lock).
Using fine-grained locking (as MySQL and most other DB servers do) is much more complex, and slower if there is only a single database user, but required if you want to allow more concurrency.
I have not personally tested SQLite vs MySQL, but it is easy to find examples on the web that say the opposite (for instance). You do ask a question that is not quite so religious: is that argument valid?
First, the essence of the argument is somewhat specious. A Unix socket would be used to communicate with a database server. A "file database" seems to refer to the fact that communication goes through a compiled-in interface; in SQLite terminology, it is server-less. Most databases store data in files, so the term "file database" is a little misleading.
Performance of a database involves multiple factors, such as:
Communication of query to the database.
Speed of compilation (ability to store pre-compiled queries is a plus here).
Speed of processing.
Ability to handle complex processing.
Compiler optimizations and execution engine algorithms.
Communication of results back to the application.
Having the interface be compiled-in affects the first and last of these. There is nothing that prevents a server-less database from excelling at the rest. However, database servers are typically millions of lines of code -- much larger than SQLite. A lot of this supports extra functionality. Some of it supports improved optimizations and better algorithms.
As with most performance questions, the answer is to test the systems yourself on your data in your environment. Being server-less is not an automatic performance gain. Having a server doesn't make a database "better". They are different applications designed for different optimization points.
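If you want a starting point for such a test, something along these lines is enough to get rough numbers for your own workload (table names and MySQL credentials are placeholders; replace the inserts with your actual queries):

# Quick-and-dirty comparison sketch, not a rigorous benchmark.
import sqlite3
import time
import mysql.connector  # pip install mysql-connector-python

N = 10000

def bench_sqlite():
    conn = sqlite3.connect("bench.db")
    conn.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, val TEXT)")
    start = time.perf_counter()
    with conn:  # one transaction; committing per row would be far slower
        conn.executemany("INSERT INTO t (val) VALUES (?)", [("x",)] * N)
    return time.perf_counter() - start

def bench_mysql():
    cnx = mysql.connector.connect(host="localhost", user="bench", password="secret", database="bench")
    cur = cnx.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS t (id INT AUTO_INCREMENT PRIMARY KEY, val TEXT)")
    start = time.perf_counter()
    cur.executemany("INSERT INTO t (val) VALUES (%s)", [("x",)] * N)
    cnx.commit()
    return time.perf_counter() - start

print("sqlite:", bench_sqlite(), "s")
print("mysql :", bench_mysql(), "s")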
In short:
For local application databases, single-user applications, and small, simple projects keeping small amounts of data, SQLite is the winner.
For networked database applications, multi-user access and concurrency, load balancing, growing data management, security and role-based authentication, big projects, and widely used services, you should choose MySQL.
As for your question: I do not know much about TeamSpeak servers and what kind of data they actually need to keep in their database, but if it just needs a local DBMS and does not have to handle lots of concurrency and management, SQLite would be my choice.
I am new to server-side programming and am trying to understand relational databases a little better. Whenever I read about MySQL vs SQLite, people always talk about SQLite not being able to have multiple users. However, when I program with the Django framework I am able to create multiple users in the SQLite db. Can someone explain what people mean by multi-user? Thanks!
When people talk about multiple users in this context, they are talking about simultaneous connections to the database. The users in this case are threads in the web server that are accessing the database.
Different databases have different solutions for handling multiple connections working with the database at once. Generally, reading is not a problem, as multiple reading operations can overlap without disturbing each other, but only one connection can write data to a specific unit at a time.
The concurrency difference between databases is basically how large a unit they lock when someone is writing. MySQL has an advanced system where records, blocks, or tables can be locked depending on the need, while SQLite has a simpler system where it only locks the entire database.
The impact of this difference is seen when you have multiple threads in the webserver, where some threads want to read data and others want to write data. MySQL can read from one table and write into another at the same time without problem. SQLite has to suspend all incoming read requests whenever someone wants to write something, wait for all current reads to finish, do the write, and then open up for reading operations again.
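You can see this whole-database write lock with a few lines against Python's built-in sqlite3 module (the file name is arbitrary):

# Demonstration of SQLite's database-wide write lock using two connections.
import sqlite3

writer = sqlite3.connect("demo.db", isolation_level=None)  # manage transactions manually
writer.execute("CREATE TABLE IF NOT EXISTS msgs (id INTEGER PRIMARY KEY, body TEXT)")
writer.execute("BEGIN IMMEDIATE")                           # take the write lock
writer.execute("INSERT INTO msgs (body) VALUES ('hello')")

other = sqlite3.connect("demo.db", timeout=1, isolation_level=None)  # give up after 1 s
try:
    # While the first connection holds the write lock, no other connection can
    # write to *any* table in this database file.
    other.execute("INSERT INTO msgs (body) VALUES ('world')")
except sqlite3.OperationalError as e:
    print(e)                                                # "database is locked"

writer.execute("COMMIT")                                    # release the lock
other.execute("INSERT INTO msgs (body) VALUES ('world')")   # now it succeeds

With MySQL/InnoDB, a second writer would only be blocked if it touched the same rows, which is the record/block/table granularity described above.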
As you can read here, SQLite supports multiple users, but it locks the whole database.
SQLite is usually used for development, but MySQL is a better choice for production because it has better support for concurrent access and writes, which SQLite doesn't.
Hope this helps.
SQLite concurrency is explained in detail here.
In a nutshell, SQLite doesn't have the fine-grained concurrency mechanisms that MySQL does. When someone tries to write to a MySQL database, the MySQL database will only lock what it needs to lock, usually a single record, sometimes a table.
When a user writes to a SQLite database, the entire database file is momentarily locked. As you might imagine, this limits SQLite's ability to handle many concurrent users.
Multi-user means that many tasks (possibly on many separate computers) can have open connections to the database at the same time.
A multi-user database provides things like locks to allow these tasks to update the database safely.
Look at ScimoreDB. It's an embedded database that supports multi-process (or user) read and write access. It also can work as a client-server database.
Suppose there is a messaging system. This system has millions of entries to be sent and reported, and the count is growing by 100K every hour. Two services access the db: one is the sender, one is the reporter. So what would you suggest in order to get maximum performance? How should the db be designed?
Also, which open-source database would you suggest among MySQL, PostgreSQL, MongoDB, etc. to handle this high-volume db?
Thanks
You've not really provided much information on your requirements other than a few comments about expected data volumes. Simple storage of large volumes of data has no real intrinsic value; it's the ability to access that data which gives the real value. So knowing how you expect to retrieve information from the database is more important than how much data you want to store.
Do these messages really require a document db like MongoDB, or are they structured enough to use a straight RDBMS like PostgreSQL or MySQL? Do you need full-text search capability? How often, and with what type of queries, is this message data accessed? Are you trying to write your own Twitter?
If those are your current data volumes, look to using db replication for resilience. Consider partitioning your message table, perhaps by date posted. Use master/slave (or even multi-master/multi-slave) as Konerak has suggested. Look at the possibilities of an archive table for older messages that are less likely to be queried, but which are then still available. Look at what a commercial database like Oracle can offer you. Get in a professional to help tune the db for performance, rather than simply asking for free advice on sites like SO.
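As a concrete illustration of the partition-by-date suggestion, the table definition could look something like this (column names, date ranges, and credentials are only placeholders; note that MySQL requires the partitioning column to be part of the primary key):

# Illustrative sketch of a message table partitioned by posting date.
import mysql.connector  # pip install mysql-connector-python

cnx = mysql.connector.connect(host="localhost", user="msg", password="secret", database="msg")
cur = cnx.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS messages (
        id BIGINT AUTO_INCREMENT,
        sender_id INT NOT NULL,
        body TEXT,
        posted_at DATE NOT NULL,
        reported TINYINT NOT NULL DEFAULT 0,
        PRIMARY KEY (id, posted_at)
    )
    PARTITION BY RANGE (TO_DAYS(posted_at)) (
        PARTITION p_q1 VALUES LESS THAN (TO_DAYS('2012-04-01')),
        PARTITION p_q2 VALUES LESS THAN (TO_DAYS('2012-07-01')),
        PARTITION p_max VALUES LESS THAN MAXVALUE
    )
""")
cnx.commit()
# Old partitions can later be dropped or moved to an archive table cheaply,
# e.g. ALTER TABLE messages DROP PARTITION p_q1.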
Consider your hardware as well... multiple load balanced servers to help with the volumes (we have 14 dedicated servers purely for accepting new messages, and three high performance servers tuned for querying the data).