First, I will try to paint a picture of what I am trying to do:
I have taken up hosting on a shared server (GoDaddy).
I am publishing a few APIs which will be accessed by a number of clients to get/push data from/to the server.
The server's database has only one table, which contains the data. The data isn't that huge: about 10 columns, and it would probably not exceed 10,000 rows in the foreseeable future. Basically, there are going to be 10-15 record inserts a day.
Information from the clients will result in inserts and updates to existing rows. Therefore, clients will call APIs to read as well as update things in the database.
Is having one table a very bad idea, assuming there can be over 1,000 client requests a day hitting the server to read and update information?
If it is a bad idea and I were to create mirror tables to distribute client requests across them, those tables would need triggers to ensure the information in each stays the same. What I am not able to understand is what happens when an update on a locked table triggers an insert or update on another table whose lock is already held by another client.
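For illustration, the kind of trigger I have in mind would be something like this (the table and column names are just placeholders I made up):

    CREATE TRIGGER mirror_after_update
    AFTER UPDATE ON main_data
    FOR EACH ROW
        UPDATE mirror_data
        SET status = NEW.status
        WHERE id = NEW.id;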
It would be bad to see connections time out solely because of the wait for locks to be released (in case the number of client requests to the database is just on the higher side at some point in time).
I am creating a custom analytics system and am currently in the database design process. I'm planning to use MariaDB with the InnoDB engine to be able to handle big loads.
The data I'm expecting could be around 500k clicks/day. I will need to insert these rows into the database, which means I'll have around 5.8 inserts/sec on average. However, at the same time, I want to record whether someone visited a page associated with that click (basically to record funnels).
So what I'm planning to do is create additional columns, look up the ID of the specific row, and then update that column with the exact time of the visit.
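In other words, roughly this kind of statement for each visit (the table and column names are placeholders):

    UPDATE clicks
    SET visited_at = NOW()
    WHERE id = 12345;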
My first question: is this generally a recommended way to design the database? If not, how else would it be worth designing it?
My only concern is that while rows are being updated the table will be locked and inserts can't happen, slowing down the user experience.
My second question: is this something I should worry about, the table getting locked during updates and thus slowing down inserts? Does it hurt performance?
InnoDB doesn't lock the whole table for inserts while you're performing an update, so your users won't experience any weird hanging.
It's an MVCC-compliant engine, designed to handle concurrent access to the underlying tables.
You can control the engine's behavior by choosing an appropriate isolation level; however, the default (REPEATABLE READ) is excellent and does the job more than well.
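For example, if you ever do need to change it, it's a one-line setting (shown here for the current session only; with the default REPEATABLE READ you normally don't need to touch it):

    SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;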
If a table is being modified by multiple users (not users who connect to your site, but connections established towards MySQL via a scripting language or some other service) and there are many inserts/updates/deletes, MySQL can throw an error saying a deadlock occurred.
A deadlock is more of a warning than a fatal error: it means more than one thread tried to access a resource that was already occupied (for example, two threads tried to update the same row at the same time, and only one is allowed to do so at once). It's an indication that you should repeat the query.
I'm suggesting that you take care of all such scenarios in the language of your choice when working with MySQL that's under heavier I/O.
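If you want to see what the most recent deadlock actually involved, InnoDB keeps it in its status output:

    SHOW ENGINE INNODB STATUS;  -- see the "LATEST DETECTED DEADLOCK" section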
~6 inserts a second isn't a lot; just make sure you're allowing MySQL to access sufficient system resources. For InnoDB, check the value of innodb_buffer_pool_size, or google a bit to see what it is and how to tune it to make your database run fast.
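A quick way to check the current value, plus a sketch of raising it (the right size depends on how much RAM your machine can spare; on MySQL 5.7+ it can be changed at runtime, on older versions only in the config file):

    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
    -- e.g. give InnoDB 1 GB, assuming the server can spare it
    SET GLOBAL innodb_buffer_pool_size = 1073741824;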
Good luck!
At a mere ~6 inserts per second, there won't be much of a problem.
I do, however, suggest vertical partitioning for "Likes", "Upvotes", "Clicks", and similar things. These tend to have a lot of UPDATEs of random single rows, and may interfere with other activity.
That is, have a separate table with (perhaps) just 2 columns:
- The id of the item being Liked/Clicked/etc.
- A counter.
It is simple enough (and fast enough) to JOIN via that id when you want to display info including the counter.
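A minimal sketch of that layout (the table and column names here are placeholders; the main items table is assumed to have an id primary key):

    CREATE TABLE item_counts (
        item_id INT UNSIGNED NOT NULL PRIMARY KEY,
        clicks  INT UNSIGNED NOT NULL DEFAULT 0
    ) ENGINE=InnoDB;

    -- one small, fast single-row UPDATE per click
    -- (the counter row is assumed to be created along with the item)
    UPDATE item_counts SET clicks = clicks + 1 WHERE item_id = 123;

    -- JOIN the counter back in when displaying the item
    SELECT i.*, c.clicks
    FROM items i
    LEFT JOIN item_counts c ON c.item_id = i.id
    WHERE i.id = 123;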
As already pointed out, the row is locked, not the table.
I am pulling a large amount of data (700k lines) from one database (Source Database), performing some data manipulation, then inserting/updating another database (Lookup Database) with the altered data.
The lookup database is being accessed by users regularly throughout the day; they are only running SELECT statements. The updates to the lookup database occur hourly. As the size of the data in the source database increases, I'm noticing much higher numbers of "lock wait timeout exceeded" errors when updating the lookup database. Is my assumption correct that this most likely results from SELECT statements and UPDATE statements touching the same data? That would make sense, as the larger number of updates would more frequently hit data that users are reading.
In my attempts to fix the situation I've increased the lock wait timeout to 120 seconds (up from 60), but that has had very little effect.
One thought I had was to write the updates into a new table in the lookup database, then swap (in the software the users are using) the database their queries go to. Are there any other ideas for how I might resolve the issue?
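For example, if the swap were done on the MySQL side instead of in the software, it could be an atomic rename, something like this sketch (the table names are made up):

    RENAME TABLE lookup_data TO lookup_data_old,
                 lookup_data_new TO lookup_data;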
I'm not sure if it's relevant but I'm using MySQL and the two databases are on completely separate servers.
Thanks!
I have a MySQL database hosted on a web server with a set of tables containing data. I am distributing my front-end application, which is built using HTML5 / JavaScript / CSS3.
Now, when multiple users try to insert/update one of the tables at the same time, is that going to create a conflict, or will MySQL handle the locking for me automatically? For example, while one user is writing, will it lock the table for him, let the rest queue up, and then release the lock to the next in the queue once he finishes? Is this what happens, or do I need to handle this case in the MySQL database myself?
EXAMPLE:
When a user wants to make an insert into the database, he calls a PHP file located on the web server which contains an INSERT command to post data into the database. I am concerned about what happens if two or more people make an insert at the same time: will each write still go through?

    // Note: the values are concatenated straight into the SQL string here;
    // a prepared statement with bound parameters would be safer against SQL injection.
    mysqli_query($con,
        "INSERT INTO cfv_postbusupdate
             (BusNumber, Direction, StopNames, Status, comments, username, dayofweek, time)
         VALUES (" . trim($busnum) . ", '" . trim($direction3) . "', '" . trim($stopname3) . "', '"
               . $status . "', '" . $comments . "', '" . $username . "', '"
               . trim($dayofweek3) . "', '" . trim($btime3) . "')");
MySQL handles table locking automatically.
Note that with MyISAM engine, the entire table gets locked, and statements will block ("queue up") waiting for a lock to be released.
The InnoDB engine provides more concurrency and can do row-level locking, rather than locking the entire table.
There may be some cases where you want to take locks on multiple MyISAM tables, for example if you want to maintain referential integrity and disallow other sessions from making changes to any of those tables while your session does its work. But this really kills concurrency; it should be more of an "admin"-type operation, not really something a concurrent application should be doing.
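That "admin"-type pattern looks roughly like this (placeholder table names):

    LOCK TABLES orders WRITE, order_items WRITE;
    -- ... make the related changes to both tables ...
    UNLOCK TABLES;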
If you are making use of transactions (InnoDB), the issue your application needs to deal with is the sequence in which rows in which tables are locked. It's possible for an application to hit "deadlock" exceptions, when MySQL detects that two (or more) transactions can't proceed because each needs locks held by the other. The only thing MySQL can do is detect that, and its only recovery is to choose one of the transactions as the victim: that transaction gets the "deadlock" exception, because MySQL killed it to allow at least one of the transactions to proceed.
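A common way to reduce the chance of deadlocks is to have every transaction take its row locks in the same order; a sketch with placeholder table names:

    START TRANSACTION;
    -- lock both rows in a consistent order (here: ascending id) in every transaction
    SELECT * FROM accounts WHERE id IN (1, 2) ORDER BY id FOR UPDATE;
    UPDATE accounts SET balance = balance - 10 WHERE id = 1;
    UPDATE accounts SET balance = balance + 10 WHERE id = 2;
    COMMIT;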
I am trying to create a web application whose primary objective is to insert request data into a database.
Here is my problem: one request by itself contains 10,000 to 100,000 data sets of information
(each data set needs to be inserted separately as a row in the database).
I may get multiple requests to this application concurrently, so it's necessary for me to make the inserts fast.
I am using a MySQL database. Which approach is better for me: LOAD DATA or batch INSERT, or is there a better way than these two?
How will your application retrieve this information?
- There will be another background, thread-based Java application that will select records from this table, process them one by one, and delete them.
Can you queue your requests (batches) so your system will handle them one batch at a time?
- For now we are thinking of inserting it into the database straight away, but yes, if this approach is not feasible enough we may think about queuing the data.
Do retrievals of information need to be concurrent with insertion of new data?
- Yes, we are keeping it concurrent.
Those are my answers to your questions, Ollie Jones.
Thank you!
Ken White's comment mentioned a couple of useful SO questions and answers for handling bulk insertion. For the record volume you are handling, you will have the best success using MyISAM tables and LOAD DATA INFILE, loading from source files in the same file system used by your MySQL server.
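A sketch of the kind of load statement meant here (the file path, table name, and column layout are hypothetical; if secure_file_priv is set, the file must live under that directory):

    LOAD DATA INFILE '/var/lib/mysql-files/request_batch_0001.csv'
    INTO TABLE request_rows
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    (col_a, col_b, col_c);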
What you're doing here is a kind of queuing operation. You receive these batches (you call them "requests") of records (you call them "data sets"). You put them into a big bucket (your MySQL table). Then you take them out of the bucket one at a time.
You haven't described your problem completely, so it's possible my advice is wrong.
Is each record ("data set") independent of all the others?
Does the order in which the records are processed matter? Or would you obtain the same results if you processed them in a random order? In other words, do you have to maintain an order on the individual records?
What happens if you receive two batches ("requests") of a million rows each at approximately the same time? Assuming you can load ten thousand records a second (that's fast!) into your MySQL table, it will take 200 seconds to load both batches completely. Will you try to load one batch completely before beginning to load the second?
Is it OK to start processing and deleting the rows in these batches before the batches are completely loaded?
Is it OK for a record to sit in your system for 200 or more seconds before it is processed? How long can a record sit? (this is called "latency").
Given the volume of data you're mentioning here, if you're going into production with live data you may want to consider using a queuing system like ActiveMQ rather than a DBMS.
It may also make sense simply to build a multi-threaded Java app to load your batches of records, deposit them into a Queue object in RAM (a ConcurrentLinkedQueue instance may be suitable) and process them one by one. This approach will give you much more control over the performance of your system than you will have by using a MySQL table as a queue.
Consider the following situation: you want to update the number of page views of each profile in your system. This action is very frequent, as almost every visit to your website results in a page-view increment.
The basic way is UPDATE Users SET page_views = page_views + 1. But this is far from optimal, because we don't really need an instant update (an hour late is OK). Is there any other way in MySQL to postpone a sequence of updates and apply them cumulatively at a later time?
I tried another method myself: storing a counter (number of increments) for each profile. But this also results in handling a few thousand small files, and I think the disk I/O cost (even if a deep tree structure for the files is used) would probably exceed that of the database.
What is your suggestion for this problem (other than MySQL)?
To improve performance you could store your page-view data in a MEMORY table. This is super fast but temporary: the table only persists while the server is running, so on restart it will be empty...
You could then create an EVENT to copy the data into a persistent table on a timed basis. This would help performance a little, with the risk that, should the server go down, only the visits counted since the last run of the event would be lost.
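A rough sketch of that combination (the table and column names are placeholders, the event scheduler must be enabled, and a production version would need to cope with views that arrive between the UPDATE and the DELETE):

    CREATE TABLE page_view_buffer (
        profile_id INT UNSIGNED NOT NULL PRIMARY KEY,
        views      INT UNSIGNED NOT NULL DEFAULT 0
    ) ENGINE=MEMORY;

    -- on every page view:
    INSERT INTO page_view_buffer (profile_id, views) VALUES (123, 1)
        ON DUPLICATE KEY UPDATE views = views + 1;

    -- flush to the persistent Users table once an hour
    -- (needs event_scheduler = ON; use DELIMITER in the mysql client for the BEGIN ... END body)
    CREATE EVENT flush_page_views
    ON SCHEDULE EVERY 1 HOUR
    DO
    BEGIN
        UPDATE Users u
        JOIN page_view_buffer b ON b.profile_id = u.id
        SET u.page_views = u.page_views + b.views;
        DELETE FROM page_view_buffer;
    END;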
The link James posted in the comment on your question, which leads to an accepted answer with a further comment about memcached, was my first thought as well. Just store the profile IDs in memcached, then set up a cron job to run every 15 minutes, grab all the entries, and issue the updates to MySQL in a batch. But there are a few things to consider:
1. When you run the batch script to grab the IDs out of memcached, you will have to make sure you remove all entries that have been processed; otherwise you risk counting the same profile views multiple times.
2. Since memcached doesn't support wildcard searching on keys, and since you will have to purge existing keys for the reason stated in #1, you will probably have to set up a separate memcached server pool dedicated to the sole purpose of tracking profile IDs, so you don't end up purging cached values that have no relation to profile-view tracking. However, you could avoid this by storing the profile ID and a timestamp in the value payload; then have your batch script step through each entry and check the timestamp, add it to the update queue if it's within the time range you specified, and stop once you hit the upper limit of that range.
Another option may be to parse your access logs. If user profiles live at a known location like /myapp/profile/1234, you could parse for this pattern and add profile views that way. I ended up having to go this route for advertiser tracking, as it turned out to be the only repeatable way to generate billing numbers. If advertisers had any billing disputes, we would offer to send them the access logs so they could parse them themselves.