Alternatives to LINQ to SQL on heavily loaded pages - linq-to-sql

To begin with, I LOVE LINQ TO SQL. It's so much easier to use than direct querying.
But there's one big problem: it doesn't work well under heavy load. I have some actions in my ASP.NET MVC project that are called hundreds of times every minute.
I used to use LINQ to SQL there, but since the number of requests is gigantic, LINQ to SQL almost always returned "Row not found or changed" or "X of X updates failed". And it's understandable. For instance, I have to increase some value by one with every request.
var stat = DB.Stats.First();
stat.Visits++;
// ....
DB.SubmitChanges();
But while ASP.NET was executing the code at // ..., the Visits value stored in the table was changed by another request.
I found a solution: I created a stored procedure:
UPDATE Stats SET Visits=Visits+1
It works well.
Unfortunately, I'm now running into more and more cases like this, and it's a pain to create stored procedures for every one.
So my question is, how to solve this problem? Are there any alternatives that can work here?
I hear that Stack Overflow runs on LINQ to SQL. And it's under far more load than my site.

This isn't exactly a problem with Linq to SQL per se; it's the expected behavior of optimistic concurrency, which Linq to SQL uses by default.
Optimistic concurrency means that when you update a record, you check the current version in the database against the copy that was originally retrieved before making any offline updates; if they don't match, report a concurrency violation ("row not found or changed").
There's a more detailed explanation of this here. There's also a fairly sizable guide on handling concurrency errors. Typically the solution involves simply catching ChangeConflictException and picking a resolution, such as:
try
{
    // Make changes
    db.SubmitChanges();
}
catch (ChangeConflictException)
{
    foreach (var conflict in db.ChangeConflicts)
    {
        // Overwrite the database values with the values currently on the entity
        conflict.Resolve(RefreshMode.KeepCurrentValues);
    }
    // Resolving only reconciles the entities; the changes still
    // have to be re-submitted afterward.
    db.SubmitChanges();
}
The above version will overwrite whatever is in the database with the current values, regardless of what other changes were made. For other possibilities, see the RefreshMode enumeration.
Your other option is to disable optimistic concurrency entirely for fields that you expect might be updated. You do this by setting the UpdateCheck option to UpdateCheck.Never. This has to be done at the field level; you can't do it at the entity level or globally at the context level.
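With attribute-based mapping, that looks something like the following sketch (in a designer-generated model, the same option is set in the column's properties):
// Exclude this column from the optimistic-concurrency check; concurrent
// writes to Visits will no longer raise a change conflict.
[Column(UpdateCheck = UpdateCheck.Never)]
public int Visits { get; set; }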
Maybe I should also mention that you haven't picked a very good design for the specific problem you're trying to solve. Incrementing a "counter" by repeatedly updating a single column of a single row is not an appropriate use of a relational database. What you should be doing is maintaining a history table - such as Visits - and, if you really need to denormalize the count, implement that with a trigger in the database itself. Trying to implement a site counter at the application level without any data to back it up is just asking for trouble.
Use your application to put actual data in your database, and let the database handle aggregates - that's one of the things databases are good at.
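As a sketch of that idea (table and column names are invented; the T-SQL assumes SQL Server, which is what LINQ to SQL targets): each hit becomes a row, and the counter is an aggregate the database computes for you.
CREATE TABLE Visits (
    Id INT IDENTITY PRIMARY KEY,
    VisitedAt DATETIME NOT NULL DEFAULT GETDATE()
);

-- Recording a visit is a plain INSERT: no shared row, so no contention.
INSERT INTO Visits DEFAULT VALUES;

-- The "counter" is just an aggregate over the history.
SELECT COUNT(*) FROM Visits;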

Use a producer/consumer or message-queue model for updates that don't absolutely have to happen immediately, particularly status updates. Instead of trying to update the database immediately, keep a queue of updates that the ASP.NET threads can push to, and have a writer process/thread that drains the queue to the database. Since only one thread is writing, there will be much less contention on the relevant tables/rows.
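A minimal sketch of that idea (names are invented; ConcurrentQueue is from .NET 4's System.Collections.Concurrent, and DB stands in for the DataContext from the question):
// Request threads enqueue; a single background thread drains to the database.
static readonly ConcurrentQueue<int> pendingVisits = new ConcurrentQueue<int>();

// Called from the ASP.NET action: cheap, and touches no database row.
public static void RecordVisit(int statId)
{
    pendingVisits.Enqueue(statId);
}

// Single writer loop, started once at application startup.
static void WriterLoop()
{
    while (true)
    {
        int statId;
        while (pendingVisits.TryDequeue(out statId))
        {
            // One writer means no concurrent updates on these rows.
            DB.ExecuteCommand("UPDATE Stats SET Visits = Visits + 1 WHERE Id = {0}", statId);
        }
        Thread.Sleep(1000); // batch roughly a second's worth of updates
    }
}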
For reads, use caching. For high volume sites even caching data for a few seconds can make a difference.

Firstly, you could call DB.SubmitChanges() right after stats.Visits++, and that would greatly reduce the problem.
However, that still won't save you from concurrency violations (that is, two concurrent processes modifying the same piece of data). To fight that, you may use the standard mechanism of transactions. With LINQ-to-SQL, you use transactions by instantiating a TransactionScope class, thusly:
using (TransactionScope t = new TransactionScope())
{
    var stats = DB.Stats.First();
    stats.Visits++;
    DB.SubmitChanges();
}
Update: as Aaronaught correctly pointed out, TransactionScope is not going to help here, actually. Sorry. But read on.
Be careful, though, not to make the body of a transaction too long, as it will block other concurrent processes, and thus, significantly reduce your overall performance.
And that brings me to the next point: your very design is probably flawed.
The core principle in dealing with highly shared data is to design your application in such a way that the operations on that data are quick, simple, and semantically clear, and that they are performed one after another, not simultaneously.
The one operation that you're describing - counting visits - is clear and simple, so it should be no problem once you add the transaction. I must add, however, that while this will be clear, type-safe and otherwise "good", the stored-procedure solution is actually much preferable. This is exactly how database applications were designed in ye olden days. Think about it: why would you need to fetch the counter all the way from the database to your application (potentially over the network!) if there is no business logic involved in processing it? The database server can increment it just as well, without even sending anything back to the application.
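For what it's worth, the same single-statement increment can also be issued from LINQ to SQL without creating a stored procedure, via DataContext.ExecuteCommand (a sketch using the Stats table from the question):
// Runs atomically on the server; no row is read into memory,
// so the optimistic-concurrency check never comes into play.
DB.ExecuteCommand("UPDATE Stats SET Visits = Visits + 1");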
Now, as for the other operations hidden behind // ..., it seems (from your description) that they're somewhat heavy and long. I can't tell for sure, because I don't see what's there, but if that's the case, you probably want to split them into smaller and quicker ones, or otherwise rethink your design. I really can't tell anything else with this little information.

Related

Spring Data JPA - Best Way to Update Concurrently Accessed "Total" Field

(Using Spring Boot 2.3.3 w/ MySQL 8.0.)
Let's say I have an Account entity that contains a total field, and one of those account entities represents some kind of master account. That is, the master account has its total field updated by almost every transaction, and it's important that any update to that total field is applied to the most recent value.
Which is the better choice within such a transaction:
Using a PESSIMISTIC_WRITE lock, fetch the master account, increment the total field, and commit the transaction. Or,
Have a dedicated query that essentially does something like UPDATE Account SET total = total + x as part of the transaction? I'm assuming I'd still need the same pessimistic lock in this case for the UPDATE query, e.g. via @Query and @Lock.
Also, is it an anti-pattern to retry a failed transaction a set number of times due to a lock-acquisition timeout (or other lock-based exception)? Or is it better to let it fail, report it to the client, and let the client try to call the transaction/service again?
Apologies for the basic question, but, it's been some time since I've had to worry about doing such a thing in Spring.
Thanks in advance!
After exercising my Google Fu a bit more and digging even deeper, it seems variations of this question have already been asked, at least insofar as the 'locking' portion goes.
That is, while the Spring Data JPA docs mention redeclaring repository methods and adding the @Lock annotation, it seems that it is meant strictly for queries that only read. This is what I'd originally thought, as it wouldn't make much sense to "lock" an UPDATE query unless there were some additional magic happening with the JPQL query.
As for retrying, retrying does seem to be the way to go, but of course using a number of retries that makes sense for the situation.
Hopefully this helps someone else in the future who has a brain cramp like I did.
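For reference, a sketch of the atomic-update option from the question, assuming a Spring Data repository for the Account entity (method and parameter names are invented; it must be called inside a transaction):
public interface AccountRepository extends JpaRepository<Account, Long> {

    // A single UPDATE statement: the database serializes concurrent
    // increments with its own row lock, so no explicit @Lock is needed.
    @Modifying
    @Query("UPDATE Account a SET a.total = a.total + :amount WHERE a.id = :id")
    int addToTotal(@Param("id") Long id, @Param("amount") long amount);
}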

What, exactly, does allowMultiQueries do?

Adding allowMultiQueries=true to the JDBC string makes MySQL accept Statements with multiple queries.
But what exactly does this do? Is there any benefit to this?
Perhaps it reduces the delay due to round trips? Something like
LOCK
UPDATE ...
UNLOCK
which, if done in one statement, holds the lock for less time.
When, if ever, would I want to combine queries in a single Statement, rather than in separate ones?
For running safe scripts of your own creation that would otherwise need to be run line by line. For instance, a script from mysqldump, or one that you would have run anyway - safe and trusted. This was pointed out to me once by someone when I asked "why would you want to do that?" His answer: his stockpile of his own scripts, each of which has no user input and thus no opening for tomfoolery or SQL injection. The size of these routines is limited by max_allowed_packet, and the strategy is, of course, to read the file into a buffer and use that as the query in a multi-statement call.
For running a few statements in concert, where one relies on transient state created by the other. Transient meaning that if the subsequent call were issued outside the multi-statement batch, the necessary information would no longer be available to part of it. A common example, wise or not, is the duo of SQL_CALC_FOUND_ROWS and FOUND_ROWS(), which was popularly debunked in the Percona article To SQL_CALC_FOUND_ROWS or not to SQL_CALC_FOUND_ROWS?. There is an argument to be made that a single call which returns the result set and also has the count available to be grabbed shortly thereafter is the wiser route for accurate pagination routines. This assumes that a separate call for COUNT(*) and another for the data could produce a discrepancy in multi-user concurrent systems, like all of ours most likely are. So the just-mentioned verbiage addresses accuracy, not performance, which is what the Percona article is about. Another use case is priming and using user-defined variables in queries. Many of these can be folded into the query itself and initialized with a cross join, however.
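A sketch of that transient-state case: both statements have to travel in the same multi-statement call, because FOUND_ROWS() is only meaningful immediately after the statement that set it (table and column names are invented):
SELECT SQL_CALC_FOUND_ROWS id, name
FROM customers
WHERE active = 1
LIMIT 10;

-- Must run on the same connection, immediately after the previous statement.
SELECT FOUND_ROWS();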
When, if ever, would I want to combine queries in a single Statement, rather than in separate ones?
There are two great use cases for this feature:
If you are lazy and like to blindly run queries without checking for success or row counts or auto_increment value assignment, or
If you like the idea of increasing the odds of SQL injection vulnerabilities: username = '' AND 0 = 1; ← right here. With this mode inactive, anything after the injected semicolon is an error, as it should be. With this mode active, a whole world of "oops" can open right up.
What I am saying is... You're right. Don't use it.
Yes, it reduces the impact of round-trip time to the database, pipelining queries... which can be significant with a distant database... but at the cost of increased risk that isn't worth it.

What is the best way (in Rails/AR) to ensure writes to a database table are performed synchronously, one after another, one at a time?

I have noticed that using something like delayed_job without a UNIQUE constraint on a table column can still create duplicate entries in the DB. I had assumed delayed_job would run jobs one after another. The Rails app runs on Apache with Phusion Passenger. I am not sure if that is the reason this happens, but I would like to make sure that every item in the queue is persisted to AR/DB one after another, in sequence, and that no two writes to this DB table ever happen at the same time. Is this possible? What are some of the issues I would have to deal with?
update
The race conditions arise because an AJAX API is used to send data to the application. The application receives batches of data; each batch is identified as belonging together by a Session ID (SID). In the end, the final state of the database has to reflect the latest, most up-to-date AJAX PUT query to the API. Sometimes queries arrive at exactly the same time for the same SID - so I need a way to make sure they aren't all persisted at the same time, but one after the other, or simply to keep only the last one sent by AJAX request to the API.
I hope that makes my particular use-case easier to understand...
You can lock a specific table (or tables) with the LOCK TABLES statement.
In general I would say that relying on this is poor design and will likely lead to scalability problems down the road, since you're creating a bottleneck in your application flow.
With your further explanations, I'd be tempted to add some extra columns to the table used by delayed_job, with a unique index on them. If (for example) you only ever wanted one job per user, you'd add a user_id column and then do
something.delay(:user_id => user_id).some_method
You might need more attributes if the pattern is more sophisticated, e.g. there are lots of different types of jobs and you only wanted one per person, per type, but the principle is the same. You'd also want to be sure to rescue ActiveRecord::RecordNotUnique and deal with it gracefully.
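A sketch of that setup - the extra column, its unique index, and the rescue (names are invented):
class AddUserIdToDelayedJobs < ActiveRecord::Migration
  def self.up
    # One queued job per user, enforced by the database itself
    add_column :delayed_jobs, :user_id, :integer
    add_index  :delayed_jobs, :user_id, :unique => true
  end
end

begin
  something.delay(:user_id => user_id).some_method
rescue ActiveRecord::RecordNotUnique
  # a job for this user is already queued; ignore the duplicate
end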
For non-delayed_job stuff, optimistic locking is often a good compromise: it handles the concurrent cases well without slowing down the non-concurrent ones.
If you are worried about multiple processes writing to the 'same' rows - as in several users updating the same order_header row - I'd suggest you set some marker bound to current_user.id on the row once /order_headers/:id/edit is called, and remove it again once the current_user releases the row, either by updating or by canceling the edit.
Your use case (from your description) seems a bit different to me, so I'd suggest you leave it to the DB: with a fairly recent (as in post-5.1) MySQL, you'd add a trigger/function to do the actual update, and there you could implement logic similar to the above suggestion - some marker bound to a sequenced job id of sorts.

Speeding up Hibernate Object creation?

We use Hibernate as our ORM layer on top of a MySQL database. We have quite a few model objects, of which some are quite large (in terms of number of fields etc.). Some of our queries require that a lot (if not all) of the model objects are retrieved from the database, to do various calculations on them.
We have lazy loading enabled, but in some cases it still takes a significant amount of time for Hibernate to populate the objects. The execution time of the MySQL query is very fast (in the order of a few milliseconds), but then Hibernate takes its sweet time to populate the objects.
Is there any way / pattern / optimization to speed up this process?
Thanks.
One approach is to not populate the entity but some kind of view object.
Assuming a CustomerView has the appropriate constructor, you can do
select new CustomerView(c.firstname, c.lastname, c.age) from Customer c
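That "appropriate constructor" would look something like this (a sketch):
// A plain view/DTO class: 'select new' requires a constructor whose
// signature matches the projection in the query exactly.
public class CustomerView {
    private final String firstname;
    private final String lastname;
    private final int age;

    public CustomerView(String firstname, String lastname, int age) {
        this.firstname = firstname;
        this.lastname = lastname;
        this.age = age;
    }
}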
Though I'm a bit surprised about Hibernate being slow to populate objects unless you happen to load associated objects by cascade and forget a few appropriate fetches.
Perhaps consider adding a second-level cache? This won't necessarily speed up the object instantiation, but it could considerably cut down the frequency with which you need to do it.
http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html
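A sketch of what that might look like with Hibernate annotations (the cache itself still has to be enabled in hibernate.cfg.xml, e.g. hibernate.cache.use_second_level_cache plus a provider such as Ehcache; the entity is invented):
// Entities that are read far more often than written are good candidates;
// cached instances skip the SQL round trip and most of the population work.
@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Customer {
    @Id
    private Long id;
    private String firstname;
    private String lastname;
    private int age;
}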
Since you're asking a performance-related question, you might want to collect more data on where the bottleneck is. You say
Hibernate takes its sweet time to populate the objects.
How do you know it's Hibernate that's the problem? In other words, is Hibernate itself the problem, or could there not be enough memory (or too much) so the JVM isn't running efficiently?
Also, you mention
We have quite a few model objects, of which some are quite large (in terms of number of fields etc.).
How many is "quite large"? Dozens? Hundreds? Thousands? It makes a big difference, because relational databases (such as MySQL) start performing more poorly as your table gets "wider" (see this question: Is there a performance decrease if there are too many columns in a table?).
Performance is a lot about balancing constraints, but it's also about collecting a lot of data to see where the problem is and then fixing that problem. Then you'll find the next bottleneck and fix that one until your performance is good enough, or you run out of implementation time.

Never delete entries? Good idea? Usual?

I am designing a system and I don't think it's a good idea to give the end user the ability to delete entries in the database. I think that way because often the end user, once given admin rights, might end up making a mess in the database and then turn to me to fix it.
Of course, they will need to be able to remove entries, or at least think that they did, if they are set as admin.
So, I was thinking that all the entries in the database should have an "active" field. If they try to remove an entry, it will just set the flag to "false" or something similar. Then there will be some kind of super admin - my company's team - who could change this field.
I have already seen this at another company I worked for, but I was wondering if it is a good idea. I could just make regular database backups and roll back if they make an error, and adding this field would add some complexity to all the queries.
What do you think? Should I do it that way? Do you use this kind of trick in your applications?
In one of our databases, we distinguished between transactional and dictionary records.
In a couple of words, transactional records are things that you cannot roll back in real life, like a call from a customer. You can change the caller's name, status etc., but you cannot dismiss the call itself.
Dictionary records are things that you can change, like assigning a city to a customer.
Transactional records and things that lead to them were never deleted, while dictionary ones could be deleted all right.
By "things that lead to them" I mean that as soon as the record appears in the business rules which can lead to a transactional record, this record also becomes transactional.
Like, a city can be deleted from the database. But when a rule appeared that said "send an SMS to all customers in Moscow", the cities became transactional records as well, or we would not be able to answer the question "why did this SMS get sent".
A rule of thumb for distinguishing was this: is it only my company's business?
If one of my employees made a decision based on data from the database (say, he made a report on which some management decision was based, and then the data the report was based on disappeared), it was considered OK to delete these data.
But if the decision affected some immediate actions with customers (like calling, messing with the customer's balance etc.), everything that led to these decisions was kept forever.
It may vary from one business model to another: sometimes, it may be required to record even internal data, sometimes it's OK to delete data that affects outside world.
But for our business model, the rule from above worked fine.
A couple of reasons people do things like this are auditing and automated rollback. If a row is completely deleted, there's no way to automatically roll back that deletion if it was made in error. Also, keeping a row around along with its previous state is important for auditing - a super user should be able to see who deleted what and when, as well as who changed what, etc.
Of course, that's all dependent on your current application's business logic. Some applications have no need for auditing and it may be proper to fully delete a row.
The downside to just setting a flag such as IsActive or DeletedDate is that all of your queries must take that flag into account when pulling data. This makes it more likely that another programmer will accidentally forget this flag when writing reports...
A slightly better alternative is to archive that record into a different database. This way it's been physically moved to a location that is not normally searched. You might add a couple fields to capture who deleted it and when; but the point is it won't be polluting your main database.
Further, you could provide an undo feature to bring it back fairly quickly; and do a permanent delete after 30 days or something like that.
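A sketch of that archive-on-delete idea (table and column names are invented):
-- Copy the row aside with audit columns, then remove it from the live table.
INSERT INTO orders_archive (id, customer_id, total, deleted_by, deleted_at)
SELECT id, customer_id, total, @current_user, NOW()
FROM orders WHERE id = @order_id;

DELETE FROM orders WHERE id = @order_id;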
UPDATE concerning views:
With views, the data still participates in your indexing scheme. If the amount of potentially deleted data is small, views may be just fine as they are simpler from a coding perspective.
I prefer the method that you are describing. It's nice to be able to undo a mistake. More often than not, there is no easy way of going back on a DELETE query. I've never had a problem with this method, and unless you are filling your database with 'deleted' entries, there shouldn't be an issue.
I use a combination of techniques to work around this issue. For some things, adding the extra "active" field makes sense. Then the user has the impression that an item was deleted because it no longer shows up on the application screen. The scenarios where I would implement this include items that are required to keep a history... let's say invoices and payments. I wouldn't want such things being deleted for any reason.
However, there are some items in the database that are not so sensitive, let's say a list of categories that I want to be dynamic... I may then allow users with admin privileges to add and delete a category, and the delete could be permanent. However, as part of the application logic I will check whether the category is used anywhere before allowing the delete.
I suggest having a second database, like DB_Archives, where you add every row deleted from the DB. The is_active field negates the very purpose of foreign key constraints, and YOU have to make sure that a row is not marked as deleted while it's still referenced elsewhere. This becomes overly complicated when your DB structure is massive.
There is an accepted practice that exists in many applications (Drupal's versioning system, et al.). Since MySQL scales very quickly and easily, you should be okay.
I've been working on a project lately where all the data was kept in the DB as well. The status of each individual row was kept in an integer field (data could be active, deleted, in_need_for_manual_correction, historic).
You should consider using views to access only the active/historic/... data in each table. That way your queries won't get more complicated.
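A sketch of such a view (names are invented; status = 1 meaning "active"):
-- Ordinary queries go through the view and never see the status flag.
CREATE VIEW active_customers AS
SELECT id, firstname, lastname
FROM customers
WHERE status = 1;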
Another thing that made things easy was the use of UPDATE/INSERT/DELETE triggers that handled all the flag changing inside the DB and thus kept the complex stuff out of the application (for the most part).
I should mention that the DB was an MSSQL 2005 server, but I guess the same approach should work with MySQL, too.
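A sketch of that trigger idea in T-SQL, since that setup was MSSQL 2005 (names are invented): an INSTEAD OF DELETE trigger turns deletes into status updates.
CREATE TRIGGER trg_customers_soft_delete ON customers
INSTEAD OF DELETE
AS
BEGIN
    -- 2 = "deleted" in the integer status scheme described above.
    UPDATE customers
    SET status = 2
    WHERE id IN (SELECT id FROM deleted);
END;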
Yes and no.
It will complicate your application much more than you expect, since every table that does not allow deletion needs an extra check (IsDeleted = false) everywhere. It does not sound like much, but then, when you build a larger application and a query over 11 tables has 9 of them requiring the non-deletion check... it's tedious and error-prone. (Well, yeah, then there are deleted/non-deleted views... when you remember to create and use them.)
Some schema upgrades will become a PITA, since you'll have to relax FKs and invent "suitable" data for very, very old rows.
I've not tried it, but I have thought a moderate amount about a solution where you'd zip the row data to XML and store that in some "Historical" table. Then, in case of "must have that restored now OMG the world is dying!1eleven", it's possible to dig it out.
I agree with all respondents that if you can afford to keep old data around forever, it's a good idea; for performance and simplicity, I agree with the suggestion of moving "logically deleted" records to "old stuff" tables rather than adding "is_deleted" flags (moving to a totally different database seems like overkill, but you can easily switch to that more drastic approach later if the amount of accumulated data eventually turns out to be a problem for a single DB with normal and "old stuff" tables).