Good morning,
I am creating a new user, and its email must be unique. I have declared it as such in the User entity.
I am wondering whether it is better (quicker / best practice) to catch a DataIntegrityViolationException when the user is created in the DB, or to check first whether the user exists, e.g. select count(*) from User u where u.email = ?.
I am working in Spring Boot, using MySQL and JPA.
Thank you so much!!
The performance difference is likely to be insignificant in this case.
What matters most here is being able to understand what happens just by reading the code. You should have code in your service layer that checks that every defined rule is respected before actually touching the database. It will be easier for newcomers (and even for you in a few weeks) to have a clear view of what is checked and how.
That code should then raise an exception carrying a dedicated code (for example, the ID of the business rule defined by your business analysts), and in a properties file you should have a message corresponding to that rule (the message key could be the rule's ID, for example).
Also, while inserting a row into this table, you could get a DataIntegrityViolationException for several different reasons, so relying on it alone is not a durable solution anyway.
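As a rough sketch of what that service-layer check could look like in Spring Boot (assuming a Spring Data JPA repository with an existsByEmail derived query; the register method, the rule id "USR-001", the BusinessRuleException and the getEmail() accessor are illustrative names, not taken from the question):

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// Repository (its own file): Spring Data JPA derives the query from the method name.
interface UserRepository extends JpaRepository<User, Long> {
    boolean existsByEmail(String email);
}

@Service
public class UserService {

    private final UserRepository userRepository;

    public UserService(UserRepository userRepository) {
        this.userRepository = userRepository;
    }

    @Transactional
    public User register(User user) {
        // Readable business-rule check: rule "USR-001" = email must be unique.
        if (userRepository.existsByEmail(user.getEmail())) {
            throw new BusinessRuleException("USR-001"); // message text resolved from a properties file by rule id
        }
        // The unique constraint on the email column stays in place as a safety net
        // for the rare race where two requests pass this check at the same time.
        return userRepository.save(user);
    }
}

The check above is there for readability and clear error messages; the unique constraint still guards against the rare concurrent insert, which would then surface as a DataIntegrityViolationException.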
Related
I'm building an API that receives webhook data (orders from Shopify) and saves it to a relational MySQL database. Each order has products in it that belong to collections (categories). I relate the product to the collection, or create a new collection if it doesn't already exist. The problem I run into is that since I can't use findOrCreate(), I do a find() first, and then if nothing is found I do a create(). However, since Node.js is async, if I get many orders coming in at once, sometimes the same collection isn't found for two different products at the same time, and then one will create it right before the other one tries, so when the other one tries to create it Waterline throws an error:
col_id
• A record with that col_id already exists (405145674).
What is the best way to catch this error so I can just do another find(), so that the function returns the proper collection instead of undefined? I could just do this:
if (err){ findmethodagain(); }
But then it would retry for any error, not just a "record with that id already exists" error.
If you're not seeing this a lot, it wouldn't hurt to just do a second attempt after a delay. You can use Promise.delay from, for example, Bluebird to run a second function after some number of milliseconds on any error. If in most situations one of them creates the collection, then you'll succeed with just one retry.
If you are seeing lots of other errors, well, fix them.
However, race conditions usually indicate that you need to rework the architecture. I'd suggest instead of relying on error attributes or trying to hack around them, you redesign it, at least for the situation where you have to create a new collection. One popular option is to set up a worker queue. In your case, you would just use find in the first pass to see if a collection exists, and if not, kick it off to the queue for a single worker to pick up sequentially. In the worker job, do your find then create. You can use Redis and Bull to set this up quickly and easily.
Basically, I'm creating a Rails app with users and posts, and I want to be able to soft delete them. To do this, all I need to do is create a boolean deleted column on the users and then use a conditional to change what information is displayed to a non-admin user:
def administrated_content
  # Show the real content unless the record is soft-deleted and the viewer is not an admin.
  if !self.deleted || current_user.is_admin?
    self.content
  else
    "This post has been removed"
  end
end
Now my question is: is it best to keep databases simple and repetitive? Because a few days ago I would have said it would be better to create a third table, a states table, and set up a has_one/belongs_to relationship between a user and a state, and between a post and a state. Why? Because state is an attribute shared by both users and posts.
However, then I realised that this would result in more queries being executed.
So is it best to keep it simple and repeat yourself with attributes?
Yes, in general we keep each attribute in the table to which it applies, instead of needlessly adding a state table. It's okay for another table to have a similar state attribute.
That's far better than polymorphic associations, which break the fundamental definition of a relation and, as you found, require you to write more complex queries.
It depends on the use case you want to optimise for. If it is speed you want to achieve, then a little bit of denormalization should be OK (again, it depends on the scenario).
What you presented here makes sense, in my opinion, as a column on both the user and the post table, because it will not lead to duplicated data and the attribute is meaningful for both users and posts.
Think of the state as userState and postState; this way each makes sense in its own context. Maybe in the future the user gets a state other than deleted (e.g. 'in process of deletion') that would not apply to posts.
I have two entities, Student and Class.
Student and Class have a many-to-one relationship, so Student contains a Class attribute.
Now I want to save/create a Student associated with an existing Class (meaning I already know the primary key ID).
Solution 1:
Student student = new Student();
Class clazz = session.load(Class.class, classId); // "class" is a reserved word in Java, and load() takes the entity type plus the id
student.setClass(clazz);
session.save(student);
Solution 2:
Student student = new Student();
Class clazz = new Class();
clazz.setClassId(classId); // transient instance that only carries the id
student.setClass(clazz);
session.save(student);
My question here is: Solution 1 will issue two SQL statements, one to get the Class and another to insert the Student, but Solution 2 only needs one. If I have more association attributes, I will have to load each of them and issue more SELECTs before the insert, which seems inefficient. Is there any side effect in Solution 2?
Which way of doing the save/insert is better? By the way, I have not set up any cascade.
Thank you
Yee Chen
Solution 1 won't issue an SQL query to load Class. Unlike get(), load() returns a proxy object with the specified identifier and doesn't perform a database query immediately. Thus load() is the natural choice for this scenario (when you actually need the object's data, use get()).
Possible side effects of Solution 2 depend on the cascading configuration of the relationship, among other things. Even if it works fine in your current case, it makes your code more fragile, since seemingly unrelated changes may break it.
So I recommend using Solution 1 and not worrying about performance.
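To make the difference concrete, here is a small illustrative sketch (the getClassId() accessor is assumed from setClassId() in Solution 2, and the exact moment a proxy gets initialized depends on the mapping; the JPA equivalent of load() is EntityManager.getReference()):

// load() returns an uninitialized proxy; no SELECT is issued at this point.
Class proxy = session.load(Class.class, classId);
proxy.getClassId(); // typically still no SELECT: the proxy already knows its own identifier

// Associating the proxy and saving the student results in a single INSERT.
Student student = new Student();
student.setClass(proxy);
session.save(student);

// By contrast, get() would query the database immediately
// and return null if no Class row with that id exists:
// Class fetched = session.get(Class.class, classId);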
Is there any side-effect in solution 2?
First of all, you haven't associated your student with any class there.
Secondly, where do you get that class id from in the general case? At some earlier point in time, you had to either fetch an existing class instance from the DB, or create a new instance and persist it, so that you get its id. Of course, reusing an entity you already have is fine, but juggling with ids like you do above is IMHO not.
Thirdly, it is not a good idea to prematurely optimize your app. Get it to work properly first, then measure performance, and optimize only if and where needed.
We have developed an online quiz where users can register a team to take part in the quiz.
There is a check in the ASP code to see if that team name has already been submitted, and if it has, an error is generated.
We have noticed a problem: if two teams are registered at exactly the same time with the same name, both teams end up registered. Although this is highly unlikely, we wondered what approach should be used to overcome it.
We are using MySQL as the database, if that makes any difference.
Thanks for any guidance.
Don't worry, this never happens.
Databases are smart enough to handle concurrency issues.
If you run a query against the database to register a team and another team registers at the same time, then at the database level the first query (the one that reaches the database first) succeeds and the second fails with an error, which you should handle. If registration requires more than a simple insert into one table, then you should use transactions in your queries/stored procedures.
You can set the name column to be unique, and the database will throw an error on the second insert. If you want to do it in code, it will be more complicated.
Not sure how two teams can register at exactly the same time - even if the requests are submitted simultaneously (down to the nanosecond), your transaction semantics should still guarantee that one of them is "first" from the database point of view.
I have a relatively simple Django app with quite heavy usage that is responsible for quite some concurrency in the DB operations.
I have a model Post with an m2m to a Tag model.
A single line in my code, p.add(t), is repeatedly causing MySQL exceptions (where p is a Post instance and t is a Tag instance):
IntegrityError: (1062, "Duplicate entry '329051-1827414' for key 'post_id'")
When this is raised I can manually run the p.add(t) successfully, so it must have to do with some peculiar state that the db/app are in at the time of normal execution. It happens about once every 1000 tag-adding attempts, without any pattern that I can detect (i.e. both numbers in the "329051-1827414" pair of the example change).
A CHECK TABLE in MySQL on the relevant table shows that everything is seemingly OK.
Any ideas?
Usually you see errors like that when trying to add to an intermediate table if the row being added duplicates the unique-together constraint on the FKs. I'm guessing that in the example you provided "329051" is a Post id and "1827414" is a Tag id.
Normally in Django you can call the add() method repeatedly to add the same instance and Django takes care of everything for you. I'm assuming the model manager maintains some state to help it determine if each add() represents a new or existing row and if the row appears to be new it attempts an insert.
That in itself doesn't explain why you're getting the error. You mention the app "is responsible for quite some concurrency in the DB operations". Without knowing exactly what that means, I'm guessing that you could be hitting a race condition where multiple threads/processes attempt to add the same new tag around the same time and both attempt inserts.
I think I'm seeing a similar problem in my app: if I send two identical requests to add an m2m relation (e.g. a tag, in my case as well), I get that error because the m2m table has a unique constraint on (user, tag). I'm guessing the server is processing the .add() calls at the same time.
if not already in database:
    # Both invocations reach here because the next line takes some time to process.
    create m2m row
I don't know how that can be remedied.