Concurrent writes to DB with conditional unique - mysql

I am using Spring Boot, MySQL and JDBC in my application.
I have a table like the one below:
CREATE TABLE `post` (
`id` bigint(11) unsigned NOT NULL AUTO_INCREMENT,
`ref` varchar(255) DEFAULT '',
`userId` bigint(20) NOT NULL,
`text` varchar(255),
`count` bigint(20) NOT NULL,
`version` bigint(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
In this table, the count column depends on whether the combination of the ref, userId and text columns is unique: if the combination is not yet present in the DB, count will be 1, but if the combination already exists in the DB, count will be 0.
I run into a problem when two or more users try to post with the same ref, userId and text at the same time. Of these requests, only one should get a count of 1; the others should get 0.
How can I handle this case when multiple users are trying to post the same values?
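One possible way to read the rule above is "the first writer of a combination wins". The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 as a stand-in for MySQL so that it runs anywhere, and the helper table post_key and the functions make_db and add_post are hypothetical names, not part of the original schema. The unique key on post_key lets the database, rather than application code, decide which concurrent request gets count = 1.

```python
import sqlite3

SCHEMA = """
CREATE TABLE post (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    ref     TEXT DEFAULT '',
    userId  INTEGER NOT NULL,
    text    TEXT,
    "count" INTEGER NOT NULL
);
-- Hypothetical helper table: one row per distinct (ref, userId, text).
CREATE TABLE post_key (
    ref    TEXT NOT NULL,
    userId INTEGER NOT NULL,
    text   TEXT NOT NULL,
    UNIQUE (ref, userId, text)
);
"""

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def add_post(conn, ref, user_id, text):
    """Store a post; return 1 for the first writer of the combination, else 0."""
    with conn:  # commits on success, rolls back on an unexpected error
        try:
            conn.execute(
                "INSERT INTO post_key (ref, userId, text) VALUES (?, ?, ?)",
                (ref, user_id, text),
            )
            count = 1  # this request claimed the combination
        except sqlite3.IntegrityError:
            count = 0  # another request already claimed it
        conn.execute(
            'INSERT INTO post (ref, userId, text, "count") VALUES (?, ?, ?, ?)',
            (ref, user_id, text, count),
        )
    return count
```

In MySQL the same idea would presumably rely on a UNIQUE KEY and on catching the duplicate-key error (errno 1062). The key columns are declared NOT NULL here because NULL values count as distinct in a unique key and would let duplicates through.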

Related

How to avoid duplicate key error in mysql

I have a problem getting the next sequence id in my code. Although it is legacy code, I have to follow the same approach. Let me explain the logic that was followed.
CREATE TABLE `emp_seq` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=INNODB AUTO_INCREMENT=1234 DEFAULT CHARSET=utf8
The table above is used to get the next sequence id for the table below. The emp_seq table will only ever have one entry for the id.
CREATE TABLE `emp_info` (
`id` BIGINT(8) UNSIGNED NOT NULL,
`name` VARCHAR(128) DEFAULT '',
`active` TINYINT(2) DEFAULT '1',
`level` MEDIUMINT(8) DEFAULT '100',
PRIMARY KEY (`id`),
KEY `level` (`level`)
) ENGINE=INNODB DEFAULT CHARSET=utf8 COMMENT='employee information'
So whenever we try to insert a new record into the emp_info table, we get the next sequence id from the emp_seq table using the queries below.
INSERT INTO emp_seq () VALUES ();
DELETE FROM emp_seq WHERE id < LAST_INSERT_ID();
Now the problem is that sometimes, because of multiple asynchronous calls in the application, the same increment id is handed to multiple records that we then try to insert into the emp_info table, and we get the error below:
"code":"ER_DUP_ENTRY","errno":1062,"sqlMessage":"Duplicate entry 1234 for key
Kindly help me resolve the issue.

Table "Products" with predefined products, user can customize the price. How to avoid data redundancy?

I've been thinking about this problem for a few days and I still can't find a way to do what I want.
Below is how my database is currently designed (it's where I'm stuck) :
This is what I want :
a User can create multiple PriceSheets. A User can give a PriceSheet any name he wants. There are two PriceSheets types : "Lab Fulfillment", or "Self Fulfillment".
if the User chooses "Lab Fulfillment", he can import all or part of the Products of one of the predefined Labs. (I rephrase : there are few Labs that come with a predefined list of Products). The User will only be able to customize the price. He can't add custom products to this PriceSheet.
if the User chooses "Self Fulfillment", he can add his own products, and can personalize each field (name, cost, price, dimension_h, dimension_l).
I don't know how to link the tables between them. If I put the predefined Products in the Products table and set a Many-to-Many relationship between PriceSheets and Product, the default price of a predefined Product will be overwritten when a User customizes it, which is not what I want.
Also, I want the default values of my predefined Products to be only once in my database. If 100 users uses the predefined Products, I don't want the default cost to be in my database 100 times.
Don't hesitate to ask for clarification; I had trouble making this question clear and I think it's still not totally clear.
Thanks in advance for your help
OK, database normalization 101. There are lots of ways to do this, and it would take me a day to really optimize it all, but this should help:
User
Lab
Product
id name cost dimension .....
1 a
2 b
3 c
4 d
So those three tables are fine. All your products will go in the Product table. No foreign keys in any of those tables.
PriceSheet
user_id custom_price product_id type
1 1.99 1 lab-fulfillment
0 NULL 2 self-fulfillment
1 5.99 3 lab-fulfillment
So a user can have as many price sheets as they want, and they can only adjust the price of a product. This can actually be normalized further if you so wish:
PriceSheet (composite key on id, user_id, FK user_id)
id user_id
0 0
1 1
2 1
LabPriceSheet (you could add an id, might be better, or you could use a composite key, stricter)
PriceSheet_id custom_price lab_product_id
0 1.99 0
2 5.99 1
CustomPriceSheet
PriceSheet_id custom_product_id
1 0
With foreign keys as appropriate. This way MySQL itself restricts where a custom_price can appear, rather than PHP (although you would still have to make sure your INSERTs are correct!).
Now, to deal with who adds the products:
CustomProduct
id user_id product_id timestamp
0 3 2 ...
LabProduct
id lab_id product_id timestamp
0 0 1 ...
1 0 3 ...
So let's double check:
This is what I want :
a User can create multiple PriceSheets.
Check.
A User can give a PriceSheet any name he wants.
Check.
There are two PriceSheets types : "Lab Fulfillment", or "Self Fulfillment".
Check.
if the User chooses "Lab Fulfillment", he can import all or part of the Products of one of the predefined Labs. (I rephrase : there are few Labs that come with a predefined list of Products). The User will only be able to customize the price. He can't add custom products to this PriceSheet.
Yup, because he would create a LabPriceSheet that can only add lab_product_id. Custom price is there too, that overrides the default price in product table.
if the User chooses "Self Fulfillment", he can add his own products, and can personalize each field (name, cost, price, dimension_h, dimension_l).
Yup, he would add a product (you would need to check whether a similar one already exists and, if so, reuse the id of the existing product in the product table), and then that would also be an entry in CustomProduct.
I don't know how to link the tables between them. If I put the predefined Products in the Products table and set a Many-to-Many relationship between PriceSheets and Product, the default price of a predefined Product will be overwritten when a User customizes it, which is not what I want.
Yeah that won't happen :) Never (very very rarely) implement many-many rels.
Also, I want the default values of my predefined Products to be only once in my database. If 100 users uses the predefined Products, I don't want the default cost to be in my database 100 times.
Of course.
Let me know if you want the MySQL code, I assume you're good! Remember to use InnoDB and properly configure your MySQL configuration!
EDIT
I felt like helping you out with a copy and paste thing. I like copy and paste things. Also, there's a redundant user_id column in the blurb above which I fixed in an earlier edit.
SET GLOBAL innodb_file_per_table = 1;
SET GLOBAL general_log = 'OFF';
SET FOREIGN_KEY_CHECKS=1;
SET GLOBAL character_set_server = utf8mb4;
SET NAMES utf8mb4;
CREATE DATABASE SO; USE SO;
ALTER DATABASE SO CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
CREATE TABLE `User` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`email` VARCHAR(555) NOT NULL,
`password` VARBINARY(200) NOT NULL,
`username` VARCHAR(100) NOT NULL,
`role` INT(2) NOT NULL,
`active` TINYINT(1) NOT NULL,
`created` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
`modified` DATETIME ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `Lab` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`name` VARCHAR(1000) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `Product` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`name` VARCHAR(1000) NOT NULL,
`cost` DECIMAL(10, 2) NOT NULL,
`price` DECIMAL(10, 2) NOT NULL,
`height` DECIMAL(15, 5) NOT NULL,
`length` DECIMAL(15, 5) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `CustomProduct` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`user` BIGINT(20) UNSIGNED NOT NULL,
`product` BIGINT(20) UNSIGNED NOT NULL,
`created` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
FOREIGN KEY (`user`) REFERENCES `User`(`id`),
FOREIGN KEY (`product`) REFERENCES `Product`(`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `LabProduct` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`lab` BIGINT(20) UNSIGNED NOT NULL,
`product` BIGINT(20) UNSIGNED NOT NULL,
`created` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
FOREIGN KEY (`lab`) REFERENCES `Lab`(`id`),
FOREIGN KEY (`product`) REFERENCES `Product`(`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `PriceSheet` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`name` VARCHAR(1000) NOT NULL,
`user` BIGINT(20) UNSIGNED NOT NULL,
PRIMARY KEY (`id`,`user`),
FOREIGN KEY (`user`) REFERENCES `User`(`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `LabPriceSheet` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`price_sheet` BIGINT(20) UNSIGNED NOT NULL,
`lab_product` BIGINT(20) UNSIGNED NOT NULL,
`custom_price` DECIMAL(10, 2) NOT NULL,
PRIMARY KEY (`id`),
FOREIGN KEY (`price_sheet`) REFERENCES `PriceSheet`(`id`),
FOREIGN KEY (`lab_product`) REFERENCES `LabProduct`(`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `CustomPriceSheet` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`price_sheet` BIGINT(20) UNSIGNED NOT NULL,
`custom_product` BIGINT(20) UNSIGNED NOT NULL,
PRIMARY KEY (`id`),
FOREIGN KEY (`price_sheet`) REFERENCES `PriceSheet`(`id`),
FOREIGN KEY (`custom_product`) REFERENCES `CustomProduct`(`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
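To see how the custom price sits next to the default price without overwriting it, here is a minimal sketch. It assumes Python's built-in sqlite3 as a stand-in for MySQL and uses cut-down versions of the Product, LabProduct and LabPriceSheet tables above; the functions sheet_prices and demo_db and the sample rows are made up for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Product (id INTEGER PRIMARY KEY, name TEXT NOT NULL, price NUMERIC NOT NULL);
CREATE TABLE LabProduct (id INTEGER PRIMARY KEY, lab INTEGER NOT NULL,
                         product INTEGER NOT NULL REFERENCES Product(id));
CREATE TABLE LabPriceSheet (id INTEGER PRIMARY KEY, price_sheet INTEGER NOT NULL,
                            lab_product INTEGER NOT NULL REFERENCES LabProduct(id),
                            custom_price NUMERIC NOT NULL);
"""

def sheet_prices(conn, price_sheet_id):
    """Return (product name, default price, custom price) rows for one sheet."""
    return conn.execute(
        """
        SELECT p.name, p.price, lps.custom_price
        FROM LabPriceSheet lps
        JOIN LabProduct lp ON lp.id = lps.lab_product
        JOIN Product p     ON p.id = lp.product
        WHERE lps.price_sheet = ?
        ORDER BY p.id
        """,
        (price_sheet_id,),
    ).fetchall()

def demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Product VALUES (1, 'a', 1.5)")
    conn.execute("INSERT INTO LabProduct VALUES (1, 1, 1)")
    # Two price sheets customise the same lab product differently.
    conn.executemany("INSERT INTO LabPriceSheet VALUES (?, ?, ?, ?)",
                     [(1, 10, 1, 1.99), (2, 20, 1, 5.99)])
    return conn
```

Each sheet carries its own custom_price, while the default price exists exactly once, in the single Product row.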

Selecting columns messes up order of rows

I have 3 tables blog_articles, blog_tags and blog_articles_tags. Pretty basic stuff, a blog where articles can have tags.
CREATE TABLE `blog_articles` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
`body` text NOT NULL,
`datetime` datetime NOT NULL,
`author` int(10) unsigned DEFAULT NULL,
`published` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `author` (`author`),
FULLTEXT KEY `title` (`title`,`body`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8;
CREATE TABLE `blog_articles_tags` (
`article` int(10) unsigned NOT NULL,
`tag` int(10) unsigned NOT NULL,
PRIMARY KEY (`article`,`tag`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `blog_tags` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`description` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8;
Here is a working example, selecting blog posts with their tags and author
But if I switch the sort from descending to ascending, I get messed up results
However if I remove the columns of the blog_tags table from the list of columns to select, the order is correct
There are two questions I would like to ask:
Why is the sequence of the rows altered by the columns that are selected?
How can I prevent this without modifying the SQL statement anywhere outside of the inner-most query?
I cannot modify the SQL statement because that is an automatically generated statement and I can not determine (easily if at all) what the sort will be and if any successive columns added to the select clause will alter the results even further.
Queries in SQL are unordered unless you specify an ORDER BY. Any ordering you happen to get is an artifact of the implementation and cannot be relied upon.
You have an ORDER BY in one of your virtual tables, but nothing for the outer query. Since it includes several joins you can't even count on MySQL coincidentally preserving the order of the virtual table.
There's little reason to put an ORDER BY on a sub-query unless you're doing something advanced like adding a row-number. So the whole sub-query can be dropped and just join on blog_articles. Then you can ORDER BY blog_articles.id.
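A minimal sketch of that advice, assuming Python's built-in sqlite3 in place of MySQL and cut-down versions of the three blog tables; demo_db, articles_with_tags and the sample rows are hypothetical. The only ORDER BY is on the outermost query, so the row order no longer depends on which columns are selected.

```python
import sqlite3

def demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE blog_articles (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE blog_tags (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE blog_articles_tags (article INTEGER NOT NULL, tag INTEGER NOT NULL,
                                         PRIMARY KEY (article, tag));
        INSERT INTO blog_articles VALUES (1, 'A'), (2, 'B'), (3, 'C');
        INSERT INTO blog_tags VALUES (1, 'sql'), (2, 'mysql');
        INSERT INTO blog_articles_tags VALUES (1, 1), (1, 2), (3, 1);
    """)
    return conn

def articles_with_tags(conn, descending=False):
    # Only these two fixed keywords are ever spliced into the SQL text.
    direction = "DESC" if descending else "ASC"
    return conn.execute(f"""
        SELECT a.id, a.title, t.name
        FROM blog_articles a
        LEFT JOIN blog_articles_tags x ON x.article = a.id
        LEFT JOIN blog_tags t ON t.id = x.tag
        ORDER BY a.id {direction}, t.id   -- ordering lives on the outermost query
    """).fetchall()
```

The secondary sort on t.id is an addition of this sketch; it keeps the tags of one article in a stable order as well.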

How to optimize this mysql join on large table?

I have a project where the admin needs to create multiple newsletters with some crawled posts from the web.
I insert the posts into the posts table after crawling has completed and assign them a feed_id to identify the source. This is the structure of the posts table (truncated):
CREATE TABLE `posts` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`feed_id` int(11) NOT NULL,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT NULL,
`identifier` varchar(255) DEFAULT NULL,
`published` timestamp NULL DEFAULT NULL,
`content` longtext,
...
...
`is_unread` int(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Every admin (user) has access to one or more "feeds". So on the newsletter creation page I want to show them a list of posts from the feeds they are allowed to see, along with a button to put a post into specific categories of that newsletter. If the user previously selected that post, I should show him that and let him remove it from the category. So I have some other tables too: newsletters, categories, newsletter_post, category_post. Here are their structures:
newsletters:
CREATE TABLE `newsletters` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT NULL,
`sent_at` timestamp NULL DEFAULT NULL,
`title` varchar(255) DEFAULT NULL,
`date` date DEFAULT NULL,
`topic_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
categories:
CREATE TABLE `categories` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`topic_id` int(11) NOT NULL,
`title` varchar(255) DEFAULT NULL,
`slug` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
newsletter_post:
CREATE TABLE `newsletter_post` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT NULL,
`newsletter_id` int(11) NOT NULL,
`post_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
category_post:
CREATE TABLE `category_post` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT NULL,
`category_id` int(11) NOT NULL,
`post_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
So I'm using this query to find posts for the allowed feeds and check the status if a post is in a specific category of this specific newsletter:
SELECT DISTINCT `posts`.`id`, `published`, `posts`.`title`, `posts`.`content`, `source_name`, `category_id`, `newsletter_id`, `link_href`, categories.title as category_title
FROM `posts`
LEFT JOIN `category_post` ON `posts`.`id` = `category_post`.`post_id`
LEFT JOIN `categories` ON `categories`.`id` = `category_post`.`category_id`
LEFT JOIN `newsletter_post` ON `posts`.`id` = `newsletter_post`.`post_id`
LEFT JOIN `newsletters` ON `newsletters`.`id` = `newsletter_post`.`newsletter_id`
WHERE `feed_id` IN (6, 7) ORDER BY `posts`.`published` DESC LIMIT 40 OFFSET 0
But the problem is that this query is horrible and not optimized. My posts table gets up to 50,000 rows each month, and each row holds 3~10 KB of data on average, so sometimes when I run the query (which the admin runs frequently to build the newsletter, for pagination, etc.) MySQL shows an error along the lines of "too many rows to join", and most of the time it's really slow.
The reason I'm doing all this in one query is that I want the result in one JSON response, so I can show it to the user quickly without additional requests.
I want to know if there is a better way to write this query, or to use indexes or something else.
Thank you in advance for your help.
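Index your posts table on
( feed_id, published )
so the rows matching your WHERE clause can be found through the index, and the rows within each feed_id are already sorted by published, which helps your ORDER BY. (With several feed_id values in the IN list, MySQL may still have to sort the combined rows.)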
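As a rough sketch of the ( feed_id, published ) index suggested above, assuming Python's built-in sqlite3 instead of MySQL and a cut-down posts table: the index name idx_posts_feed_published, the helper functions and the sample rows are made up. query_plan returns whatever plan SQLite chooses, so you can check whether the index is used; MySQL's EXPLAIN plays the same role there.

```python
import sqlite3

QUERY = """
SELECT id, feed_id, published FROM posts
WHERE feed_id IN (6, 7)
ORDER BY published DESC
LIMIT 40 OFFSET 0
"""

def demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, feed_id INTEGER NOT NULL,"
                 " published TEXT, content TEXT)")
    # The composite index from the answer; the index name is hypothetical.
    conn.execute("CREATE INDEX idx_posts_feed_published ON posts (feed_id, published)")
    conn.executemany(
        "INSERT INTO posts (id, feed_id, published) VALUES (?, ?, ?)",
        [(1, 6, "2020-01-01"), (2, 7, "2020-01-03"),
         (3, 8, "2020-01-04"), (4, 6, "2020-01-02")],
    )
    return conn

def latest_posts(conn):
    """Run the filtered, sorted, paginated query."""
    return conn.execute(QUERY).fetchall()

def query_plan(conn):
    """Return the planner's description of how it would run QUERY."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
```

Printing query_plan(demo_db()) before and after creating the index is a quick way to compare the two plans on your own data.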
For read-heavy queries under a lot of demand, InnoDB can be inefficient. I would recommend a NoSQL database, but if you don't want that or the cost of changing is too high, you can try this:
1) As Sallar Kaboli told you, index the columns your tables use in JOINs. For example:
CREATE INDEX index1 ON newsletter_post (post_id);
2) Use only the important columns.
I mean, select only the columns you actually need in the SELECT part of the query.
I hope this helps.
To complete the other answers, I suggest changing these types on the posts table:
1) Change feed_id to a smaller integer type such as SMALLINT. Do you really have more feeds than that can hold? Note that the number in parentheses, as in int(4), is only a display width in MySQL: int(4) and int(11) both take 4 bytes, so the saving comes from picking a smaller type, not a smaller number.
2) Change is_unread to bit instead of int(1). This may not improve the query in your question, but judging by the field name, bit is the correct type.
One more improvement: don't default to int(11) for every numeric or id field; pick the smallest type that fits the data. Smaller types also make your indexes smaller.
For example, indexing and querying a SMALLINT column is cheaper than an INT column.
Please create the following indexes:
1) `post_id` in `category_post`
2) `post_id` in `newsletter_post`

How to apply the MINUS efficiently on mysql query for tables with large data

I have 2 tables as the following -
CREATE TABLE IF NOT EXISTS `nl_members` (
`member_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`member_email` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
`member_confirmation_code` varchar(35) COLLATE utf8_unicode_ci NOT NULL,
`member_enabled` enum('Yes','No') COLLATE utf8_unicode_ci NOT NULL DEFAULT 'Yes',
PRIMARY KEY (`member_id`),
UNIQUE KEY `TUC_nl_members_1` (`member_email`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=58520 ;
CREATE TABLE IF NOT EXISTS `nl_member_group_xref` (
`group_id` int(10) unsigned NOT NULL,
`member_id` int(10) unsigned NOT NULL,
`member_subscribed` enum('Yes','No') COLLATE utf8_unicode_ci NOT NULL DEFAULT 'Yes',
`subscribe_date` int(10) unsigned NOT NULL DEFAULT '0',
`unsubscribe_date` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`group_id`,`member_id`),
KEY `nl_members_nl_member_group_xref` (`member_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
ALTER TABLE `nl_member_group_xref`
ADD CONSTRAINT `nl_members_nl_member_group_xref` FOREIGN KEY (`member_id`) REFERENCES `nl_members` (`member_id`),
ADD CONSTRAINT `nl_member_groups_nl_member_group_xref` FOREIGN KEY (`group_id`) REFERENCES `nl_member_groups` (`group_id`);
Both hold a large amount of data, millions of rows each.
What I want is an efficient way of applying MINUS to a result set.
For example,
I want to get all the users from Group1 with ID: 1 MINUS all users from Group2 with ID: 2 and Group3 with ID: 3.
How can I do it efficiently, with the query running as fast as possible?
Update
What I want is this:
In the members table 'nl_members' I keep a list of all members, each of whom could be associated with one or more groups.
For each group a member is associated with, there is a row in the 'nl_member_group_xref' table.
So if a member is associated with 3 groups, there will be 3 entries in the member_group_xref table.
Now what I want is to get all members included in group 1 but exclude members if they also belong to group 2 and group 3.
Hope this helps.
For your updated question, join the two tables on member_id, keep only the group 1 rows, and use a LEFT JOIN ... IS NULL anti-join to drop members who also appear in group 2 or group 3. The query below should return the result you're looking for.
UPDATED:
SELECT
nm.*, nmgx.*
FROM nl_members nm
INNER JOIN nl_member_group_xref nmgx
ON nm.member_id = nmgx.member_id
AND nmgx.group_id = 1 -- members of group 1
LEFT JOIN (SELECT DISTINCT
nmgx2.member_id
FROM nl_member_group_xref nmgx2
WHERE nmgx2.group_id IN (2, 3)) nmgx22 -- members of group 2 or 3
ON nmgx22.member_id = nm.member_id
WHERE nmgx22.member_id IS NULL; -- keep only members with no match
Note: I used * to get all the columns. If you select only specific columns, such as nm.member_id, the query will be faster because it returns less data.
If this is not what you're looking for, just let me know and I'll update the query to be as accurate as I can.
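The set difference described in the question can also be written with NOT EXISTS. The sketch below is just an illustration, assuming Python's built-in sqlite3 as a stand-in for MySQL and cut-down versions of the two tables; demo_db, members_minus and the sample rows are made up, and excluded_groups is assumed to be non-empty.

```python
import sqlite3

def demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE nl_members (member_id INTEGER PRIMARY KEY,
                                 member_email TEXT NOT NULL UNIQUE);
        CREATE TABLE nl_member_group_xref (group_id INTEGER NOT NULL,
                                           member_id INTEGER NOT NULL,
                                           PRIMARY KEY (group_id, member_id));
        CREATE INDEX idx_xref_member ON nl_member_group_xref (member_id);
        INSERT INTO nl_members VALUES
            (1, 'a@x'), (2, 'b@x'), (3, 'c@x'), (4, 'd@x'), (5, 'e@x');
        -- (group_id, member_id) pairs
        INSERT INTO nl_member_group_xref VALUES
            (1, 1), (1, 2), (2, 2), (1, 3), (3, 3), (2, 4), (1, 5), (4, 5);
    """)
    return conn

def members_minus(conn, keep_group, excluded_groups):
    """Members of keep_group that belong to none of excluded_groups."""
    marks = ", ".join("?" for _ in excluded_groups)  # one placeholder per group
    sql = f"""
        SELECT m.member_id, m.member_email
        FROM nl_members m
        JOIN nl_member_group_xref g ON g.member_id = m.member_id
        WHERE g.group_id = ?
          AND NOT EXISTS (SELECT 1 FROM nl_member_group_xref x
                          WHERE x.member_id = m.member_id
                            AND x.group_id IN ({marks}))
        ORDER BY m.member_id
    """
    return conn.execute(sql, [keep_group, *excluded_groups]).fetchall()
```

Calling members_minus(conn, 1, [2, 3]) corresponds to "Group1 MINUS Group2 and Group3". The inner lookup filters on member_id and group_id, both of which the existing keys on nl_member_group_xref already cover.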
Have you tried using the MINUS operator?