MySQL: is the order of a multi-INSERT guaranteed?

I want to know if I do something like this:
INSERT INTO
projects(name, start_date, end_date)
VALUES
('AI for Marketing','2019-08-01','2019-12-31'),
('ML for Sales','2019-05-15','2019-11-20');
into a table
CREATE TABLE projects(
project_id INT AUTO_INCREMENT,
name VARCHAR(100) NOT NULL,
start_date DATE,
end_date DATE,
PRIMARY KEY(project_id)
);
whether the project_id of the second entry ('ML for Sales','2019-05-15','2019-11-20') will always be bigger than that of the first (i.e., inserted after the other one). It's not about whether the ids differ by exactly 1, just that a > b.
So when I do a SELECT project_id, name FROM projects ORDER BY project_id it will always be:
| project_id | name             |
|------------|------------------|
| 1          | AI for Marketing |
| 1 + x      | ML for Sales     |
example taken from here: https://www.mysqltutorial.org/mysql-insert-multiple-rows/

Yes, it's guaranteed that a > b.
In MySQL at least, b can never be inserted before a.

You can look at the documentation for VALUES and AUTO_INCREMENT:
https://dev.mysql.com/doc/refman/8.0/en/values.html
https://dev.mysql.com/doc/refman/8.0/en/example-auto-increment.html
There is no explicit mention of the order, but in practically any existing language the creation of elements in a list is sequential by nature, and the rows of a multi-row VALUES list are processed in the order they are written.
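As a quick sanity check, here is a minimal sketch against the projects table above. MySQL documents that LAST_INSERT_ID() returns the id generated for the first row of a multi-row insert, so the second row necessarily received a larger id:

INSERT INTO projects(name, start_date, end_date)
VALUES
  ('AI for Marketing','2019-08-01','2019-12-31'),
  ('ML for Sales','2019-05-15','2019-11-20');

-- id assigned to the FIRST row of the batch
SELECT LAST_INSERT_ID();

-- the second row sorts after the first
SELECT project_id, name FROM projects ORDER BY project_id;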

Related

Creating Primary key from 2 autonumber and constant letter when creating table

I am new to MySQL and would like to create a table where a constant letter depicting the department is added to an auto-increment number. This way I would be able to identify the category of the worker upon viewing the ID.
Ex. Dept A and employee 135. The ID I am imagining should read A135 or something similar. I have created the table, the auto increment works fine, and the constant letter has been declared and is featuring. However, I would like to concatenate them in order to use A135 as the primary key.
Any help please?
This is quite tricky, and you would probably be better off doing the concatenation manually in a SELECT query.
But since you asked for it...
Normally you would use a computed column for this, but computed columns do not support using auto-incremented columns in their declaration. So you need triggers:
- on insert, query information_schema.tables to retrieve the auto-incremented id that is about to be assigned, and use it to generate the custom id
- on update, reset the custom id
Consider the following table structure:
create table workers (
  id int auto_increment primary key,
  name varchar(50) not null,
  dept varchar(1) not null,
  custom_id varchar(12)
);
Here is the trigger for insert:
delimiter //
create trigger trg_workers_insert before insert on workers
for each row
begin
  if new.custom_id is null then
    -- read the AUTO_INCREMENT value that is about to be assigned
    select auto_increment into @nextid
    from information_schema.tables
    where table_name = 'workers' and table_schema = database();
    set new.custom_id = concat(new.dept, lpad(@nextid, 11, '0'));
  end if;
end
//
delimiter ;
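One caveat worth adding (this assumes MySQL 8.0, which is not stated in the original answer): information_schema table statistics are cached by default, so the AUTO_INCREMENT value read in the trigger can be stale. Disable the cache for the session first:

-- default is 86400 seconds; 0 forces a fresh read of AUTO_INCREMENT
SET SESSION information_schema_stats_expiry = 0;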
And the trigger for update:
delimiter //
create trigger trg_workers_update before update on workers
for each row
begin
  if new.dept is not null then
    set new.custom_id = concat(new.dept, lpad(old.id, 11, '0'));
  end if;
end
//
delimiter ;
Let's run a couple of inserts for testing:
insert into workers (dept, name) values ('A', 'John');
insert into workers (dept, name) values ('B', 'Jim');
select * from workers;
| id | name | dept | custom_id    |
|----|------|------|--------------|
| 1  | John | A    | A00000000001 |
| 2  | Jim  | B    | B00000000002 |
And let's test the update trigger
update workers set dept = 'C' where name = 'Jim';
select * from workers;
| id | name | dept | custom_id    |
|----|------|------|--------------|
| 1  | John | A    | A00000000001 |
| 2  | Jim  | C    | C00000000002 |
Demo on DB Fiddle
Sorry, my answer does not fit in a comment.
I agree with @GMB.
This is a tricky situation, and in some cases (mainly SELECTs) it will create a performance risk, because you'll have to split the PK in WHERE clauses, which is not recommended.
Having one column for the department and another for the auto_increment is more logical. The only gap is that to know the number of employees per department you'll have to do a COUNT grouped by dept, instead of a MAX() over your split concatenated PK, which comes at a high performance cost.
Keep atomic and logical data in separate columns. I would suggest creating a third column with the concatenated value.
If, for some company reason, you need B1 and A1 values for employees of different departments, I'd suggest having 3 columns:
- Col1 - letter (not null)
- Col2 - ID (not auto-increment, but calculated as in @GMB's solution) (not null)
- Col3 - concatenation of Col1 and Col2 (not null)
PK(Col1, Col2)
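A minimal sketch of that three-column layout (table and column names are illustrative, not from the original posts):

create table workers_by_dept (
  col1 char(1) not null,      -- department letter
  col2 int not null,          -- per-department sequence, calculated by trigger or application
  col3 varchar(12) not null,  -- concatenation of col1 and col2, e.g. 'A135'
  primary key (col1, col2)
);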

How to set up table keys with 15M+ rows for high performance and low cost?

I need to ensure best performance for a table with 15M+ rows in a MySQL database hosted in AWS using Aurora (Small sized instance currently). The table is essentially for tracking the ownership and update timestamp of product units over time, along with each unit's other basic information like serial number.
The columns are as follows:
UnitId, ScanTime, Model, SerialNumber, MfrTimestamp, UpdateTimestamp, CustomerId
Table Creation Statement
CREATE TABLE `UnitHistory` (
  `UnitId` bigint(20) NOT NULL,
  `ScanTime` int(11) NOT NULL,
  `Model` bigint(20) NOT NULL,
  `SerialNumber` int(11) NOT NULL,
  `MfrTimestamp` int(11) NOT NULL,
  `UpdateTimestamp` int(11) DEFAULT NULL,
  `CustomerId` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`UnitId`,`ScanTime`)
);
Rows will be added over time, but NEVER modified.
I chose UnitId and ScanTime as the primary key because those two together are sufficient to always be unique.
Query 1
The query I'll most frequently use will ideally produce a list of all UnitId's for a specific Model, along with the unit's most up-to-date details.
The following query will work, but will of course also return more rows than I need (redundant data):
SELECT UnitId, SerialNumber, MfrTimestamp, UpdateTimestamp, CustomerId FROM UnitHistory WHERE Model=2500;
If there's a way to constrain that query so that only the row with the most recent ScanTime is returned for any given UnitId, that would be ideal.
Otherwise I'll simply search the result for the row with the most recent ScanTime for each UnitId afterward.
Query 2
The other very frequently used query will produce a basic set of details and history for any particular unit, like this:
SELECT ScanTime, SerialNumber, MfrTimestamp, UpdateTimestamp, CustomerId FROM UnitHistory WHERE UnitId=1234567;
This query will primarily be used to track the change of ownership as it passes from the manufacturer to a customer, then
back to the manufacturer for update, then out to perhaps a different customer, etc.
Summary
With the above scenario, what additional key(s) should I have in order to ensure good performance and low cost?
One cost factor is that I assume my working set should fit within RAM in order to avoid lots of IOs since AWS charges for IOs.
My current database instance has 2 GB RAM, and for cost reasons I don't want to upgrade it.
For your query 1, you should have this index:
ALTER TABLE UnitHistory ADD INDEX (Model, ScanTime);
To get the most recent:
SELECT UnitId, SerialNumber, MfrTimestamp, UpdateTimestamp, CustomerId
FROM UnitHistory WHERE Model=2500
ORDER BY ScanTime DESC LIMIT 1;
Here's a demo of using EXPLAIN to confirm the query uses the index (which is named "Model" after the first column of the index since I didn't give it a name in my test):
mysql> explain SELECT UnitId, SerialNumber, MfrTimestamp, UpdateTimestamp, CustomerId FROM UnitHistory WHERE Model=2500 order by scantime desc limit 1;
+----+-------------+-------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
| id | select_type | table       | partitions | type | possible_keys | key   | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | UnitHistory | NULL       | ref  | Model         | Model | 8       | const |    1 |   100.00 | Using where |
+----+-------------+-------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
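Note that the LIMIT 1 form above returns a single most-recent row for the whole model. If you need the most recent row per UnitId, as the question describes, the standard greatest-per-group pattern works; a sketch, not from the original answer:

SELECT h.UnitId, h.SerialNumber, h.MfrTimestamp, h.UpdateTimestamp, h.CustomerId
FROM UnitHistory h
JOIN (
  -- latest scan per unit for the model of interest
  SELECT UnitId, MAX(ScanTime) AS ScanTime
  FROM UnitHistory
  WHERE Model = 2500
  GROUP BY UnitId
) latest ON latest.UnitId = h.UnitId AND latest.ScanTime = h.ScanTime
WHERE h.Model = 2500;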
Your other query is already searching by the left-most column of the primary key, so there's no need to add another index.
mysql> explain SELECT ScanTime, SerialNumber, MfrTimestamp, UpdateTimestamp, CustomerId FROM UnitHistory WHERE UnitId=1234567;
+----+-------------+-------------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table       | partitions | type | possible_keys | key     | key_len | ref   | rows | filtered | Extra |
+----+-------------+-------------+------------+------+---------------+---------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | UnitHistory | NULL       | ref  | PRIMARY       | PRIMARY | 8       | const |    1 |   100.00 | NULL  |
+----+-------------+-------------+------------+------+---------------+---------+---------+-------+------+----------+-------+
I can't predict whether your working set will fit in RAM, because I don't know the distribution of your data.
I assume this is an audit table and you are taking readings for units?
Partitioning the table, using views, or using prepared statements are some possible approaches.
Here is another way for Query 1. Create another table like your UnitHistory: create table UnitReadings like UnitHistory; but with UnitId as the primary key.
Then alter your UnitHistory table and add a trigger (before insert or after insert), something like:
insert into `UnitReadings` (
  UnitId,
  ScanTime,
  Model,
  SerialNumber,
  MfrTimestamp,
  UpdateTimestamp,
  CustomerId
) values (
  NEW.UnitId,
  NEW.ScanTime,
  NEW.Model,
  NEW.SerialNumber,
  NEW.MfrTimestamp,
  NEW.UpdateTimestamp,
  NEW.CustomerId
) on duplicate key update
  ScanTime = values(ScanTime),
  Model = values(Model),
  SerialNumber = values(SerialNumber),
  MfrTimestamp = values(MfrTimestamp),
  UpdateTimestamp = values(UpdateTimestamp),
  CustomerId = values(CustomerId);
The goal is to keep the latest reading in a "header table", which will have far fewer rows than your entire history of (readings per day × days) rows. After a few years you might exceed 15M history rows, but the header table could still be around 1,000 units, or however many units you are taking readings of. You may well exceed your performance expectations using this header table "within your 2 GB RAM" :) :)
Not sure if you can implement this, but you get the idea, right?
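For completeness, here is a sketch of the full trigger around that statement (the trigger name is made up; it assumes the UnitReadings clone with UnitId as primary key, as described above):

delimiter //
create trigger trg_unithistory_ai after insert on UnitHistory
for each row
begin
  -- upsert the newest reading per unit into the header table
  insert into `UnitReadings` (UnitId, ScanTime, Model, SerialNumber, MfrTimestamp, UpdateTimestamp, CustomerId)
  values (NEW.UnitId, NEW.ScanTime, NEW.Model, NEW.SerialNumber, NEW.MfrTimestamp, NEW.UpdateTimestamp, NEW.CustomerId)
  on duplicate key update
    ScanTime = values(ScanTime),
    Model = values(Model),
    SerialNumber = values(SerialNumber),
    MfrTimestamp = values(MfrTimestamp),
    UpdateTimestamp = values(UpdateTimestamp),
    CustomerId = values(CustomerId);
end
//
delimiter ;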

SQL result table, match in second table SET type

The following two tables are not linked by any type of constraint.
First i have a table called subscription_plans that looks like this:
name | price | ID
-------------------
plan_A | 9.99 | 1
Plan_B | 19.99 | 2
plan_C | 29.99 | 3
I have a second table called pricing_offers. Its subscription_plan_ID column is of type SET and can only contain values that match the IDs in subscription_plans.ID (the column from the table above). This table looks like this:
p_o_name | subscription_plan_ID | ID
-----------------------------------------
free donuts | 1 | 1
extra sauce | 1,2,3 | 2
pony ride | 3 | 3
bus fare -50% | 1,2,3 | 4
I'm trying to do a query to select everything (all fields *) from the first table and all names from the second table and the resulting rows should look like this:
name | price | p_o_name | ID
-------------------------------------------------------------
plan_A | 9.99 | free donuts, extra sauce, bus fare -50% | 1
Plan_B | 19.99 | extra_sauce, bus fare -50% | 2
plan_C | 29.99 | extra_sauce, pony ride, bus fare -50% | 3
The idea being that it should, for each row in the subscription_plans table, look at the ID field, then go through the second table and see which rows contain that ID in their subscription_plan_ID, gather those names into a field called p_o_name, and insert its values into the matching response rows.
I tried doing this:
SELECT subscription_plans.*, pricing_offers.name
FROM subscription_plans INNER JOIN pricing_offers ON
FIND_IN_SET(subscription_plans.ID,subscription_plan_ID)
but instead of:
plan_A | 9.99 | free donuts, extra sauce, bus fare -50% | 1
I get this:
plan_A | 9.99 | free donuts | 1
plan_A | 9.99 | extra sauce | 1
plan_A | 9.99 | bus fare -50% | 1
Note: I get a response with all rows, but I just put the first one here to illustrate the difference.
Now, while I could do the processing of the response in my PHP page, I'm interested in knowing if I can get the DB engine to output my desired result.
Do I need to create some type of constraint between the tables? If so, how would I do it? I would be grateful for any help that gets me to my preferred output (even a better title for the question!).
If there are any unclear points, please let me know and I will clarify them.
Example of junction/intersect table usage.
create table subscription_plans
(
id int not null auto_increment primary key, -- common practice
name varchar(40) not null,
description varchar(255) not null,
price decimal(12,2) not null
-- additional indexes:
);
create table pricing_offers
(
id int not null auto_increment primary key, -- common practice
name varchar(40) not null,
description varchar(255) not null
-- additional indexes:
);
create table so_junction
( -- intersects mapping subscription_plans and pricing_offers
id int not null auto_increment primary key, -- common practice
subId int not null,
offerId int not null,
-- row cannot be inserted/updated if subId does not exist in parent table
-- the fk name is completely made up
-- parent row cannot be deleted and thus orphaning children
CONSTRAINT fk_soj_subplans
FOREIGN KEY (subId)
REFERENCES subscription_plans(id),
-- row cannot be inserted/updated if offerId does not exist in parent table
-- the fk name is completely made up
-- parent row cannot be deleted and thus orphaning children
CONSTRAINT fk_soj_priceoffer
FOREIGN KEY (offerId)
REFERENCES pricing_offers(id),
-- the below allows for only ONE combo of subId,offerId
CONSTRAINT soj_unique_ids unique (subId,offerId)
-- additional indexes:
);
insert into subscription_plans (name,description,price) values ('plan_A','description',9.99);
insert into subscription_plans (name,description,price) values ('plan_B','description',19.99);
insert into subscription_plans (name,description,price) values ('plan_C','description',29.99);
select * from subscription_plans;
insert into pricing_offers (name,description) values ('free donuts','you get free donuts, limit 3');
insert into pricing_offers (name,description) values ('extra sauce','extra sauce');
insert into pricing_offers (name,description) values ('poney ride','Free ride on Wilbur');
insert into pricing_offers (name,description) values ('bus fare -50%','domestic less 50');
select * from pricing_offers;
insert so_junction(subId,offerId) values (1,1); -- free donuts to plans
insert so_junction(subId,offerId) values (1,2),(2,2),(3,2); -- extra sauce to plans
insert so_junction(subId,offerId) values (3,3); -- wilbur
insert so_junction(subId,offerId) values (1,4),(2,4),(3,4); -- bus to plans
select * from so_junction;
-- try to add another of like above to so_junction
-- Error Code 1062: Duplicate entry
-- show joins of all
select s.*,p.*
from subscription_plans s
join so_junction so
on so.subId=s.id
join pricing_offers p
on p.id=so.offerId
order by s.name,p.name
-- show extra sauce intersects
select s.*,p.*
from subscription_plans s
join so_junction so
on so.subId=s.id
join pricing_offers p
on p.id=so.offerId
where p.name='extra sauce'
order by s.name,p.name
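And to get the comma-separated shape the question asked for, you can aggregate that join with GROUP_CONCAT (a sketch on top of the junction schema above):

select s.id, s.name, s.price,
       group_concat(p.name order by p.id separator ', ') as p_o_name
from subscription_plans s
join so_junction so on so.subId = s.id
join pricing_offers p on p.id = so.offerId
group by s.id, s.name, s.price
order by s.id;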
Basically you insert into and delete from the junction table (there is really never a good reason to update it in this example).
Clean and fast joins, without having to mess with slow, unwieldy SETs that have no indexes.
No one can ride Wilbur the Poney anymore? Then:
delete from so_junction
where offerId in (select id from pricing_offers where name='poney ride')
Ask if you have any questions.
And good luck!

How to get sum of records based on another column?

I have the following table:
JobCode | Designation | SalaryWithIncrement
----------------------------------------------
JC001 | IT | 150,000
JC001 | IT | 155,000
JC002 | Sales | 100,000
JC003 | HR | 200,000
JC003 | HR | 210,000
JC003 | HR | 220,000
Required output:
JobCode | Designation | SalaryWithIncrement
------------------------------------------------
JC001 | IT | 305,000
JC002 | Sales | 100,000
JC003 | HR | 630,000
Below is the code I used, but I don't get the grand total after grouping:
SELECT JobCode, designation, salaryWithIncrement
FROM table1
group by (JobCode)
Any help is appreciated.
You can use the aggregate sum function:
SELECT JobCode, designation, SUM(salaryWithIncrement)
FROM table1
GROUP BY JobCode, designation
In most cases when you have such a requirement, implement it using GROUP BY with an SQL aggregate function.
Group by the fields you want the records rolled up on, in your case JobCode and designation.
You can learn about GROUP BY here: MSDN
SELECT JobCode,designation,SUM(salaryWithIncrement)
FROM Job GROUP BY JobCode,designation
Here is your sample working code: SQL Fiddle
For this, you have to use the sum function in a group by statement.
select jobCode, designation, sum(salaryWithInc) from Job group by jobCode;
Check this link to see a working example.
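One caveat, not from the original answers: since MySQL 5.7, the default ONLY_FULL_GROUP_BY SQL mode rejects selecting designation while grouping by jobCode alone, so the form that groups by both columns is the safe one:

-- portable under ONLY_FULL_GROUP_BY: every selected column is grouped or aggregated
SELECT JobCode, designation, SUM(salaryWithIncrement) AS total
FROM table1
GROUP BY JobCode, designation;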
In my opinion, you should restructure your table differently to avoid the data redundancy (jobCode and designation). For this you can have two tables: one with the jobCode and designation, and the other with the salaryWithInc.
create table Job (jobId int auto_increment, jobCode varchar(5), designation varchar(25), primary key (jobId));
create table Salary (job int, salaryWithInc decimal(5,2), foreign key (job) references Job (jobId));
insert into Job (jobCode, designation) values ("JC001", "IT"),("JC002", "Sales"), ("JC003", "HR");
insert into Salary values (1,150.00), (1,155.00),(2,100.00),(3,200.00),(3,210.00),(3,220.00);
In this case, you use this query to get the required result:
select J.jobCode, J.designation, sum(S.salaryWithInc) from Job as J join Salary as S on J.jobId=S.job group by (J.jobId);
Check this link to see a working example.
Hope it's useful!

MySQL schema design issues - Normalizing

I'm creating tables for my site using the following design(s)
Design 1
Design 2
Since not every user who registers will try the challenge, Design 1 is suitable. On insert into the third table, the score in table 2 is updated accordingly. But the user_id field becomes redundant.
Either 0 or NULL values are set for every user in Design 2, which still isn't normalized.
What would be the optimal design, and how important are normalization and keys in an organization?
Edit:
For future people - I had some problems understanding what the OP was asking for, so read through the comments if you get a little lost. Ultimately, they were looking to store aggregate data and didn't know where to put it or how to make it happen. The solution is basically to use an insert trigger, which is explained near the end of this post.
I chose to just add another column on to the user table to store the accumulated sum of user_problem.score. However, making a new table (with the columns user_id and total_sum) isn't a bad option at all even though it seems to be an excessive use of normalization. Sometimes it is good to keep data that is constantly updated separate from data that is rarely changed. That way if something goes wrong, you know your static data will be safe.
Something else I never touched on: the data concurrency and integrity issues associated with storing aggregate data in general... so beware of those.
I would suggest something like this:
User Table
User_ID - Email - Name - Password - FB_ID
-- holds all the user information
Problem Table
Problem_ID - Problem_Title - Problem_Descr
-- holds all the info on the individual challenges/problems/whatever
User_Problem Table
User_Problem_ID - User_ID - Problem_ID - Score - Completion_Date
-- Joins the User and Problem tables and has information specific
-- to a user+challenge pair
This assumes that a user can take many challenges/problems, and that one problem/challenge can be taken by several users.
To see all the problems by a certain user, you would do something like:
select user.user_id,
       user.name,
       problem_title,
       problem_descr,
       user_problem.score,
       user_problem.completion_date
from user
join user_problem on user.user_id = user_problem.user_id
join problem on user_problem.problem_id = problem.problem_id
where user.user_id = 123 or user.email = 'stuff@gmail.com'
The lengths for the varchar fields are fairly generic...
create table User (
  User_ID int unsigned auto_increment primary key,
  Email varchar(100),
  Name varchar(100),
  Password varchar(100),
  FB_ID int
);
create table Problem (
  Problem_ID int unsigned auto_increment primary key,
  Problem_Title varchar(100),
  Problem_Descr varchar(500)
);
create table User_Problem (
  User_Problem_ID int unsigned auto_increment primary key,
  User_ID int unsigned,
  Problem_ID int unsigned,
  Score int,
  Completion_Date datetime,
  foreign key (User_ID) references User (User_ID),
  foreign key (Problem_ID) references Problem (Problem_ID)
);
After our conversation from down below in the comments... you would add a column to user:
User Table
User_ID - Email - Name - Password - FB_ID - Total_Score
I gave the column a default value of 0 because you seemed to want/need that if the person didn't have any associated problem/challenges. Depending on other things, it may benefit you to make this an unsigned int if you have a rule which states there will never be a negative score.
alter table user add column Total_Score int default 0;
then... you would use an insert trigger on the user_problem table that affects the user table.
CREATE TRIGGER tgr_update_total_score
AFTER INSERT ON User_Problem
FOR EACH ROW
UPDATE User
SET Total_score = Total_score + New.Score
WHERE User_ID = NEW.User_ID;
So... after a row is added to User_Problem, you would add the new score to user.total_score...
mysql> select * from user;
+---------+-------+------+----------+-------+-------------+
| User_ID | Email | Name | Password | FB_ID | Total_Score |
+---------+-------+------+----------+-------+-------------+
|       1 | NULL  | kim  | NULL     | NULL  |           0 |
|       2 | NULL  | kyle | NULL     | NULL  |           0 |
+---------+-------+------+----------+-------+-------------+
2 rows in set (0.00 sec)
mysql> insert into user_problem values (null,1,1,10,now());
Query OK, 1 row affected (0.16 sec)
mysql> select * from user;
+---------+-------+------+----------+-------+-------------+
| User_ID | Email | Name | Password | FB_ID | Total_Score |
+---------+-------+------+----------+-------+-------------+
|       1 | NULL  | kim  | NULL     | NULL  |          10 |
|       2 | NULL  | kyle | NULL     | NULL  |           0 |
+---------+-------+------+----------+-------+-------------+
2 rows in set (0.00 sec)
mysql> select * from user_problem;
+-----------------+---------+------------+-------+---------------------+
| User_Problem_ID | User_ID | Problem_ID | Score | Completion_Date     |
+-----------------+---------+------------+-------+---------------------+
|               1 |       1 |          1 |    10 | 2013-11-03 11:31:53 |
+-----------------+---------+------------+-------+---------------------+
1 row in set (0.00 sec)
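One follow-up on the integrity point above (a sketch, not from the original answer; the trigger name is made up): if rows can ever be removed from user_problem, a matching delete trigger keeps the running total consistent:

CREATE TRIGGER tgr_subtract_total_score
AFTER DELETE ON User_Problem
FOR EACH ROW
UPDATE User
SET Total_score = Total_score - OLD.Score
WHERE User_ID = OLD.User_ID;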