MySQL: Create a function table?

I am trying to design the layout of a table to work best in the following situation.
I have a product that is sold based on age. The age determines whether that product is available to a person, and the minimum and maximum quantity one can buy.
Right now I have designed the table as follows:
CREATE TABLE `tblProductsVsAge` (
`id` int(255) AUTO_INCREMENT NOT NULL,
`product_id` bigint(255) NOT NULL,
`age_min` int(255) NOT NULL,
`age_max` int(255) NOT NULL,
`quantity_min` decimal(8) NOT NULL,
`quantity_max` decimal(8) NOT NULL,
/* Keys */
PRIMARY KEY (`id`)
) ENGINE = InnoDB;
This is functional and it works, but I feel it is not the best-optimized structure.
Any ideas?
I forgot to mention that a product can have many ranges. For example, for age_min 25 and age_max 35 the quantities might be 12 and 28, while for the same product ID we might also have ages 36 to 60 with quantities from 3 to 8.

Use tinyint unsigned for age_max and age_min, since none of the ages in the question exceed 255 (the highest unsigned tinyint).
Use smallint unsigned for quantity_max and quantity_min if those values > 255 and <= 65535 (highest unsigned smallint).
Use mediumint unsigned for quantity_max and quantity_min if those values > 65535 and <= 16777215 (highest unsigned mediumint).
Use int unsigned for quantity_max and quantity_min if those values > 16777215 and <= 4294967295 (highest unsigned int). (Sometimes, you gotta Think Big !!!)
My recommendation:
CREATE TABLE `tblProductsVsAge` (
`product_id` int NOT NULL,
`age_min` tinyint unsigned NOT NULL,
`age_max` tinyint unsigned NOT NULL,
`quantity_min` smallint unsigned NOT NULL,
`quantity_max` smallint unsigned NOT NULL,
/* Keys */
PRIMARY KEY (`product_id`, `age_min`)
) ENGINE = InnoDB;
Here is something to consider if the table already has data: you could ask MySQL to recommend column definitions for this table.
Simply run this query:
SELECT * FROM tblProductsVsAge PROCEDURE ANALYSE();
The PROCEDURE ANALYSE() directive causes MySQL not to display the data but to examine the values from each column and come up with its own recommendation. Sometimes the recommendation is too granular. For example, if age_min is in the teenage range, it may recommend ENUM('13','14','15','16','17','18','19') instead of tinyint. After PROCEDURE ANALYSE() is done, you still make the final call on the column definitions.
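If the ENUM suggestions get in the way, note that ANALYSE() also accepts optional limits, ANALYSE(max_elements, max_memory); lowering max_elements curbs the ENUM proposals. For example (the limit values here are just an illustration):
SELECT * FROM tblProductsVsAge PROCEDURE ANALYSE(1, 24);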

CREATE TABLE `tblProductsVsAge` (
`product_id` int NOT NULL,
`age_min` smallint NOT NULL,
`age_max` smallint NOT NULL,
`quantity_min` smallint NOT NULL,
`quantity_max` smallint NOT NULL,
/* Keys */
PRIMARY KEY (`product_id`, `age_min`)
) ENGINE = InnoDB;
Changes to your structure:
id is probably not needed (unless something else references it); if you do keep it and product_id really needs to be bigint, then id should be bigint as well - after all, this table can get more rows than your products table,
I changed the type of product_id to int; I don't think you will have more than 2147483647 products,
age and quantity are smallints, which have a maximum value of 32767 (use mediumint or int if that's not enough); decimal is intended for when you need exact precision or numbers bigger than bigint,
the primary key on (product_id, age_min) makes searches for a given product_id faster, as well as searches like product_id = {some_id} AND age_min > {user_age} (see the sample query below),
(255) in int/bigint definition doesn't make it 255 digits long - it's only a hint for string representation.
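For illustration, a lookup like the following (the literal values 42 and 30 are made up) can be served by the (product_id, age_min) primary key:
SELECT age_min, age_max, quantity_min, quantity_max
FROM tblProductsVsAge
WHERE product_id = 42 AND age_min <= 30 AND age_max >= 30;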
MySQL manual on numeric types: http://dev.mysql.com/doc/refman/5.5/en/numeric-types.html

Related

Historical big data slow queries

I have a problem with slow queries.
PS: MariaDB 10.3.25 with InnoDB. I have already optimized most of the DB configuration.
Structure
create table customers
(
id bigint unsigned auto_increment
primary key,
email varchar(255) null,
full_name varchar(255) null,
country varchar(2) null,
first_name varchar(255) null,
second_name varchar(255) null,
company_name varchar(255) null,
gender char null,
birth_date date null,
state varchar(3) null,
custom_field_1 varchar(255) null,
custom_field_2 varchar(255) null,
custom_field_3 varchar(255) null,
created_at timestamp null,
updated_at timestamp null,
deleted_at timestamp null
)
collate = utf8mb4_unicode_ci;
create table customer_daily_stats
(
date date not null,
campaign_id bigint not null,
customer_id bigint not null,
event_1 int unsigned default 0 not null,
event_2 int unsigned default 0 not null,
event_3 int unsigned default 0 not null,
event_4 int unsigned default 0 not null,
event_5 int unsigned default 0 not null,
constraint customer_daily_stats_date_customer_id_campaign_id_unique
unique (date, customer_id, campaign_id)
)
collate = utf8mb4_unicode_ci;
create index customer_daily_stats_customer_id_date_index
on customer_daily_stats (customer_id, date);
create index customer_daily_stats_campaign_id_index
on customer_daily_stats (campaign_id);
customers ~ 1-5 million rows
customer_daily_stats ~ 1-100 million rows
Queries
select
customers.*,
IFNULL(
SUM(events_aggregation.event_1),
0
) as event_1,
IFNULL(
SUM(events_aggregation.event_2),
0
) as event_2,
IFNULL(
SUM(events_aggregation.event_3),
0
) as event_3,
IFNULL(
SUM(events_aggregation.event_4),
0
) as event_4
from
`customers`
left join customer_daily_stats as events_aggregation on `customers`.`id` = `events_aggregation`.`customer_id`
and `events_aggregation`.`date` between '2021-09-06' and '2022-07-06'
group by
`customers`.`id`;
Problems
The main idea is to be able to get aggregations over any date range.
The problem is that this now works too slowly, and I need to add additional aggregations, which decreases performance further. One more problem: I don't have a lot of disk space (250G, about 80% used already).
I have:
customers ~ 1.5m
customer_daily_stats ~ 50,000
query speed ~ 5s
Questions
Are there any methods or tools to optimize my DB?
Are there any databases that would help me increase performance?
Change the indexes. You currently have
unique (date, customer_id, campaign_id)
INDEX(customer_id, date)
INDEX(campaign_id)
Maybe change to:
PRIMARY KEY(customer_id, date, campaign_id)
INDEX(campaign_id)
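A sketch of the DDL for that rearrangement (apply only after checking the caveat below):
ALTER TABLE customer_daily_stats
DROP INDEX customer_daily_stats_date_customer_id_campaign_id_unique,
DROP INDEX customer_daily_stats_customer_id_date_index,
ADD PRIMARY KEY (customer_id, date, campaign_id);
-- the existing INDEX(campaign_id) stays as it is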
BUT... and this is a big BUT: this rearrangement of indexing may significantly hurt other queries. We really need to see:
All the big queries
EXPLAIN SELECT for each
Did you notice that the range is 10 months plus 1 day? This is because BETWEEN is 'inclusive'.
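If you meant exactly 10 months, an exclusive upper bound avoids that extra day:
and `events_aggregation`.`date` >= '2021-09-06'
and `events_aggregation`.`date` < '2022-07-06'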
If 80% of the disk is already used, you are in deep weeds. Any fix is likely to need more than the remaining 20% of the disk to carry out, since an ALTER rebuilds the table as a full copy.
One thing to do (when you have enough disk space) is to shrink BIGINT (8 bytes, probably an excessive range) and INT UNSIGNED (4 bytes, 4 billion max) to smaller int types where practical.
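A sketch of that narrowing (the target types are assumptions; verify the actual value ranges with MAX() first, and remember the rebuild itself needs free disk space):
ALTER TABLE customer_daily_stats
MODIFY campaign_id MEDIUMINT UNSIGNED NOT NULL,
MODIFY customer_id INT UNSIGNED NOT NULL,
MODIFY event_1 SMALLINT UNSIGNED NOT NULL DEFAULT 0;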
I'm confused. These seem to contradict each other; please clarify:
customer_daily_stats ~ 1-100 million rows
customer_daily_stats ~ 50,000
Some more things to help with the analysis:
innodb_buffer_pool_size
RAM size
disk footprint for tables (GB)

Optimal Database Structure

I'm a data lover and have created a list of possible item combinations for a widely known mobile game. There are 21,000,000 combinations (useless combos filtered out by logic).
What I want to do now is create a website people can visit to see what they need to get the best gear, OR the best they can do with the gear they have right now.
My Item Database currently looks like this:
CREATE TABLE `items` (
`ID` int(8) unsigned NOT NULL,
`Item1` int(2) unsigned NOT NULL,
`Item2` int(2) unsigned NOT NULL,
`Item3` int(2) unsigned NOT NULL,
`Item4` int(2) unsigned NOT NULL,
`Item5` int(2) unsigned NOT NULL,
`Item6` int(2) unsigned NOT NULL,
`Item7` int(2) unsigned NOT NULL,
`Item8` int(2) unsigned NOT NULL,
PRIMARY KEY (`ID`)
) ENGINE=InnoDB
ID range: 1 - 21,000,000
Every item is identified by its number, e.g. 11. The first digit is the category and the second digit is the item within that category. For example, 34 means Item3 --> 4. It's saved like this because I also have images to show on the website later, using this number as identification (34.png).
The Stats Database looks like this right now:
CREATE TABLE stats (
Stat1 FLOAT UNSIGNED NOT NULL,
Stat2 FLOAT UNSIGNED NOT NULL,
Stat3 FLOAT UNSIGNED NOT NULL,
Stat4 FLOAT UNSIGNED NOT NULL,
Stat5 FLOAT UNSIGNED NOT NULL,
Stat6 FLOAT UNSIGNED NOT NULL,
Stat7 FLOAT UNSIGNED NOT NULL,
Stat8 FLOAT UNSIGNED NOT NULL,
ID1 INT UNSIGNED,
ID2 INT UNSIGNED,
ID3 INT UNSIGNED,
ID4 INT UNSIGNED,
ID5 INT UNSIGNED,
ID6 INT UNSIGNED,
ID7 INT UNSIGNED,
ID8 INT UNSIGNED
) ENGINE = InnoDB;
Where Stat* stands for stats like Attack, Defense, Health, etc., and ID* refers to an ID in the items table. Some combinations share the same values across all 8 possible stats, so I grouped them together to save some rows (not sure yet if that was smart). For example, one stat combination can have ID1, ID2 and ID3 filled while another has just ID1 (the maximum is 8 IDs, though; I calculated it).
Right now I'm displaying a huge table sortable by every stat, and it's working fine.
What I want in the future, though, is to let the user search for items or exclude certain items from the list. I know I can do this with some joins and WHERE clauses (WHERE items.ID = stats.ID1 OR items.ID = stats.ID2, etc.), but I wonder if my current structure is the smartest solution for this? I'm trying to get the best performance, as I'm running this on my old Pi 2.
When you have very large data sets with only a small number of matches, the best performance often comes from a subquery in the FROM or WHERE clause.
SELECT SP.TerritoryID,
SP.BusinessEntityID,
SP.Bonus,
TerritorySummary.AverageBonus
FROM (SELECT TerritoryID,
AVG(Bonus) AS AverageBonus
FROM Sales.SalesPerson
GROUP BY TerritoryID) AS TerritorySummary
INNER JOIN
Sales.SalesPerson AS SP
ON SP.TerritoryID = TerritorySummary.TerritoryID
Copied from here
This effectively creates a virtual table of only those rows that match, then runs the join on the virtual table - a lot like selecting the matching rows into a tmp table, then joining on the tmp table. Running a join on the entire table, although you might think it would be OK, often comes out terrible.
You may also find that a subquery in the WHERE clause works:
... where items.id in (select id1 from stats union select id2 from stats)
Or select your matching stats IDs into a tmp table, then index the tmp table.
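A minimal sketch of that approach, assuming a hypothetical stat filter (Stat1 > 100) and showing only two of the eight ID columns for brevity:
-- collect matching item IDs once, deduplicated by UNION
CREATE TEMPORARY TABLE tmp_ids AS
SELECT ID1 AS id FROM stats WHERE Stat1 > 100
UNION
SELECT ID2 FROM stats WHERE Stat1 > 100 AND ID2 IS NOT NULL;
-- index the tmp table so the join below is fast
ALTER TABLE tmp_ids ADD INDEX (id);
SELECT items.* FROM items JOIN tmp_ids ON tmp_ids.id = items.ID;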
It all depends quite a lot on what your other selection logic is.
It also sounds like you should get some indexes on the stats table. If you're not updating it a lot, then indexing every ID can work OK. Just make sure the unfilled stats IDs have the value NULL.
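For example, using the column names from the stats table above:
ALTER TABLE stats
ADD INDEX (ID1), ADD INDEX (ID2), ADD INDEX (ID3), ADD INDEX (ID4),
ADD INDEX (ID5), ADD INDEX (ID6), ADD INDEX (ID7), ADD INDEX (ID8);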

Add an effective index on a huge table

I have a MySQL database table with more than 34M rows (and growing).
CREATE TABLE `sensordata` (
`userID` varchar(45) DEFAULT NULL,
`instrumentID` varchar(10) DEFAULT NULL,
`utcDateTime` datetime DEFAULT NULL,
`dateTime` datetime DEFAULT NULL,
`data` varchar(200) DEFAULT NULL,
`dataState` varchar(45) NOT NULL DEFAULT 'Original',
`gps` varchar(45) DEFAULT NULL,
`location` varchar(45) DEFAULT NULL,
`speed` varchar(20) NOT NULL DEFAULT '0',
`unitID` varchar(5) NOT NULL DEFAULT '1',
`parameterID` varchar(5) NOT NULL DEFAULT '1',
`originalData` varchar(200) DEFAULT NULL,
`comments` varchar(45) DEFAULT NULL,
`channelHashcode` varchar(12) DEFAULT NULL,
`settingHashcode` varchar(12) DEFAULT NULL,
`status` varchar(7) DEFAULT 'Offline',
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=98772 DEFAULT CHARSET=utf8
I access this table from multiple threads (at least 400 threads) every minute to insert data into the table.
As the table was growing, it was getting slower to read and write the data. One SELECT query used to take about 25 seconds, so I added a unique index:
UNIQUE INDEX idx_userInsDate ( userID,instrumentID,utcDateTime)
This reduced the read time from 25 seconds to a few milliseconds, but it has increased the insert time, since the index must be updated for each record.
Also, if I run SELECT queries from multiple threads at the same time, the queries take too long to return the data.
This is an example query
Select dateTime from sensordata WHERE userID = 'someUserID' AND instrumentID = 'someInstrumentID' AND dateTime between 'startDate' AND 'endDate' order by dateTime asc;
Can someone help me improve the table schema or add an effective index to improve the performance, please?
Thank you in advance.
A PRIMARY KEY is a UNIQUE key. Toss the redundant UNIQUE(id)!
Is id referenced by any other tables? If not, then get rid of it altogether. Instead have just
PRIMARY KEY ( userID, instrumentID, utcDateTime)
That is, if that triple is guaranteed to be unique. You mentioned DST -- use the datatype TIMESTAMP instead of DATETIME; doing that, you can convert to DATETIME when needed, thereby eliminating one of the two datetime columns.
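A sketch of that change, assuming nothing else references id (the ALTER will fail if the triple contains NULLs or duplicates):
ALTER TABLE sensordata
DROP COLUMN id, -- also removes the old PRIMARY KEY and the id_UNIQUE index
ADD PRIMARY KEY (userID, instrumentID, utcDateTime);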
That one index (the PK) takes virtually no space since it is "clustered" with the data in InnoDB.
Your table is awfully fat with all those VARCHARs. For example, status can be reduced to a 1-byte ENUM. Others can be normalized. Things like speed can be either a 4-byte FLOAT or some smaller DECIMAL, depending on how much range and precision you need.
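For instance (the ENUM value list here is an assumption; use whatever states actually occur in your data):
ALTER TABLE sensordata
MODIFY status ENUM('Offline','Online') NOT NULL DEFAULT 'Offline',
MODIFY speed FLOAT NOT NULL DEFAULT 0;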
With 34M wide rows, you have probably recently exceeded the cacheability of the RAM you have. By making the row narrower, you will postpone that overflow.
Why attack the indexes? Every UNIQUE (including PRIMARY) index is checked before allowing the row to be inserted. By getting it down to 1 index, that minimizes the cost there. (InnoDB really needs a PRIMARY KEY.)
INT is 4 bytes. Do you have a billion instruments? Maybe instrumentID could be SMALLINT UNSIGNED, which is 2 bytes, with a max of 64K? Think about all the other IDs.
You have 400 INSERTs/minute, correct? That is not bad. If you get to 400/second, we need to have a different talk.
("Fill factor" is not tunable in MySQL because it does not make much difference.)
How much RAM do you have? What is the setting for innodb_buffer_pool_size? Optimal is somewhere around 70% of available RAM.
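For example, on a server with 16G of RAM (a made-up figure), that works out to roughly this in my.cnf:
innodb_buffer_pool_size = 11G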
Let's see your main queries; there may be other issues to address.
It's not the indexes at fault here; it's your data types. As the size of the data on disk grows, the speed of all operations decreases. Indexes can certainly help speed up selects - provided your data is properly structured - but it appears that yours isn't:
CREATE TABLE `sensordata` (
`userID` int, /* shouldn't this have a foreign key constraint? */
`instrumentID` int,
`utcDateTime` datetime DEFAULT NULL,
`dateTime` datetime DEFAULT NULL,
/* what exactly are you putting here? Are you sure it's not causing any redundancy? */
`data` varchar(200) DEFAULT NULL,
/* your states will be a finite number of elements. They can be represented by constants in your code or a set of values in a related table */
`dataState` int,
/* what's this? Sounds like what you are saving in location */
`gps` varchar(45) DEFAULT NULL,
`location` point,
`speed` float,
`unitID` int DEFAULT '1',
/* as above */
`parameterID` int NOT NULL DEFAULT '1',
/* are you sure this is different from data? */
`originalData` varchar(200) DEFAULT NULL,
`comments` varchar(45) DEFAULT NULL,
`channelHashcode` varchar(12) DEFAULT NULL,
`settingHashcode` varchar(12) DEFAULT NULL,
/* as above and isn't this the same as */
`status` int,
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=98772 DEFAULT CHARSET=utf8
First of all: avoid varchars for indexes, and especially for IDs. Every character in a varchar makes each index entry wider, so varchar indexes get big fast.
Second: your SELECT filters on dateTime, but your index is built on utcDateTime. The query will only use the userID and instrumentID parts of the index and ignore the utcDateTime part.
Advice: change the data types of the IDs, and change your index to match the query (dateTime, not utcDateTime).
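For example (the new index name is arbitrary; it is a plain index because the triple may not stay unique on local times):
ALTER TABLE sensordata
DROP INDEX idx_userInsDate,
ADD INDEX idx_userInsDateTime (userID, instrumentID, dateTime);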
Using an index decreases your insert performance; unfortunately, there is no such thing as a fill factor for indexes in MySQL right now. So the best thing you can do is keep the indexes as small as possible.
Another approach on heavily loaded databases with random access would be: write to an unindexed table, read from an indexed one. At a given time, build the indexes and swap the tables (may require a third table for the index creation while leaving the other ones untouched in between).
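A sketch of the swap, with hypothetical table names:
-- build the index on the staging copy, then swap atomically
ALTER TABLE sensordata_staging ADD INDEX idx_userInsDate (userID, instrumentID, utcDateTime);
RENAME TABLE sensordata TO sensordata_prev,
sensordata_staging TO sensordata;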

SQL - Create table for store weight and height

I have been looking on the internet for 3 hours, but I can't find any solution.
I would like to create an SQL database via a script. I am storing user weight and height in a table, but I do not know which types are best for them.
SQL code
CREATE TABLE details (
ID int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
user_id int(11) NOT NULL REFERENCES user(ID),
weight decimal(5,2) UNSIGNED NULL,
height tinyint UNSIGNED NULL
);
I want to store height in cm [100 - 220]
and weight in kg [30.0 - 150.0], e.g. weight -> ##.#
Edit:
This is MySQL server.
If this is MySQL, you can make them both decimals like you did with weight (or simply use FLOAT):
CREATE TABLE details (
ID int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
user_id int(11) NOT NULL REFERENCES user(ID),
height FLOAT,
weight FLOAT
);
You store the number in the database; the fact that it is kg or metres or whatever is something you have to remember, or deal with after you get the data from the database.
If you want a way to remember what unit you are storing in the database, you can do this:
CREATE TABLE details (
ID int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
user_id int(11) NOT NULL REFERENCES user(ID),
height_cm FLOAT,
weight_kg FLOAT
);
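If you would rather keep the exact precision stated in the question (##.# for weight, whole cm for height), the original types also fit those ranges; a sketch:
CREATE TABLE details (
ID int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
user_id int(11) NOT NULL REFERENCES user(ID),
height_cm TINYINT UNSIGNED NULL, -- 100-220 fits within 0-255
weight_kg DECIMAL(4,1) UNSIGNED NULL -- 30.0-150.0 fits within 0.0-999.9
);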

Table localization - One column for a table

I have ended up with only one column in one of the tables when creating two localized tables. Code as below.
-- Month
CREATE TABLE `month` (
`id` INT PRIMARY KEY NOT NULL AUTO_INCREMENT
);
-- Month Localized
CREATE TABLE `month_loc` (
`month_id` INT NOT NULL,
`name` VARCHAR(200) NOT NULL,
`description` VARCHAR(500) NOT NULL,
`lang_id` INT NOT NULL
);
month_loc.month_id is the foreign key.
The month table holds only the primary key; all other fields should be localized. Is this table structure correct?
Thanks.
If "correct" implies a certain degree of normalization, and the content of your columns name and description varies per (month_id, lang_id) (which would be the combined primary key of month_loc), then yes, your design has reached the third normal form.
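A sketch of those keys in DDL form (the constraint name is arbitrary):
ALTER TABLE `month_loc`
ADD PRIMARY KEY (`month_id`, `lang_id`),
ADD CONSTRAINT `fk_month_loc_month`
FOREIGN KEY (`month_id`) REFERENCES `month` (`id`);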