Optimal Database Structure - MySQL

I'm a data lover and have created a list of possible item combinations for a widely known mobile game. There are 21,000,000 combinations (useless combos already filtered out by logic).
What I want to do now is create a website people can visit to see what they need to get the best gear, OR the best they can do with the gear they have right now.
My Item Database currently looks like this:
CREATE TABLE `items` (
`ID` int(8) unsigned NOT NULL,
`Item1` int(2) unsigned NOT NULL,
`Item2` int(2) unsigned NOT NULL,
`Item3` int(2) unsigned NOT NULL,
`Item4` int(2) unsigned NOT NULL,
`Item5` int(2) unsigned NOT NULL,
`Item6` int(2) unsigned NOT NULL,
`Item7` int(2) unsigned NOT NULL,
`Item8` int(2) unsigned NOT NULL,
PRIMARY KEY (`ID`)
) ENGINE=InnoDB
ID range: 1 - 21,000,000
Every item is known by its number, e.g. 11. The first digit describes the category and the second digit the item within that category. For example, 34 means Item3 --> 4. It's saved like this because I also have images to show on the website later, using this number as the identifier (34.png).
The Stats Database looks like this right now:
CREATE TABLE stats (
Stat1 FLOAT UNSIGNED NOT NULL,
Stat2 FLOAT UNSIGNED NOT NULL,
Stat3 FLOAT UNSIGNED NOT NULL,
Stat4 FLOAT UNSIGNED NOT NULL,
Stat5 FLOAT UNSIGNED NOT NULL,
Stat6 FLOAT UNSIGNED NOT NULL,
Stat7 FLOAT UNSIGNED NOT NULL,
Stat8 FLOAT UNSIGNED NOT NULL,
ID1 INT UNSIGNED,
ID2 INT UNSIGNED,
ID3 INT UNSIGNED,
ID4 INT UNSIGNED,
ID5 INT UNSIGNED,
ID6 INT UNSIGNED,
ID7 INT UNSIGNED,
ID8 INT UNSIGNED
) ENGINE = InnoDB;
Where Stat* stands for stuff like Attack, Defense, Health, etc., and ID* for the ID from the items table. Some combinations have the same stat values across all 8 possible stats, so I grouped them together to save some rows (not sure yet whether that was smart). For example, one stat combination can have ID1, ID2 and ID3 filled while another combination has just ID1 (the maximum is 8 IDs though, I calculated it).
Right now I'm displaying a huge table sortable by every stat, and it's working fine.
What I want in the future, though, is to let the user search for items or exclude certain items from the list. I know I can do this with some joins and WHERE clauses (WHERE items.ID = stats.ID1 OR items.ID = stats.ID2, etc.), but I wonder whether my current structure is the smartest solution for this. I'm trying to get the best performance, as I'm running this on my old Pi 2.

When you have very large data sets with only a small number of matches, the best performance often comes from using a subquery in the FROM or WHERE clause.
SELECT SP.TerritoryID,
       SP.BusinessEntityID,
       SP.Bonus,
       TerritorySummary.AverageBonus
FROM (SELECT TerritoryID,
             AVG(Bonus) AS AverageBonus
      FROM Sales.SalesPerson
      GROUP BY TerritoryID) AS TerritorySummary
INNER JOIN Sales.SalesPerson AS SP
    ON SP.TerritoryID = TerritorySummary.TerritoryID
Copied from here.
This effectively creates a virtual table of only those rows that match, then runs the join on the virtual table, much like selecting the matching rows into a tmp table and then joining on the tmp table. Running a join on the entire table, although you might think it would be OK, often turns out terribly.
You may also find that a subquery in the WHERE clause works:
... where items.id in (select id1 from stats union select id2 from stats)
Or select your matching stats IDs into a tmp table, then index the tmp table.
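As a rough sketch of that temporary-table route (item number 34 and the table/index names are purely illustrative; adjust them to your real filter logic):
-- hypothetical example: the user searches for combinations containing item 34
CREATE TEMPORARY TABLE matching_items AS
SELECT ID
FROM items
WHERE 34 IN (Item1, Item2, Item3, Item4, Item5, Item6, Item7, Item8);
CREATE INDEX idx_matching_items ON matching_items (ID);
-- DISTINCT avoids duplicate stats rows when several matching items hit the same row
SELECT DISTINCT s.*
FROM stats s
JOIN matching_items m
    ON m.ID IN (s.ID1, s.ID2, s.ID3, s.ID4, s.ID5, s.ID6, s.ID7, s.ID8);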
It all depends quite a lot on what your other selection logic is.
It also sounds like you should get some indexes on the stats table. If you're not updating it a lot, then indexing every ID column can work OK. Just make sure the unfilled ID columns have the value NULL.
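A minimal sketch of that indexing, assuming the stats table keeps its current eight ID columns (the index names are made up):
-- one index per ID column; cheap to maintain if the table is mostly read-only
ALTER TABLE stats
    ADD INDEX idx_id1 (ID1),
    ADD INDEX idx_id2 (ID2),
    ADD INDEX idx_id3 (ID3),
    ADD INDEX idx_id4 (ID4),
    ADD INDEX idx_id5 (ID5),
    ADD INDEX idx_id6 (ID6),
    ADD INDEX idx_id7 (ID7),
    ADD INDEX idx_id8 (ID8);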

if attribute x = "something" do this

So I have the following tables running, but I'm having a problem in a specific situation.
I have a network of soap dispensers whose current soap level I want to keep track of. I'm counting the number of pumps (3 milliliters each) and computing greatest(full_capacity - number_pumps * 3, 0), as seen in the view below.
But my problem is: there is a maintenance table, and one of the "descriptions" may be "refill". What I want is that when maintenance_description = "refill", the number_pumps in the records table is reset to 0 for that exact dispenser. Is this possible? I read about triggers, but couldn't really understand how to do this.
As a practical example, let's say I have soap dispenser id 1 with a max capacity of 1000ml. I then count 300 pumps, so I know I have 100ml left. I then do a refill and want the number of pumps to be set to 0. Otherwise on the next use it will say I have 97ml available, when in reality I have 997ml because I already did a refill.
Thank you very much in advance.
create table dispenser(
id_dispenser int not null auto_increment,
localization_disp varchar(20) not null,
full_capacity int not null,
primary key (id_dispenser));
create table records(
time_stamp DATETIME DEFAULT CURRENT_TIMESTAMP not null,
dispenser_id int not null,
number_pumps int not null,
battery_level float not null,
primary key (dispenser_id,time_stamp));
create table maintenance(
maintenance_id int not null auto_increment,
maintenance_date DATETIME DEFAULT CURRENT_TIMESTAMP not null,
employee_id int not null,
maintenance_description varchar(20) not null,
dispenser_id int not null,
primary key (maintenance_id));
CREATE VIEW left_capacity
AS
SELECT max(time_stamp) AS calendar,
       id_dispenser AS dispenser,
       full_capacity AS capacity,
       greatest(full_capacity - number_pumps * 3, 0) AS available
FROM records r
INNER JOIN dispenser d
    ON d.id_dispenser = r.dispenser_id
GROUP BY id_dispenser;
If I understand correctly, you want a view with the amount remaining. This would be based on the number of pumps since the last refill, subject to your formula.
MySQL has had tricky issues with subqueries in views. I think the following is view-safe for MySQL:
select d.*,
       (d.full_capacity -
        (select count(*) * 3
         from records r
         where r.dispenser_id = d.id_dispenser and
               r.time_stamp > (select max(m.maintenance_date)
                               from maintenance m
                               where m.dispenser_id = r.dispenser_id and
                                     m.maintenance_description = 'refill'
                              )
        )
       ) as available
from dispenser d;
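As a sketch of how that could replace the existing view (this keeps the answer's count-of-rows approach, where each records row stands for one pump, plus the GREATEST() guard from the question; adapt it if number_pumps is meant to be a cumulative counter):
CREATE OR REPLACE VIEW left_capacity AS
SELECT d.id_dispenser AS dispenser,
       d.full_capacity AS capacity,
       GREATEST(d.full_capacity -
                (SELECT COUNT(*) * 3
                 FROM records r
                 WHERE r.dispenser_id = d.id_dispenser
                   AND r.time_stamp > (SELECT MAX(m.maintenance_date)
                                       FROM maintenance m
                                       WHERE m.dispenser_id = d.id_dispenser
                                         AND m.maintenance_description = 'refill')),
                0) AS available
FROM dispenser d;
-- e.g. SELECT * FROM left_capacity WHERE dispenser = 1;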

One table or Two tables to store Forum's threads and Posts?

I am building a forum site with ASP.NET, and the DB is MySQL.
As you all know, users can start a thread and others can reply to it.
So here is the table that I implemented.
CREATE TABLE `a_post` (
`post_id_pk` int(10) unsigned NOT NULL AUTO_INCREMENT,
`is_thread` tinyint(1) NOT NULL,
`parent_thread_id` int(10) unsigned DEFAULT NULL,
`title` varchar(100) DEFAULT NULL,
`short_description` varchar(200) DEFAULT NULL,
`description` text NOT NULL,
`category_id_fk` tinyint(3) unsigned DEFAULT NULL,
`user_id_fk` int(10) unsigned NOT NULL,
......
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
So I am using one table to store both threads and posts. This approach is working fine. Normally the thread count is smaller than the post count. Alternatively, I could implement two tables to do this: one to store threads and another to store posts with the corresponding thread ID.
Which is best? One table, or managing two tables?
I want an answer performance-wise. People who have built forum systems, you are most welcome here.
One thread can have multiple posts, so you might consider having two tables, one for threads and one for posts.
Also remember that, due to Microsoft naming conventions, table fields should be Pascal case, so I suggest changing your schema to the one below:
Table [Threads]
ID int
CreatedBy int
CreatedOn DateTime
Title nvarchar
CategoryID int
...
Table [ThreadPosts]
ID int
ThreadID int
CreatedBy int
CreatedOn DateTime
Body nvarchar
...
By doing this you avoid data duplication: for example, when someone posts a reply in a thread, there is no category_id to fill, so that field would be left empty. That can become a performance issue in larger systems.
Suppose you want to get all the posts of a thread. It's easy with two tables:
var query = from thread in db.Threads
            join posts in db.ThreadPosts on thread.ID equals posts.ThreadID
            where thread.ID == threadID
            select new ThreadFullModel() {
                Thread = thread,
                Posts = posts
            };
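Translated into MySQL DDL, that two-table layout could look roughly like this (a sketch; names and lengths are illustrative, mostly carried over from the a_post table in the question):
CREATE TABLE `thread` (
  `thread_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `user_id_fk` int(10) unsigned NOT NULL,
  `category_id_fk` tinyint(3) unsigned DEFAULT NULL,
  `title` varchar(100) NOT NULL,
  `short_description` varchar(200) DEFAULT NULL,
  `created_on` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`thread_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `thread_post` (
  `post_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `thread_id_fk` int(10) unsigned NOT NULL,
  `user_id_fk` int(10) unsigned NOT NULL,
  `description` text NOT NULL,
  `created_on` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`post_id`),
  KEY `idx_thread` (`thread_id_fk`),
  CONSTRAINT `fk_post_thread` FOREIGN KEY (`thread_id_fk`) REFERENCES `thread` (`thread_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;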

Running SQL queries with JOINs on large datasets

I'm new to using MySQL.
I'm trying to run an inner join query between a table of 80,000 records (this is table B) and a 40GB data set with approximately 600 million records (this is table A).
Is MySQL suitable for running this sort of query?
What sort of time should I expect it to take?
The code I tried is below. However, it failed because my DB connection dropped at 60000 secs.
set net_read_timeout = 36000;
INSERT INTO C
SELECT A.id, A.link_id, link_ref, network,
       date_1, time_per,
       veh_cls, data_source, N, av_jt
FROM A
INNER JOIN B
    ON A.link_id = B.link_id;
I'm starting to look into ways of cutting the 40GB table down into a temp table, to try and make the query more manageable. But I keep getting:
Error Code: 1206. The total number of locks exceeds the lock table size 646.953 sec
Am I on the right track?
cheers!
My code for splitting the database is:
LOCK TABLES D WRITE, A READ;
INSERT INTO D
SELECT A.id, A.link_id, A.time_per, A.av_jt
FROM A
WHERE A.time_per = 34 AND A.veh_cls = 1;
UNLOCK TABLES;
Perhaps my table indexes are incorrect; all I have is a simple primary key.
CREATE Table A
(
id int unsigned Not Null auto_increment,
link_id varchar(255) not Null,
link_ref int not Null,
network int not Null,
date_1 varchar(255) not Null,
#date_2 time default Null,
time_per int not null,
veh_cls int not null,
data_source int not null,
N int not null,
av_jt int not null,
sum_squ_jt int not null,
Primary Key (id)
);
Drop table if exists B;
CREATE Table B
(
id int unsigned Not Null auto_increment,
TOID varchar(255) not Null,
link_id varchar(255) not Null,
ABnode varchar(255) not Null,
#date_2 time not Null,
Primary Key (id)
);
In terms of the schema, it is just these two tables (A and B) loaded into one database.
I believe the answer has already been given in this post: The total number of locks exceeds the lock table size
i.e. use a table lock to avoid InnoDB's default row-by-row lock mode.
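Applied to the original insert, that suggestion would look roughly like this (just a sketch; it assumes you can block access to C, A and B for the duration of the load):
LOCK TABLES C WRITE, A READ, B READ;
INSERT INTO C
SELECT A.id, A.link_id, link_ref, network,
       date_1, time_per,
       veh_cls, data_source, N, av_jt
FROM A
INNER JOIN B
    ON A.link_id = B.link_id;
UNLOCK TABLES;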
Thanks for your help.
Indexing seems to have solved the problem. I managed to reduce the query time from 700 secs to approx 0.2 secs per record by indexing on:
A.link_id
i.e. from
from A
inner join B
on A.link_id = B.link_id;
I found this really useful post, very helpful for a newbie like myself:
http://hackmysql.com/case4
The code used to create the index was:
CREATE INDEX linkid_index ON A(link_id);

MySQL Create a function table?

I am trying to design the layout of the table to work best in the following situation.
I have a product that is sold based on age. The age determines whether that product is available for a given person, and the minimum and maximum quantity one can buy.
Right now I have designed the table as follows:
CREATE TABLE `tblProductsVsAge` (
`id` int(255) AUTO_INCREMENT NOT NULL,
`product_id` bigint(255) NOT NULL,
`age_min` int(255) NOT NULL,
`age_max` int(255) NOT NULL,
`quantity_min` decimal(8) NOT NULL,
`quantity_max` decimal(8) NOT NULL,
/* Keys */
PRIMARY KEY (`id`)
) ENGINE = InnoDB;
This is functional and it works, but I feel as if it's not the best-optimized structure.
Any ideas?
I forgot to mention that a product can have many ranges. For example, age min 25 and age max 35, with a quantity from 12 to 28; for the same product ID we might also have age 36 to 60, with a quantity from 3 to 8.
Use tinyint unsigned for age_max and age_min since none of the ages in the question pass 255 (highest unsigned tinyint).
Use smallint unsigned for quantity_max and quantity_min if those values > 255 and <= 65535 (highest unsigned smallint).
Use mediumint unsigned for quantity_max and quantity_min if those values > 65535 and <= 16777215 (highest unsigned mediumint).
Use int unsigned for quantity_max and quantity_min if those values > 16777215 and <= 4294967295 (highest unsigned int). (Sometimes, you gotta Think Big !!!)
My recommendation:
CREATE TABLE `tblProductsVsAge` (
`product_id` int NOT NULL,
`age_min` tinyint unsigned NOT NULL,
`age_max` tinyint unsigned NOT NULL,
`quantity_min` smallint unsigned NOT NULL,
`quantity_max` smallint unsigned NOT NULL,
/* Keys */
PRIMARY KEY (`product_id`, `age_min`)
) ENGINE = InnoDB;
Here is something to consider if the table already has data: you could ask MySQL to recommend column definitions for this table.
Simply run this query:
SELECT * FROM tblProductsVsAge PROCEDURE ANALYSE();
The directive PROCEDURE ANALYSE() will cause MySQL not to display the data but to examine the values from each column and come up with its own recommendation. Sometimes the recommendation is too granular. For example, if age_min is in the teenage range, it may recommend ENUM('13','14','15','16','17','18','19') instead of tinyint. After PROCEDURE ANALYSE() is done, you still make the final call on the column definitions.
CREATE TABLE `tblProductsVsAge` (
`product_id` int NOT NULL,
`age_min` smallint NOT NULL,
`age_max` smallint NOT NULL,
`quantity_min` smallint NOT NULL,
`quantity_max` smallint NOT NULL,
/* Keys */
PRIMARY KEY (`product_id`, `age_min`)
) ENGINE = InnoDB;
Changes to your structure:
id is probably not needed (unless you really need it); if you do keep it and product_id has to be bigint, then id should be at least that large a type, since this table can get more rows than your products table,
I changed the type of product_id to int; I don't think you will have more than 2147483647 products,
age and quantity are smallints, which can hold a maximum value of 32767 (use mediumint or int if that's not enough); decimal is intended for when you need exact precision or numbers bigger than bigint,
the primary key on (product_id, age_min) makes searches for a given product_id faster, as well as searches like product_id = {some_id} AND age_min > {user_age},
the (255) in an int/bigint definition doesn't make it 255 digits long; it's only a display-width hint for the string representation.
MySQL manual on numeric types: http://dev.mysql.com/doc/refman/5.5/en/numeric-types.html
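As a hypothetical illustration of the kind of lookup that composite primary key serves (product 7 and age 30 are made-up values):
-- all quantity ranges that apply to product 7 for a 30-year-old buyer
SELECT age_min, age_max, quantity_min, quantity_max
FROM tblProductsVsAge
WHERE product_id = 7
  AND age_min <= 30
  AND age_max >= 30;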

Is this a good solution to ensure data integrity in this specific situation?

I'm working on an application which tracks prices for certain items.
Each price has a reference to an item, a business that sells that item, and the location the item is being sold at. Now, normally, this would do just fine:
CREATE TABLE `price` (
`priceId` INT UNSIGNED NOT NULL AUTO_INCREMENT, -- PK
`businessId` INT UNSIGNED NOT NULL,
`itemId` INT UNSIGNED NOT NULL,
`locationId` INT UNSIGNED NOT NULL,
`figure` DECIMAL(19,2) UNSIGNED NOT NULL,
-- ...
)
But I have the following problem:
The application logic is such that one item at one business at one location can have multiple prices (at this point it's not really important why), and one of those prices can be an official price - an item doesn't have to have an official price, but if it does, there can be only one.
The question is; how to model this to ensure data integrity?
My initial idea was to create an additional table:
CREATE TABLE `official_price` (
`priceId` INT UNSIGNED NOT NULL, -- PK + FK (references price.priceId)
-- ...
)
This table would hold the priceIds of prices that are official, and the PK/UNIQUE constraint would take care of the 'one-or-none' rule.
This seems like a workable solution, but I'm still wondering if there's a better way to handle this situation?
You can use this dirty hack:
add a field is_official to the price table, with NULL allowed as a value
create a unique composite index over (businessId, itemId, locationId, is_official)
for the official price of a combination, put 1 into is_official
for non-official prices, leave it NULL; duplicate NULLs don't violate a unique index, so any number of non-official prices can coexist, but only one official one per combination
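A minimal sketch of that hack against the price table from the question (the index name and the priceId value are illustrative):
ALTER TABLE `price`
    ADD COLUMN `is_official` TINYINT UNSIGNED NULL,
    ADD UNIQUE KEY `uq_official` (`businessId`, `itemId`, `locationId`, `is_official`);
-- mark one price as official; a second official price for the same
-- business/item/location would violate the unique key
UPDATE `price` SET `is_official` = 1 WHERE `priceId` = 42;
-- non-official prices keep is_official = NULL, which the unique index ignores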
You could make the price table hold only official prices (with the figure possibly null), put a unique constraint on (businessId, itemId, locationId), and add another table of auxiliary prices referencing priceId.
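A rough sketch of that alternative, assuming a hypothetical auxiliary_price table for the extra, non-official prices:
CREATE TABLE `price` (
  `priceId` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `businessId` INT UNSIGNED NOT NULL,
  `itemId` INT UNSIGNED NOT NULL,
  `locationId` INT UNSIGNED NOT NULL,
  `figure` DECIMAL(19,2) UNSIGNED NULL, -- NULL until an official price is known
  PRIMARY KEY (`priceId`),
  UNIQUE KEY `uq_combo` (`businessId`, `itemId`, `locationId`)
) ENGINE=InnoDB;
CREATE TABLE `auxiliary_price` (
  `auxPriceId` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `priceId` INT UNSIGNED NOT NULL, -- the price row this auxiliary price belongs to
  `figure` DECIMAL(19,2) UNSIGNED NOT NULL,
  PRIMARY KEY (`auxPriceId`),
  FOREIGN KEY (`priceId`) REFERENCES `price` (`priceId`)
) ENGINE=InnoDB;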