I am currently developing a database storage solution for product inventory information for the company I work for. I am using MySQL, and I am having a hard time coming up with an efficient, feasible format for the data storage.
As it works right now, we have ~25,000 products to keep track of. For each product, there are about 20 different categories that we need to track information for (quantity available, price, etc.). This report is downloaded and updated every 3-4 days, and it is currently stored and updated in Excel.
My problem is that the only solution I have come up with so far is to create a separate table for each of the categories mentioned above, using foreign keys based on the product SKUs, and cascading updates to each respective table. However, this method would require every table to gain ~24,000 rows each time the program is run, since each product needs to be updated for the date it was run. The problem with this is that the data will be stored for around a year, so the tables will grow enormously. My research into other database formats has turned up some examples, but none on this scale; they are geared towards adding maybe 100 rows a day.
Does anybody have ideas for a suitable way to set up this kind of database, or is the method I described above suitable and within the limits of MySQL tables?
Thanks,
Mike
25,000 rows is nothing to MySQL, or to a flat file for that matter. Do not worry about data volume at the start. I've worked on many retail database schemas, and products are usually defined by either a static or an arbitrary-length set of attributes. Your data quantity ends up not being that far off either way.
Static:
create table products (
    product_id integer primary key auto_increment
    , product_name varchar(255) -- or whatever
    , attribute1_id integer -- FK to attributes
    , attribute2_id integer -- FK to attributes
    -- , ...
    , attributeX_id integer -- FK to attributes
);
create table attributes (
    attribute_id integer primary key -- whatever
    , attribute_type varchar(50) -- Category?
    , attribute_value varchar(255)
);
Or, the arbitrary-length version:
create table products (
product_id integer primary key auto_increment
, product_name varchar(255) -- or whatever
);
create table product_attributes (
    product_id integer
    , attribute_id integer
    -- , other stuff you want, like date of assignment
    , primary key (product_id, attribute_id)
);
create table attributes (
    attribute_id integer primary key -- whatever
    , attribute_type varchar(50) -- Category?
    , attribute_value varchar(255)
);
I would not hesitate to shove a few hundred million records into a basic structure like either.
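For what it's worth, pulling a product's attributes back out of the second structure is just a pair of joins; a quick sketch against the tables above (the product_id value is illustrative):

select p.product_name, a.attribute_type, a.attribute_value
from products p
join product_attributes pa on pa.product_id = p.product_id
join attributes a on a.attribute_id = pa.attribute_id
where p.product_id = 42;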
Related
My client has given me about 14k URLs of various products and he wants me to store all the price changes for each product per day. I think it will require an immense amount of DB storage and a lot of optimization. I've never done this before. I'm using a MySQL DB. Should I store all these price changes per product in a JSON column or as separate rows? Looking for tips regarding this. Thanks!
JSON columns are not as efficient as normal SQL columns and should be reserved for when you're not sure what data you're going to have. You're pretty sure what data you're going to have.
This is a pretty straightforward two table schema. One table for the product, and one for its price changes.
create table product (
    id integer primary key auto_increment,
    name varchar(255),
    url varchar(255) unique
    -- ...any other information about the product you might want to store...
);
Giving the product an integer primary key shields you from the URL changing, and it reduces the amount that must be stored in tables that refer to it: they only have to store the integer primary key, not the whole URL. The unique constraint also indexes the URL for faster searches.
Now that you have a product table other tables can refer to it. Like a table of price changes.
create table product_price_changes (
    product_id integer not null,
    price numeric(9,2) not null,
    change_time datetime not null,
    index(change_time),
    foreign key (product_id) references product(id)
);
This table stores when the price for a product changes, and what that price is. This is how you attach lists of data to things in SQL. The change_time is indexed for faster searches.
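Recording a change is then a single insert; for example (a sketch, with an illustrative URL and price):

insert into product_price_changes (product_id, price, change_time)
select id, 19.99, now()
from product
where url = 'https://example.com/widget';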
A simple join lets you efficiently see all the changes to a particular product in order.
select price, change_time
from product_price_changes ppc
join product prod on ppc.product_id = prod.id
where prod.url = ?
order by change_time
Sorry, not sure if the question title reflects the real question, but here goes:
I am designing a system which has a standard orders table, but with additional previous and next columns.
The question is which approach to the foreign keys is better.
Here I have a basic table with columns (previous, next) which are self-referencing foreign keys. The problem with this table is that the first placed order doesn't have previous and next values, so they are left empty. If I have, say, 10,000 records and 30% of them have those columns empty, that's 3,000 rows, which is quite a lot I think, and I also expect the numbers to grow. So in, say, a year's time it could come to 30,000 rows with empty columns, and I am not sure if that's OK.
The solution I have come up with is a main table plus two other tables which have foreign keys to it. In this case those two additional tables are identifying tables and nothing more, and there are no longer rows with empty columns.
So the question is which solution is better when considering query speed, table optimization, and common good practices, or maybe there's an even better one that I don't know about? (P.S. I am using MySQL with the InnoDB engine.)
If your aim is to handle order sets, you could simply add a new table for them, and have a single column in the orders table as a foreign key to that table.
The orders could also include a rank column to indicate the order in which orders belonging to the same set come.
create table order_sets (
    id int not null auto_increment,
    -- customer related data, etc...
    primary key(id)
);
create table orders (
    id int not null auto_increment,
    name varchar(255),
    quantity int,
    set_id int,
    set_rank int,
    primary key(id),
    foreign key (set_id) references order_sets(id)
);
Then inserting a new order means updating the rank of all other orders which come after it in the same set, if any.
Likewise, grouping queries become much easier than having to follow prev and next links. I'm pretty sure you will need these queries, and performance will be much better that way.
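For example, a sketch of both operations with illustrative ids: inserting at rank 3 in set 42 shifts later orders down, and reading a set back in order is one indexed scan:

update orders
set set_rank = set_rank + 1
where set_id = 42 and set_rank >= 3;

insert into orders (name, quantity, set_id, set_rank)
values ('new order', 1, 42, 3);

select * from orders
where set_id = 42
order by set_rank;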
I think I know what the problem is, but I need some help going in the correct direction.
I've got a table 'products', and I've also got several temp_product tables for various suppliers.
The goal here is to update several fields in the products table from the appropriate temp_product table. My product suppliers give me a CSV file with all of their inventory data. Originally I was just looping through the CSV and updating line by line, but this takes forever, so now I load it into a temporary table using LOAD DATA LOCAL INFILE.
The problem I am having is that the UPDATE queries take forever to run, and most of the time MySQL just completely crashes. I am hoping I can show you my table structure, and somebody can help me work out what kind of key/index setup would work best.
I've tried 2 different update query variations, but neither one is working.
UPDATE product AS p, temp_product AS t
SET p.quantity = t.quantity,
p.ean = t.inventory,
p.cost = t.price,
p.date_modified = NOW()
WHERE p.sku = t.sku
-AND-
UPDATE temp_product AS t
INNER JOIN product AS p ON p.sku = t.sku
SET p.quantity = t.quantity,
p.ean = t.inventory,
p.cost = t.price,
p.date_modified = NOW()
Here is the structure of my tables in question:
temp_product
sku varchar(15) PRI
status varchar(2)
statusid int(11)
quantity int(11)
inventory varchar(15)
ETA varchar(25)
ETA_Note varchar(255)
price double(10,2)
product
product_id int(11) PRI
model varchar(64)
sku varchar(64)
upc varchar(50)
ean varchar(50)
mpn varchar(64)
location varchar(128)
quantity int(4)
price decimal(15,4)
cost decimal(15,4)
status tinyint(1)
date_added datetime
date_modified datetime
I have a feeling I could get this to work correctly if I had the keys/indices set up correctly. The only thing I have set up now is the primary keys, but those don't match up across the tables. I'm pretty new to all this, so any help would be appreciated.
To make things even more complicated, I'm not sure whether some of my suppliers use the same SKUs, so I would like to update the product table WHERE sku = sku AND location = 'suppliername'.
Thanks for the help!
EDIT: Slimmed down the problem a little bit; originally I had a product and a supplier_product table to update. Once I get the product table working I can probably take it from there.
First of all, could you run SHOW CREATE TABLE product; and SHOW CREATE TABLE temp_product; and paste the results? Also, exactly how large is your product table? (SELECT COUNT(1) FROM product can help.)
Regarding the keys: you need at least to add a key on sku to your product table.
If sku is supposed to be a unique field, then you can do it with the following command:
ALTER TABLE product ADD UNIQUE KEY sku(sku);
If sku is NOT a unique field, then you can still add it as a key like that:
ALTER TABLE product ADD KEY sku(sku);
but in that case, this means that for one record with a particular sku in the temp_product table, you may update more than one record in your product table.
Regarding the table size: even if the table is large (say several million rows), if it's OK to run queries that take a long time (for example, if you are the only one using this database), then after you have added the key either of the variants should in principle work and take less time than it does now. Otherwise, you would be better off doing the update in batches (e.g. 100, 500 or 1000 records at a time), preferably with a script that waits a little between updates. This is especially recommended if your database is a master database that replicates to slaves.
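A sketch of one way to batch it, assuming the sku key is in place and product_id is roughly contiguous (the range size is illustrative):

UPDATE product AS p
INNER JOIN temp_product AS t ON p.sku = t.sku
SET p.quantity = t.quantity,
    p.ean = t.inventory,
    p.cost = t.price,
    p.date_modified = NOW()
WHERE p.product_id BETWEEN 1 AND 1000;
-- then 1001-2000, 2001-3000, and so on from the script,
-- optionally sleeping between batches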
My questions come first, then I'll describe the whole situation and the current solution:
Questions.
1. Why would MySQL perform an enormous amount of continuous read/write disk operations (300-1000 megabytes)?
2. Is the DB structure optimal (I need advice otherwise)?
3. Could the unique key slow down the DB?
4. What would be a better solution for this situation?
5. In the end the vServer goes down and I get mail with ~'ETIMEDOUT: Connection timed out - connect (2)'; so maybe the issue is not the DB structure but some misconfiguration?
Situation.
Users on end devices are playing, and when game over comes they store their game records in a central DB. Users can see a highscores table sorted by highscore.
I can't say that there are a lot of users. Let's assume 1 user per minute.
Solution.
LAMP.
Since there are several similar games that users are playing, there are several similar table+view pairs in the DB (~25 tables + 25 views total). Most of the tables contain ~30,000 records; 3 of them contain up to 150,000 records.
In order to store users uniquely (1 user - 1 record) I made a unique key: UNIQUE INDEX userid (userid, gamename, gametype, recordvalue).
Since users should see sorted values (highscores), I made a view on the table that shows what is needed, so the external PHP script works with the view rather than with the table.
CREATE TABLE supergameN (
id INT(11) NOT NULL AUTO_INCREMENT,
userid VARCHAR(255) NOT NULL,
username VARCHAR(50) NOT NULL,
gamename VARCHAR(100) NOT NULL,
gametype VARCHAR(100) NOT NULL,
description VARCHAR(100) NULL DEFAULT 'empty',
recordvalue INT(11) NOT NULL,
PRIMARY KEY (id),
UNIQUE INDEX userid (userid, gamename, gametype, recordvalue)
)
CREATE VIEW supergameN_view AS
SELECT
id,
userid,
username,
gamename,
gametype,
description,
recordvalue
FROM supergameN
ORDER BY gametype, recordvalue DESC
Thanks in advance. Alex.
Maybe not the solution but something I noticed:
Leave out recordvalue from the unique key, since otherwise you allow several records to exist for each userid-gamename-gametype combination, as long as they have different recordvalues!
By using
UNIQUE INDEX userid (userid, gamename, gametype)
You ensure that per game and user you only ever store one result.
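On the existing table that is a single ALTER, and storing a new result then becomes an upsert; a sketch (the ALTER will fail if existing rows already violate the narrower key, so deduplicate first; the inserted values are illustrative):

ALTER TABLE supergameN
    DROP INDEX userid,
    ADD UNIQUE INDEX userid (userid, gamename, gametype);

INSERT INTO supergameN (userid, username, gamename, gametype, recordvalue)
VALUES ('user42', 'Alex', 'supergame', 'classic', 1200)
ON DUPLICATE KEY UPDATE recordvalue = GREATEST(recordvalue, VALUES(recordvalue));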
And, some further remarks/questions:
Do you really need two columns to identify the game?
What is kept in description: is it user or game related?
Maybe you could normalize a bit by having just a gameid column in your main table and (assuming that description refers to the game) a separate table games with columns gameid, gamename, gametype and description. And then, of course, there would be no need to keep id anymore; instead you would have the combination (userid, gameid) as your primary key.
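A sketch of that normalization, reusing your column definitions (assuming description is indeed game-related):

CREATE TABLE games (
    gameid INT(11) NOT NULL AUTO_INCREMENT,
    gamename VARCHAR(100) NOT NULL,
    gametype VARCHAR(100) NOT NULL,
    description VARCHAR(100) NULL DEFAULT 'empty',
    PRIMARY KEY (gameid)
);

CREATE TABLE supergameN (
    userid VARCHAR(255) NOT NULL,
    username VARCHAR(50) NOT NULL,
    gameid INT(11) NOT NULL,
    recordvalue INT(11) NOT NULL,
    PRIMARY KEY (userid, gameid),
    FOREIGN KEY (gameid) REFERENCES games (gameid)
);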
I'm building this tool for classifying data. Basically I will be regularly receiving rows of data in a flat-file that look like this:
a:b:c:d:e
a:b:c:d:e
a:b:c:d:e
a:b:c:d:e
And I have a list of categories to break these rows up into, for example:
Original Cat1 Cat2 Cat3 Cat4 Cat5
---------------------------------------
a:b:c:d:e a b c d e
As of right this second, the category names are known, as well as the number of categories to break the data down by. But this might change over time (for instance, categories added/removed... total number of categories changed).
Okay, so I'm not really looking for help on how to parse the rows or get the data into a DB or anything... I know how to do all that, and I have the core script mostly written already to handle parsing rows of values and separating them into a variable number of categories.
Mostly I'm looking for advice on how to structure my database to store this stuff. So I've been thinking about it, and this is what I came up with:
Table: Generated
generated_id int - unique id for each row generated
generated_timestamp datetime - timestamp of when row was generated
last_updated datetime - timestamp of when row last updated
generated_method varchar(6) - method in which row was generated (manual or auto)
original_string varchar (255) - the original string
Table: Categories
category_id int - unique id for category
category_name varchar(20) - name of category
Table: Category_Values
category_map_id int - unique id for each value (not sure if I actually need this)
category_id int - id value to link to table Categories
generated_id int - id value to link to table Generated
category_value varchar (255) - value for the category
Basically the idea is that when I parse a row, I will insert a new entry into table Generated, as well as X entries in table Category_Values, where X is however many categories there currently are. The category names are stored in the Categories table.
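In other words, per parsed row the flow is something like this (illustrative values; assuming generated_id is auto-increment, and capturing it for the category inserts):

INSERT INTO Generated (generated_timestamp, last_updated, generated_method, original_string)
VALUES (NOW(), NOW(), 'auto', 'a:b:c:d:e');

SET @gen_id = LAST_INSERT_ID();

INSERT INTO Category_Values (category_id, generated_id, category_value) VALUES
    (1, @gen_id, 'a'),
    (2, @gen_id, 'b'),
    (3, @gen_id, 'c'),
    (4, @gen_id, 'd'),
    (5, @gen_id, 'e');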
What my script will immediately do is process rows of raw values and output the generated category values to a new file to be sent somewhere. But then I have this db I'm making to store the data generated so that I can make another script, where I can search for and list previously generated values, or update previously generated entries with new values or whatever.
Does this look like an okay database structure? Anything obvious I'm missing or potentially gimping myself with? For example, with this structure... well... I'm not a SQL expert, but I think I should be able to do something like
select * from Generated where original_string = '$string'
// id is put into $id
and then
select * from Category_Values where generated_id = '$id'
...and then I'll have my data to work with for search results or a form to alter the data. I'm fairly certain I can even combine this into one query with a join or something, but I'm not that great with SQL so I don't know how to actually do that. The point is, I know I can do what I need with this DB structure... but am I making this harder than it needs to be? Making some obvious noob mistake?
My suggestion:
Table: Generated
id unsigned int autoincrement primary key
generated_timestamp timestamp
last_updated timestamp default '0000-00-00 00:00:00' ON UPDATE CURRENT_TIMESTAMP
generated_method ENUM('manual','auto')
original_string varchar (255)
Table: Categories
id unsigned int autoincrement primary key
category_name varchar(20)
Table: Category_Values
id unsigned int autoincrement primary key
category_id int
generated_id int
category_value varchar (255) - value for the category
FOREIGN KEY `fk_cat` (category_id) REFERENCES Categories(id)
FOREIGN KEY `fk_gen` (generated_id) REFERENCES Generated(id)
Links
Timestamps: http://dev.mysql.com/doc/refman/5.1/en/timestamp.html
Create table syntax: http://dev.mysql.com/doc/refman/5.1/en/create-table.html
Enums: http://dev.mysql.com/doc/refman/5.1/en/enum.html
I think this solution is perfect for what you want to do. The Categories list is now flexible, so you can add new categories or retire old ones (I would recommend thinking long and hard before agreeing to delete a category - would you orphan the records or remove them too, etc.).
Basically, I'm saying you are right on target. The structure is simple but it will work well for you. Great job (and great job giving exactly the right amount of information in the question).
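And since you mentioned not being sure how to combine the two lookups into one query: with the structure above it is a single pair of joins, e.g. (a sketch using the suggested column names; bind '$string' properly in your script):

SELECT g.id, c.category_name, cv.category_value
FROM Generated g
JOIN Category_Values cv ON cv.generated_id = g.id
JOIN Categories c ON c.id = cv.category_id
WHERE g.original_string = '$string';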