Updating a large MySQL table without any key/index information - mysql

I think I know what the problem is, but I need some help going in the correct direction.
I've got a 'product' table, and I've also got several temp_product tables for various suppliers.
The goal here is to update several fields in the products table from the appropriate temp_product table. My product suppliers give me a CSV file with all of their inventory data. Originally I was just looping through the CSV and updating line by line, but this takes forever, so now I load it into a temporary table using LOAD DATA LOCAL INFILE.
The problem I am having is that the UPDATE queries take forever to run, and most of the time MySQL just completely crashes. I am hoping that if I show you my table structure, somebody could help me out with what kind of key/index setup would work best.
I've tried 2 different update query variations, but neither one is working.
UPDATE product AS p, temp_product AS t
SET p.quantity = t.quantity,
p.ean = t.inventory,
p.cost = t.price,
p.date_modified = NOW()
WHERE p.sku = t.sku
-AND-
UPDATE temp_product AS t
INNER JOIN product AS p ON p.sku = t.sku
SET p.quantity = t.quantity,
p.ean = t.inventory,
p.cost = t.price,
p.date_modified = NOW()
Here is the structure to my tables in question:
temp_product
sku varchar(15) PRI
status varchar(2)
statusid int(11)
quantity int(11)
inventory varchar(15)
ETA varchar(25)
ETA_Note varchar(255)
price double(10,2)
product
product_id int(11) PRI
model varchar(64)
sku varchar(64)
upc varchar(50)
ean varchar(50)
mpn varchar(64)
location varchar(128)
quantity int(4)
price decimal(15,4)
cost decimal(15,4)
status tinyint(1)
date_added datetime
date_modified datetime
I have a feeling I could get this to work correctly if I had keys/indices set up correctly. The only thing I have set up now is the Primary Key, but those don't match up across all the tables. I'm pretty new to all this, so any help would be appreciated.
To make things even more complicated, I'm not sure if some of my suppliers use the same SKUs, so I would like to update the product table WHERE sku = sku and location = 'suppliername'.
Thanks for the help!
EDIT: Slimmed down the problem a little bit, originally had a product and supplier_product table to update, once I get the product table working I can probably take it from there.

First of all, could you run SHOW CREATE TABLE product; and SHOW CREATE TABLE temp_product; and paste the results? Also, exactly how large is your product table? (SELECT COUNT(1) FROM product can help.)
Regarding the keys: you need to at least add a key on sku to your product table.
If sku is supposed to be a unique field, then you can do it with the following command:
ALTER TABLE product ADD UNIQUE KEY sku(sku);
If sku is NOT a unique field, then you can still add it as a key like that:
ALTER TABLE product ADD KEY sku(sku);
but in that case, this means that for one record with a particular sku in the temp_product table, you may update more than one record in your product table.
Regarding the table size: even if the table is large (say several million rows), as long as it's OK to run queries that take a lot of time (for example, if you are the only one using this database), then after you have added the key, either of the variants should in principle work and take less time than it does now. Otherwise, you would be better off doing the update in batches (e.g. 100, 500 or 1000 records at a time), preferably with a script that waits a little between updates. This is especially recommended if your database is a master that replicates to slaves.
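As a minimal sketch of the batched idea (assuming the sku key above has been added and that you batch by ranges of the product table's primary key; the range bounds would be driven by your script, not hard-coded like this):
UPDATE product AS p
INNER JOIN temp_product AS t ON p.sku = t.sku
SET p.quantity = t.quantity,
    p.ean = t.inventory,
    p.cost = t.price,
    p.date_modified = NOW()
WHERE p.product_id BETWEEN 1 AND 1000;  -- next run: 1001-2000, and so on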

Related

MySQL: Query Assistance Needed

I'm going to keep it brief here for convenience's sake. I'm new to SQL coding, so please excuse me if I say something weird.
I did not manage to find a solid solution to it (at least one that I would truly understand), which is precisely why I'm posting here as a last resort at this point.
The table code:
create table companies (
company_id mediumint not null auto_increment,
Name varchar(40) not null,
Address varchar(40),
FoundingDate date,
primary key (company_id)
);
create table employees (
Employee_id mediumint not null auto_increment,
Name varchar (40),
Surname varchar(40),
primary key (Employee_id)
);
create table accounts (
Account_id mediumint not null auto_increment,
Account_number varchar(10) not null,
CompanyID int(10),
Date_of_creation date,
NET_value int(30),
VAT int(3),
Total_value int(40),
EmployeeID int(10) not null,
Description varchar(40),
primary key (Account_number)
);
Table values are random strings and numbers until I figure this out.
My issue is that I'm stuck at forming correct SQL queries, namely:
Query all accounts with their designated companies. I need it to show 'NULL' value if an account has no associated company.
Query that can list all accounts whose date is less than 2018-03-16 or those without a date.
Query that will print the description of the 'Accounts' table in one column and the number of characters in that description in a different column.
Query that lists all employees whose names end with '-gh' and that have names greater than 5 characters in length.
Query that will list the top total sum amount.
Query that will list all accounts that have '02' in them (i.e. 3/02/05).
If you can answer at least one of these queries and if you can explain how you got to the solution in a simplistic manner, well... I'm afraid I have nothing to offer but honest gratitude! ^^'
Welcome to the community, but as Jerry commented, you should really try to show SOMETHING that you have tried just to show what you THINK is needed. Also, don't just add comments to respond, but edit your original post with additional details / data as people ask questions.
To try and advance you forward though, I will point out two specific links that should help you out. The first one covers the basics of querying, explaining the
select [fields] from [what table] join [other tables] where [what is your criteria] -- etc. pattern: Some Basics on querying.
The next, A visual representation and samples on querying, gives some very good clarification on JOIN conditions: (INNER) JOIN -- which means a required record match in BOTH tables being joined -- as well as FULL OUTER JOIN, LEFT JOIN, etc.
After reviewing those, if you STILL have questions, please edit your original question, post some samples of what you THINK is working and let us know (or comment back to a specific answer), and we in the forum can follow up with you.
HINT: for your first query wanting NULL, you should get there via a LEFT JOIN (see the visual link above).
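For example, a rough sketch of that first query (assuming accounts.CompanyID is meant to reference companies.company_id) would use a LEFT JOIN, which keeps every account and shows NULL where there is no matching company:
select a.Account_number, c.Name as company_name
from accounts a
left join companies c on c.company_id = a.CompanyID;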

What's the best way to store all these data in db?

My client has given me about 14k URLs of various products, and he wants me to store all the price changes of each product per day. I think it will require an immense amount of db storage and a lot of optimization. I've never done this before. I'm using a MySQL DB. Should I store all these price changes per product in a JSON column or as separate rows? Looking for tips regarding this. Thanks!
JSON columns are not as efficient as normal SQL columns and should be reserved for when you're not sure what data you're going to have. You're pretty sure what data you're going to have.
This is a pretty straightforward two table schema. One table for the product, and one for its price changes.
create table product (
id integer primary key auto_increment,
name varchar(255),
url varchar(255) unique,
-- ...any other information about the product you might want to store...
index(url)
);
By giving it a primary key it shields you from the URL changing, and it reduces the amount that must be stored in tables that refer to it. They only have to store the integer primary key, not the whole URL. The URL is indexed for faster searches.
Now that you have a product table other tables can refer to it. Like a table of price changes.
create table product_price_changes (
product_id integer not null,
price numeric(9,2) not null,
change_time datetime not null,
index(change_time),
foreign key (product_id) references product(id) -- table-level constraint; MySQL ignores inline REFERENCES
);
This table stores when the price for a product changes, and what that price is. This is how you attach lists of data to things in SQL. The change_time is indexed for faster searches.
A simple join lets you efficiently see all the changes to a particular product in order.
select price, change_time
from product_price_changes ppc
join product prod on ppc.product_id = prod.id
where prod.url = ?
order by change_time
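And if you only want the current price of a product (a small extension, not part of the answer above), take the most recent change:
select ppc.price
from product_price_changes ppc
join product prod on ppc.product_id = prod.id
where prod.url = ?
order by ppc.change_time desc
limit 1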

How can I get all business data AS WELL as if current user is following them?

In mysql how can I write a query that will fetch ALL business data, and at the same time (or not if it is better another way) check if user is following that business? I have the following relationship table to determine if a user is following a business (status=1 would mean that person is following):
CREATE TABLE IF NOT EXISTS `Relationship_User_Follows_Business` (
`user_id` int(10) unsigned NOT NULL,
`business_id` int(10) unsigned NOT NULL,
`status` tinyint(3) unsigned NOT NULL DEFAULT '0' COMMENT '1=following, 0=not following'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
ALTER TABLE `Relationship_User_Follows_Business`
ADD UNIQUE KEY `unique_user_business_id` (`user_id`,`business_id`);
Assume business table just holds data on different businesses like name, phone number, etc. I would want to return all of the business data in my query (Business.*). I want to append the status (0 or 1) to the end of each business row to determine if the user is following that business. I have tried the following query but it does not work because it is narrowing the results to only show a business if there is a relationship row. I wish to show ALL businesses regardless if a relationship row exists or not because I only create the relationship row if a user follows:
SELECT Business.*, Relationship_User_Follows_Business.status FROM Business, Relationship_User_Follows_Business WHERE 104=Relationship_User_Follows_Business.user_id AND Business.id=Relationship_User_Follows_Business.business_id
Note that I am using 104 as a test user id. The user id would normally be dependent on user, not a static 104.
You are looking for a LEFT JOIN and not an INNER JOIN: a LEFT JOIN keeps all the records from the master table along with any matching rows from the details table. Also, avoid using implicit join syntax (comma separated) and use the proper explicit syntax of a join:
SELECT Business.*, Relationship_User_Follows_Business.status
FROM Business
LEFT JOIN Relationship_User_Follows_Business
ON Business.id = Relationship_User_Follows_Business.business_id
AND Relationship_User_Follows_Business.user_id = 104
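Note that status will come back as NULL for businesses the user does not follow. If you want a literal 0 instead (an addition, not part of the answer above), wrap it in COALESCE:
SELECT Business.*, COALESCE(Relationship_User_Follows_Business.status, 0) AS status
FROM Business
LEFT JOIN Relationship_User_Follows_Business
ON Business.id = Relationship_User_Follows_Business.business_id
AND Relationship_User_Follows_Business.user_id = 104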

Maintaining order data integrity with constant edits

Hypothetically, if I had a system that keeps track of products and order information across multiple tables (orders, order_items, product):
orders
id INT(11)
shipping_name VARCHAR(255)
shipping_street VARCHAR(255)
shipping_city VARCHAR(255)
[etc]
order_details
id INT(11)
order_id INT(11)
product_id INT(11)
products
id INT(11)
name VARCHAR(255)
description VARCHAR(255)
price DECIMAL(8,2)
The structure is very simple: an order has multiple order_items, and each order_item has one product.
The problem is that when someone edits a product, those edits modify the data of previous orders. If an employee were to go back and look at that information later on, they may not have the same information that the customer received at the time the order was placed.
What would be best practice?
Should I add a 'display_item' field to the products table, and on edit/delete set display to 0 and add the edited product as a new row?
Should I duplicate the name, description, and price in order_details?
I think this is one of those cases where database normalization "breaks".
Some possible solutions:
Keep a copy of the product attributes for each order. This is expensive in terms of storage, but it makes it easier to track down the product data stored in the order.
Create a log of attributes that can be changed in time. Product attributes can change over time, so a log which stores the modification date can help you filter out the product attributes to the moment the order was made.
Proposal for option 1
Create a copy of the products table and create a relation (one-to-one) to the order_details table for each order and order detail.
Proposal for option 2
Split the products table in two: product_general_info and product_attributes. Product general info is meant to be stable through time (a product's general info will not change), as any modification to the data in this table will propagate to the whole orders set. Product attributes must have a date or timestamp value to define when the attributes changed. Then you can query the database and return the last record that is before or on the order date.
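A minimal sketch of option 2 (the table and column names here, such as valid_from, are assumptions for illustration, not part of the original answer):
create table product_general_info (
product_id int(11) primary key auto_increment,
name varchar(255) not null
);
create table product_attributes (
product_id int(11) not null,
valid_from datetime not null, -- when these attribute values took effect
description varchar(255),
price decimal(8,2),
primary key (product_id, valid_from),
foreign key (product_id) references product_general_info(product_id)
);
-- Attributes in effect at the time an order was placed:
select pa.description, pa.price
from product_attributes pa
where pa.product_id = ?
and pa.valid_from <= ? -- the order's date
order by pa.valid_from desc
limit 1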

MySql database formatting

I am currently developing a database storage solution for product inventory information for the company I work for. I am using MySql, and I am having a hard time coming up with an efficient, feasible format for the data storage.
As it works right now, we have ~25,000 products to keep track of. For each product, there are about 20 different categories that we need to track information for (quantity available, price, etc.). This report is downloaded and updated every 3-4 days, and it is stored and updated in Excel right now.
My problem is that the only solution I have come up with so far is to create separate tables for each one of the categories mentioned above, using foreign keys based off of the product SKUs, and cascading to update each respective table. However, this method would require that every table add 24,000 rows each time the program is run, given that each product needs to be updated for the date it was run. The problem with this is that the data will be stored for around a year, so the tables will grow an extensive amount. My research into other database formats has yielded some examples, but none on this scale; they are geared towards adding maybe 100 rows a day.
Does anybody know or have any ideas of a suitable way to set up this kind of database, or is the method I described above suitable and within the limitations of the MySql tables?
Thanks,
Mike
25,000 rows is nothing to MySQL, or to a flat file for that matter. Do not initially worry about data volume. I've worked on many retail database schemas, and products are usually defined by either a static or an arbitrary-length set of attributes. Your data quantity ends up not being that far off either way.
Static:
create table products (
product_id integer primary key auto_increment
, product_name varchar(255) -- or whatever
, attribute1_id integer -- FK to attributes
, attribute2_id integer -- FK to attributes
, ...
, attributeX_id integer -- FK to attributes
);
create table attributes (
attribute_id integer primary key -- whatever
, attribute_type varchar(255) -- Category?
, attribute_value varchar(255)
);
Or, for an arbitrary-length set of attributes:
create table products (
product_id integer primary key auto_increment
, product_name varchar(255) -- or whatever
);
create table product_attributes (
product_id integer
, attribute_id integer
-- other stuff you want, like date of assignment
, primary key (product_id, attribute_id)
);
create table attributes (
attribute_id integer primary key -- whatever
, attribute_type varchar(255) -- Category?
, attribute_value varchar(255)
);
I would not hesitate to shove a few hundred million records into a basic structure like either.
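As a rough usage sketch against the second (arbitrary-length) structure, listing every attribute of one product could look like this (not part of the original answer):
select p.product_name, a.attribute_type, a.attribute_value
from products p
join product_attributes pa on pa.product_id = p.product_id
join attributes a on a.attribute_id = pa.attribute_id
where p.product_id = ?;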