Mysql lookup table (Mysql performance issue) - mysql

I've built a lookup table for all my tables. It is supposed to map each table's IDs to the rows whose values match.
I use MySQL Workbench with server 5.7 (the newest version doesn't work for me).
I have thousands of values and 6 tables. To keep the explanation simple, let's say I have 3 tables.
------------------table1------------------
| t1ID | Person | Purchase| Code |
| 1 | Jon | 50 | 111 | /* Code = t3ID */
| 2 | Dan | 100 | 222 | /* Purchase = Buyer in table2 */
| 3 | Pete | 200 | 333 |
(Many more)
------------------table2------------------
| t2ID | Buyer | Date | Barcode |
| 1 | 200 | 1/1/20 | ABC111 | /* Buyer = Purchase in table1 */
| 2 | 100 | 2/1/20 | ABC222 | /* Barcode = Item_ID in table3 */
| 3 | 50 | 3/1/20 | ABC333 |
(Many more)
------------------table3---------
| t3ID | Item | Item_ID | /* t3ID = Code*/
| 111 | Laser | ABC111 | /* Item_ID = Barcode */
| 222 | Phones | ABC222 |
| 333 | Tables | ABC333 |
(Many more)
I get this...
-------------Lookup_Table-------------
| lookID | t1ID | t2ID | t3ID |
| 1 | 1 | 1 | 111 |
| 2 | 2 | 2 | 222 |
| 3 | 3 | 3 | 333 |
I need this...
-------------Lookup_Table-------------
| lookID | t1ID | t2ID | t3ID |
| 1 | 1 | 3 | 333 |
| 2 | 2 | 2 | 222 |
| 3 | 3 | 1 | 111 |
The lookup table is connected to the others by foreign keys, and I added the values shown here manually because the original tables came from CSV files.
My problem is performance, or maybe I'm doing it wrong in MySQL. Querying by the table IDs works perfectly fine, but the values will not match, since I added the table IDs afterwards. Some fields contain repeated values, the values in the tables are random, and the tables are connected only by the specific columns shown above.
When I run a SELECT or any other query with a condition like "on table1.Purchase = table2.Buyer" or a "where" clause, to make the comparison and add the IDs to the table properly, the localhost server crashes: it loses the connection to the MySQL server. If I compare the tables directly, it works, but the comparison takes over 5 minutes.
If I limit the query to rows 0-10000 it is fine; above that, or with no limit, it just crashes. I can't rely on a limit, since I have over 20000 rows for now.
An example is:
update Lookup_Table lk inner join table1 t1 on t1.t1ID = lk.t1ID
inner join table2 t2 on t2.Buyer = t1.Purchase
set lk.t2ID = t2.t2ID;
Same thing between the other tables. Any idea if I'm doing it wrong or if there's another way of doing this? I've tried so many different things and no luck.
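One thing that may be worth trying, sketched below under assumptions the question does not confirm: that the join columns (Buyer, Item_ID) have no indexes, and that the intended path to table3 is Barcode = Item_ID (which is what the "I need this" table implies). The sketch uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL so that it runs anywhere; the index names are hypothetical, and the CREATE INDEX and INSERT ... SELECT statements use syntax that MySQL also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL server
conn.executescript("""
CREATE TABLE table1 (t1ID INTEGER PRIMARY KEY, Person TEXT, Purchase INTEGER, Code INTEGER);
CREATE TABLE table2 (t2ID INTEGER PRIMARY KEY, Buyer INTEGER, Date TEXT, Barcode TEXT);
CREATE TABLE table3 (t3ID INTEGER PRIMARY KEY, Item TEXT, Item_ID TEXT);
CREATE TABLE Lookup_Table (lookID INTEGER PRIMARY KEY, t1ID INTEGER, t2ID INTEGER, t3ID INTEGER);

INSERT INTO table1 VALUES (1,'Jon',50,111),(2,'Dan',100,222),(3,'Pete',200,333);
INSERT INTO table2 VALUES (1,200,'1/1/20','ABC111'),(2,100,'2/1/20','ABC222'),(3,50,'3/1/20','ABC333');
INSERT INTO table3 VALUES (111,'Laser','ABC111'),(222,'Phones','ABC222'),(333,'Tables','ABC333');

-- Index the non-key columns that appear in the join conditions
-- (hypothetical index names).
CREATE INDEX idx_table2_buyer ON table2 (Buyer);
CREATE INDEX idx_table3_item_id ON table3 (Item_ID);

-- Fill the lookup table in one pass by joining on the matching values
-- instead of on the table IDs.
INSERT INTO Lookup_Table (t1ID, t2ID, t3ID)
SELECT t1.t1ID, t2.t2ID, t3.t3ID
FROM table1 AS t1
JOIN table2 AS t2 ON t2.Buyer = t1.Purchase
JOIN table3 AS t3 ON t3.Item_ID = t2.Barcode
ORDER BY t1.t1ID;
""")

lookup_rows = conn.execute(
    "SELECT t1ID, t2ID, t3ID FROM Lookup_Table ORDER BY t1ID"
).fetchall()
print(lookup_rows)
```

Because some fields contain repeated values, a join like this can return more than one row per t1ID on the real data; checking row counts on a small sample first would show whether that happens.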

Related

SQL (MYSQL, Postgres) Lookup/report table

I'm basically making a lookup table for my three tables, but it's not doing what I want. Those 3 tables were created by loading 3 different CSV files.
What I'm trying to do is insert the IDs from those tables into the lookup one.
This is what I keep getting:
---------------Lookuptable---------------
|lookup_ID|Table1_ID|Table2_ID|Table3_ID|
| 1 | 1 | | |
| 2 | 2 | | |
| 3 | 3 | | |
| | | 1 | |
| | | 2 | |
| | | 3 | |
| | | | 1 |
| | | | 2 |
| | | | 3 |
What I need is:
---------------Lookuptable---------------
|lookup_ID|Table1_ID|Table2_ID|Table3_ID |
| 1 | 1 | 1 | 1 |
| 2 | 2 | 2 | 2 |
| 3 | 3 | 3 | 3 |
I kind of get why this is happening: it inserts one row below every time with single inserts like
insert into Lookuptable(Table1_ID) select T1id from Table1;
and the others...
But I've tried nested ones too like
insert into Lookuptable(Table1_ID, Table2_ID, Table3_ID)
select Table1.T1id, Table2.T2id, Table2.T2id from Table1, Table2, Table3;
but still doesn't work. In fact this one just crashes the Mysql server and has an endless query on Postgres. I've tried other nested examples but none worked.
I'm also using Foreign Keys which work when I manually input a new value, but since the other tables come from loaded CSV files I have to input the values already there manually.
I'm really not sure what to do.
If I understand correctly what you want, something like this should work:
https://www.db-fiddle.com/f/k6CGsVXazSqJDfwKkdr6S7/1
SET @i:=0,@j:=0,@h:=0;
INSERT INTO Lookuptable
SELECT NULL,t1.ID,t2.ID,t3.ID FROM
( SELECT @i:=(@i+1) AS temp_id,ID FROM Table1 ) t1
INNER JOIN
( SELECT @j:=(@j+1) AS temp_id ,ID FROM Table2 ) t2 ON t1.temp_id=t2.temp_id
INNER JOIN
( SELECT @h:=(@h+1) AS temp_id ,ID FROM Table3 ) t3 ON t2.temp_id=t3.temp_id;
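The same pair-rows-by-position idea can be sketched without session counters. The version below is an illustration, not the answer's exact method: it numbers each table's rows with a correlated COUNT(*) (which also runs on servers without window functions) and joins on that position. It uses Python's sqlite3 module as a stand-in for MySQL, and the sample ID values are made up so that the pairing is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL server
conn.executescript("""
CREATE TABLE Table1 (ID INTEGER PRIMARY KEY);
CREATE TABLE Table2 (ID INTEGER PRIMARY KEY);
CREATE TABLE Table3 (ID INTEGER PRIMARY KEY);
CREATE TABLE Lookuptable (
    lookup_ID INTEGER PRIMARY KEY,
    Table1_ID INTEGER, Table2_ID INTEGER, Table3_ID INTEGER);
-- Made-up IDs, deliberately different per table.
INSERT INTO Table1 VALUES (10),(20),(30);
INSERT INTO Table2 VALUES (7),(8),(9);
INSERT INTO Table3 VALUES (1),(2),(3);
""")

# pos = how many rows have an ID less than or equal to this one,
# i.e. the row's 1-based position when the table is ordered by ID.
PAIR_BY_POSITION = """
INSERT INTO Lookuptable (Table1_ID, Table2_ID, Table3_ID)
SELECT t1.ID, t2.ID, t3.ID
FROM (SELECT t.ID, (SELECT COUNT(*) FROM Table1 x WHERE x.ID <= t.ID) AS pos
      FROM Table1 t) AS t1
JOIN (SELECT t.ID, (SELECT COUNT(*) FROM Table2 x WHERE x.ID <= t.ID) AS pos
      FROM Table2 t) AS t2 ON t2.pos = t1.pos
JOIN (SELECT t.ID, (SELECT COUNT(*) FROM Table3 x WHERE x.ID <= t.ID) AS pos
      FROM Table3 t) AS t3 ON t3.pos = t2.pos
ORDER BY t1.pos
"""
conn.execute(PAIR_BY_POSITION)

paired = conn.execute(
    "SELECT Table1_ID, Table2_ID, Table3_ID FROM Lookuptable ORDER BY Table1_ID"
).fetchall()
print(paired)
```

The correlated count does more work per row than a running counter, so on large tables the counter approach (or ROW_NUMBER() on a server that supports it) is likely the faster choice.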

Precalculate numbers of records for each possible combination

I have a mySQL database table containing cellphones information like this:
ID Brand Model Price Type Size
==== ===== ===== ===== ====== ====
1 Apple A71 3128 A 40
2 Samsung B7C 3128 B 20
3 Apple ZX5 3128 A 30
4 Huawei Q32 2574 B 40
5 Apple A21 2574 A 25
6 Apple A71 3369 A 30
7 Samsung A71 7413 C 40
Now I want to create another table, that would contain counts for every possible combination of the parameters.
Params Count
============================================== =======
ALL 1000000
Brand(Apple) 20000
Brand(Apple,Samsung) 40000
Brand(Apple),Model(A71) 7100
Brand(Apple),Type(A) 6000
Brand(Apple),Model(A71,B7C),Type(A,B) 7
Model(A71) 12514
Model(A71,B7C) 26584
Model(A71),Type(A) 6521
Model(A71),Type(A,B) 8958
Model(A71),Type(A,B),Size(40) 85
And so on for every possible combination. I was thinking about creating a stored procedure (that I would execute periodically) that would run a query for every existing condition like that, but I am a little stuck on what exactly it should look like. Or is there a better way to do this?
Edit: the reason I want to store the information like this is to be able to show the number of results next to each filter in the client application, like in the picture.
I would like to create an index on the Params column so that I can get the Count for a given hash instantly, improving performance.
I also tried querying and caching the values dynamically, but I want to try this approach as well, so I can compare which one is more effective.
This is how I am calculating the counts now:
SELECT COUNT(*) FROM products;
SELECT COUNT(*) FROM products WHERE Brand IN ('Apple');
SELECT COUNT(*) FROM products WHERE Brand IN ('Apple', 'Samsung');
SELECT COUNT(*) FROM products WHERE Brand IN ('Apple') AND Model IN ('A71');
etc.
You can use a ROLLUP for this.
SELECT
model, type, size, COUNT(*)
FROM mytab
GROUP BY 1, 2, 3
WITH ROLLUP
With your sample data, we get the following:
| model | type | size | COUNT(*) |
| ----- | ---- | ---- | -------- |
| A21 | A | 25 | 1 |
| A21 | A | | 1 |
| A21 | | | 1 |
| A71 | A | 30 | 1 |
| A71 | A | 40 | 1 |
| A71 | A | | 2 |
| A71 | C | 40 | 1 |
| A71 | C | | 1 |
| A71 | | | 3 |
| B7C | B | 20 | 1 |
| B7C | B | | 1 |
| B7C | | | 1 |
| Q32 | B | 40 | 1 |
| Q32 | B | | 1 |
| Q32 | | | 1 |
| ZX5 | A | 30 | 1 |
| ZX5 | A | | 1 |
| ZX5 | | | 1 |
| | | | 7 |
The subtotals are present in the rows with null values in different columns, and the total is the last row where all group by columns are null.
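Note that ROLLUP only produces subtotals for the leading prefixes of the GROUP BY list (model; model and type; model, type and size), so a combination such as type alone does not appear in its output. A rough sketch of covering every subset of columns is to run one GROUP BY per subset. The example below does that with Python's sqlite3 module as a stand-in for MySQL; the `products` table name comes from the question, while the `combination_counts` helper and the key format are assumptions made for illustration.

```python
import sqlite3
from itertools import combinations

# Sample rows from the question: (ID, Brand, Model, Price, Type, Size)
ROWS = [
    (1, "Apple", "A71", 3128, "A", 40),
    (2, "Samsung", "B7C", 3128, "B", 20),
    (3, "Apple", "ZX5", 3128, "A", 30),
    (4, "Huawei", "Q32", 2574, "B", 40),
    (5, "Apple", "A21", 2574, "A", 25),
    (6, "Apple", "A71", 3369, "A", 30),
    (7, "Samsung", "A71", 7413, "C", 40),
]

# Fixed, trusted column names; they are interpolated into the SQL text below.
FILTER_COLUMNS = ("Brand", "Model", "Type", "Size")


def combination_counts(conn):
    """Return {params_key: count} for every subset of FILTER_COLUMNS."""
    total = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
    counts = {"ALL": total}
    for size in range(1, len(FILTER_COLUMNS) + 1):
        for cols in combinations(FILTER_COLUMNS, size):
            col_list = ", ".join(cols)
            sql = f"SELECT {col_list}, COUNT(*) FROM products GROUP BY {col_list}"
            # Each result row holds the grouped values followed by the count.
            for *values, n in conn.execute(sql):
                key = ",".join(f"{c}({v})" for c, v in zip(cols, values))
                counts[key] = n
    return counts


conn = sqlite3.connect(":memory:")  # stand-in for the MySQL server
conn.execute(
    "CREATE TABLE products (ID INTEGER PRIMARY KEY, Brand TEXT, Model TEXT,"
    " Price INTEGER, Type TEXT, Size INTEGER)"
)
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?, ?, ?)", ROWS)

counts = combination_counts(conn)
print(counts["Brand(Apple),Model(A71)"])
```

Each stored key covers one value per column. A multi-value filter such as Brand(Apple,Samsung) is the sum of the matching single-value entries for the same set of columns, since those groups do not overlap.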

Migrate data from a link table into target table

I have two tables. One (let's call it a) is currently a link table with data in like this
| c_id | t_id |
|-------|-------|
| 1 | 8 |
| 1 | 9 |
| 2 | 8 |
| 3 | 8 |
| 4 | 9 |
and another (t) with data like this
| id | code | value |
|-------|-------|-------|
| 1 | AB | 0.9 |
| 2 | BC | 0 |
| 3 | IM | 0 |
| 4 | MC | 0 |
| 5 | VI | 0 |
| 6 | BC | 0.9 |
| 7 | MC | 2.5 |
| 8 | VI | 2.5 |
| 9 | BC | 2.5 |
t_id in table a is a foreign key mapping onto id in table t, which is an auto incremented ID.
Due to functionality changes, I now want the data from a to replicate the linked row in t and add the required c_id (and then table a to be dropped) so you get something like this;
| id | c_id | code | value |
|-------|-------|-------|-------|
...
| 25 | 1 | VI | 2.5 |
| 26 | 2 | VI | 2.5 |
| 27 | 3 | VI | 2.5 |
| 28 | 1 | BC | 2.5 |
| 29 | 4 | BC | 2.5 |
which will enable me to change the value column per c_id, rather than globally. The new rows can safely be added to the end of the table - or perhaps it would be better to have a new table with this information in.
Is there a query that can do this? I hope I don't have to do it by hand!
Assuming I'm understanding correctly, since you mentioned modifying tables, this is a one-time procedure.
You won't be able to add the rows to the end of either of the existing tables, since you have different column requirements. You'll have to either make a new table or modify the existing one. I chose the former: create the new table, then populate it using INSERT ... SELECT syntax:
CREATE TABLE new_t (id SERIAL, c_id INT, code VARCHAR(2), value FLOAT);
INSERT INTO new_t (c_id, code, value)
SELECT a.c_id, t.code, t.value FROM a INNER JOIN t ON (t.id = a.t_id);
http://sqlfiddle.com/#!9/c6765/2

Work around the 61 table JOIN limit in MySQL by nesting subqueries within each other

I figured out that you can get around the 61 table join limit in MySQL by using subqueries.
https://stackoverflow.com/a/20134402/2843690
I'm trying to figure out how to easily use this in a program I'm working on to get a detailed product list from Magento (but I think the answer to this question could apply to a lot of situations where eav is involved). The tables that need to be joined look something like this:
catalog_product_entity
+-----------+----------------+
| entity_id | entity_type_id |
+-----------+----------------+
| 1 | 4 |
| 2 | 4 |
| 3 | 4 |
| 4 | 4 |
| 5 | 4 |
| 6 | 4 |
| 7 | 4 |
| 8 | 4 |
| 9 | 4 |
+-----------+----------------+
catalog_product_entity_int
+----------+----------------+--------------+-----------+-------+
| value_id | entity_type_id | attribute_id | entity_id | value |
+----------+----------------+--------------+-----------+-------+
| 1 | 4 | 2 | 1 | 245 |
| 2 | 4 | 3 | 1 | 250 |
| 3 | 4 | 4 | 1 | 254 |
| 4 | 4 | 2 | 2 | 245 |
| 5 | 4 | 3 | 2 | 249 |
| 6 | 4 | 4 | 2 | 253 |
| 7 | 4 | 2 | 3 | 247 |
| 8 | 4 | 3 | 3 | 250 |
| 9 | 4 | 4 | 3 | 254 |
+----------+----------------+--------------+-----------+-------+
eav_attribute
+--------------+----------------+----------------+--------------+
| attribute_id | entity_type_id | attribute_code | backend_type |
+--------------+----------------+----------------+--------------+
| 1 | 4 | name | varchar |
| 2 | 4 | brand | int |
| 3 | 4 | color | int |
| 4 | 4 | size | int |
| 5 | 4 | price | decimal |
| 6 | 4 | cost | decimal |
| 7 | 4 | created_at | datetime |
| 8 | 3 | name | varchar |
| 9 | 3 | description | text |
+--------------+----------------+----------------+--------------+
eav_attribute_option
+-----------+--------------+
| option_id | attribute_id |
+-----------+--------------+
| 245 | 2 |
| 246 | 2 |
| 247 | 2 |
| 248 | 3 |
| 249 | 3 |
| 250 | 3 |
| 251 | 4 |
| 252 | 4 |
| 253 | 4 |
| 254 | 4 |
+-----------+--------------+
eav_attribute_option_value
+----------+-----------+-------------------+
| value_id | option_id | value |
+----------+-----------+-------------------+
| 15 | 245 | Fruit of the Loom |
| 16 | 246 | Hanes |
| 17 | 247 | Jockey |
| 18 | 248 | White |
| 19 | 249 | Black |
| 20 | 250 | Gray |
| 21 | 251 | Small |
| 22 | 252 | Medium |
| 23 | 253 | Large |
| 24 | 254 | Extra Large |
+----------+-----------+-------------------+
The program that I'm writing generated sql queries that looked something like this:
SELECT cpe.entity_id
, brand_int.value as brand_int, brand.value as brand
, color_int.value as color_int, color.value as color
, size_int.value as size_int, size.value as size
FROM catalog_product_entity as cpe
LEFT JOIN catalog_product_entity_int as brand_int
ON (cpe.entity_id = brand_int.entity_id
AND brand_int.attribute_id = 2)
LEFT JOIN eav_attribute_option as brand_option
ON (brand_option.attribute_id = 2
AND brand_int.value = brand_option.option_id)
LEFT JOIN eav_attribute_option_value as brand
ON (brand_option.option_id = brand.option_id)
LEFT JOIN catalog_product_entity_int as color_int
ON (cpe.entity_id = color_int.entity_id
AND color_int.attribute_id = 3)
LEFT JOIN eav_attribute_option as color_option
ON (color_option.attribute_id = 3
AND color_int.value = color_option.option_id)
LEFT JOIN eav_attribute_option_value as color
ON (color_option.option_id = color.option_id)
LEFT JOIN catalog_product_entity_int as size_int
ON (cpe.entity_id = size_int.entity_id
AND size_int.attribute_id = 4)
LEFT JOIN eav_attribute_option as size_option
ON (size_option.attribute_id = 4
AND size_int.value = size_option.option_id)
LEFT JOIN eav_attribute_option_value as size
ON (size_option.option_id = size.option_id)
;
It was relatively easy to write the code to generate the query, and the query was fairly easy to understand; however, it's pretty easy to hit the 61 table join limit, which I did with my real-life data. I believe the math says 21 integer-type attributes would go over the limit (each attribute adds 3 joined tables to the base table, so 20 attributes already use 1 + 3 × 20 = 61 tables), and that is before I even start adding varchar, text, and decimal attributes.
So the solution I came up with was to use subqueries to overcome the 61 table limit.
One way to do it is to group the joins in subqueries of 61 joins. And then all of the groups would be joined. I think I can figure out what the sql queries should look like, but it seems difficult to write the code to generate the queries. There is a further (albeit theoretical) problem in that one could again violate the 61 table limit if there were enough attributes. In other words, if I have 62 groups of 61 tables, there will be a MySQL error. Obviously, one could get around this by then grouping the groups of groups into 61. But that just makes the code even more difficult to write and understand.
I think the solution I want is to nest subqueries within subqueries such that each subquery is using a single join of 2 tables (or one table and one subquery). Intuitively, it seems like the code would be easier to write for this kind of query. Unfortunately, thinking about what these queries should look like is making my brain hurt. That's why I need help.
What would such a MySQL query look like?
You're right that joining too many attributes through an EAV design is likely to exceed the limit of joins. Even before that, there's probably a practical limit of joins because the cost of so many joins gets higher and higher geometrically. How bad this is depends on your server's capacity, but it's likely to be quite a bit lower than 61.
So querying an EAV data model to produce a result as if it were stored in a conventional relational model (one column per attribute) is problematic.
Solution: don't do it with a join per attribute, which means you can't expect to produce the result in a conventional row-per-entity format purely with SQL.
I'm not intimately familiar with the Magento schema, but I can infer from your query that something like this might work:
SELECT cpe.entity_id
, i.attribute_id
, o.option_id
, v.value AS option_value
FROM catalog_product_entity AS cpe
INNER JOIN catalog_product_entity_int AS i
ON cpe.entity_id = i.entity_id AND i.attribute_id IN (2,3,4)
INNER JOIN eav_attribute_option AS o
ON i.value = o.option_id AND i.attribute_id = o.attribute_id
INNER JOIN eav_attribute_option_value AS v
ON v.option_id = o.option_id;
The IN(2,3,4,...) predicate is where you specify multiple attributes. There's no need to add more joins to get more attributes. They're simply returned as rows rather than columns.
This means you have to write application code to fetch all the rows of this result set and map them into fields of a single object.
From comments by @Axel, it sounds like Magento provides helper functions to do this consuming of a result set and mapping it into an object.
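For a sense of what that application-side mapping could look like without Magento's helpers, here is a minimal sketch. It uses Python's sqlite3 module as a stand-in for MySQL, with trimmed copies of the question's tables (the entity_type_id columns and the eav_attribute_option table are left out for brevity). The extra join to eav_attribute for the attribute code and the `products` dictionary are assumptions made for illustration.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL server
conn.executescript("""
CREATE TABLE catalog_product_entity_int (
    value_id INTEGER PRIMARY KEY, attribute_id INTEGER,
    entity_id INTEGER, value INTEGER);
CREATE TABLE eav_attribute (attribute_id INTEGER PRIMARY KEY, attribute_code TEXT);
CREATE TABLE eav_attribute_option_value (
    value_id INTEGER PRIMARY KEY, option_id INTEGER, value TEXT);

-- Entities 1 and 2 from the question's sample data.
INSERT INTO catalog_product_entity_int VALUES
    (1, 2, 1, 245), (2, 3, 1, 250), (3, 4, 1, 254),
    (4, 2, 2, 245), (5, 3, 2, 249), (6, 4, 2, 253);
INSERT INTO eav_attribute VALUES (2, 'brand'), (3, 'color'), (4, 'size');
INSERT INTO eav_attribute_option_value VALUES
    (15, 245, 'Fruit of the Loom'), (19, 249, 'Black'), (20, 250, 'Gray'),
    (23, 253, 'Large'), (24, 254, 'Extra Large');
""")

# One result row per (entity, attribute); no extra join per attribute.
ROWS_QUERY = """
SELECT i.entity_id, a.attribute_code, v.value
FROM catalog_product_entity_int AS i
JOIN eav_attribute AS a ON a.attribute_id = i.attribute_id
JOIN eav_attribute_option_value AS v ON v.option_id = i.value
WHERE i.attribute_id IN (2, 3, 4)
ORDER BY i.entity_id
"""

# Fold the rows into one dictionary per entity.
products = defaultdict(dict)
for entity_id, attribute_code, value in conn.execute(ROWS_QUERY):
    products[entity_id][attribute_code] = value

print(dict(products))
```

Adding another attribute only means adding its ID to the IN list; the loop that builds the objects stays the same.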

Select Distinct Set Common to Subset From Join Table

Given a join table for m-2-m relationship between booth and user
+-----------+------------------+
| booth_id | user_id |
+-----------+------------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 5 |
| 1 | 9 |
| 2 | 1 |
| 2 | 2 |
| 2 | 5 |
| 2 | 10 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
| 3 | 4 |
| 3 | 6 |
| 3 | 11 |
+-----------+------------------+
How can I get a distinct set of booth records that are common between a subset of user ids? For example, if I am given user_id values of 1,2,3, I expect the result set to include only booth with id 3 since it is the only common booth in the join table above between all user_id's provided.
I'm hoping I'm missing a keyword in MySQL to accomplish this. The furthest I've come so far is using ... user_id = all (1,2,3) but this is always returning an empty result set (I believe I understand why it is though).
The SQL query for this would be:
select booth_id from table1 where user_id
in (1,2,3) group by booth_id having count(distinct user_id) =
(select count(distinct user_id) from table1 where user_id in (1,2,3));
Here table1 stands for the join table. I hope this helps you build the MySQL query.
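The GROUP BY / HAVING idea can be tried out on the question's sample data with a short sketch. It uses Python's sqlite3 module as a stand-in for MySQL; the `booth_user` table name and the `common_booths` helper are hypothetical, and the required count is passed in as the number of distinct requested IDs rather than computed by a subquery.

```python
import sqlite3

# (booth_id, user_id) pairs from the question's join table
PAIRS = [
    (1, 1), (1, 2), (1, 5), (1, 9),
    (2, 1), (2, 2), (2, 5), (2, 10),
    (3, 1), (3, 2), (3, 3), (3, 4), (3, 6), (3, 11),
]


def common_booths(conn, user_ids):
    """Return the booth_ids that are linked to every user in user_ids."""
    wanted = sorted(set(user_ids))
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT booth_id
        FROM booth_user
        WHERE user_id IN ({placeholders})
        GROUP BY booth_id
        HAVING COUNT(DISTINCT user_id) = ?
        ORDER BY booth_id
    """
    # A booth qualifies only if it matched all of the requested users.
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


conn = sqlite3.connect(":memory:")  # stand-in for the MySQL server
conn.execute("CREATE TABLE booth_user (booth_id INTEGER, user_id INTEGER)")
conn.executemany("INSERT INTO booth_user VALUES (?, ?)", PAIRS)

print(common_booths(conn, [1, 2, 3]))
```

With the sample data, users 1, 2 and 3 share only booth 3, while users 1, 2 and 5 share booths 1 and 2.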