MySQL views performance vs. extra column

My question is about selecting the best method for a job; I'll present the goal and my candidate solutions.
I have a list of items and a list of categories. Each item can belong to any number of categories.
items (id, name, ...other fields...)
categories (id, name, ...... )
category_items (category_id, item_id)
The items list is very large and is updated every 10 minutes (via cron). The categories list is fixed.
On my page I show a large list of items with category filters. All filtering is done client-side in JavaScript; the reason is that only about ±1,000 items are available at any time, so all the data (items + categories) is loaded together.
This page will be viewed many times, so performance is an issue. I have several ideas, all of which should perform well. In all of them the complete list of categories is sent; the items, however...
Option (1): running a single SELECT using a JOIN and GROUP_CONCAT, something like this:
SELECT i.*, GROUP_CONCAT(ci.category_id SEPARATOR ",") AS category_list
FROM items AS i
LEFT JOIN category_items AS ci ON (ci.item_id = i.id)
WHERE ... GROUP BY i.id ORDER BY ...
Option (2): creating a view from the query above.
Option (3): storing the GROUP_CONCAT result as an additional column, updated every few minutes by the cron job (sketched below).
Indexing is done correctly, so all of these methods will be relatively fast. A JOIN is a heavy operation, though, so my question is about options (2) and (3):
Is a view recomputed on every SELECT, or only when the underlying tables change (on CRUD)? If it is only updated on CRUD, it should perform about the same as storing the column.
Keep in mind that the items table will grow, and only the latest rows will be selected.
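For illustration, option (3) would look something like this (a sketch; the column name and type are assumptions):

-- one-time: add the cache column (name assumed)
ALTER TABLE items ADD COLUMN category_list_cache VARCHAR(255) NULL;

-- run by the cron job every few minutes
UPDATE items AS i
SET i.category_list_cache = (
    SELECT GROUP_CONCAT(ci.category_id SEPARATOR ',')
    FROM category_items AS ci
    WHERE ci.item_id = i.id
);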

Solution 4: have a MEMORY-type table that is refreshed with the results of your query from solution 1, by the same cron script that updates the items table.
Other than that: 1. and 2. are equivalent. MySQL's views are not materialised, so querying the view will actually run the SELECT from point 1.
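A rough sketch of solution 4, with assumed names (note that MEMORY tables cannot hold TEXT columns, so the concatenated list is cast to a bounded VARCHAR):

-- one-time setup
CREATE TABLE items_cache (
    id INT PRIMARY KEY,
    name VARCHAR(255),
    category_list VARCHAR(255)
) ENGINE=MEMORY;

-- refresh step in the same cron job that updates items
TRUNCATE TABLE items_cache;
INSERT INTO items_cache (id, name, category_list)
SELECT i.id, i.name,
       CAST(GROUP_CONCAT(ci.category_id SEPARATOR ',') AS CHAR(255))
FROM items AS i
LEFT JOIN category_items AS ci ON ci.item_id = i.id
GROUP BY i.id;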

Creating a view just saves the query, so (2) will still run the SELECT and the join on every read. (3), of course, saves that time at the expense of space.
The answer, therefore, is a question: do you and your app value time or space more?
Also, instead of using cron to update the cache field (your GROUP_CONCAT), you could use a trigger on the category_items table:
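A sketch of such a trigger (assuming the cache column from option (3) is called category_list_cache; a matching AFTER DELETE trigger using OLD.item_id would also be needed):

CREATE TRIGGER category_items_after_insert
AFTER INSERT ON category_items
FOR EACH ROW
UPDATE items
SET category_list_cache = (
    -- recompute the full list for the affected item only
    SELECT GROUP_CONCAT(category_id SEPARATOR ',')
    FROM category_items
    WHERE item_id = NEW.item_id
)
WHERE id = NEW.item_id;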

Related

How to select Master and Detail for Joiner Transformation containing large records in IICS?

I am using the Informatica Intelligent Cloud Services (IICS) Intelligent Structure model to parse a JSON file. The file is located in an S3 bucket and contains 3 groups. Two groups contain many records (~100,000 each) and the third group contains ~10,000 records. According to the Intelligent Structure model, the largest group contains the PK, which I can use to join the other groups, but which group should I select as Master and which as Detail? Usually the group with fewer records should be selected as the master, but in my case the group with fewer records contains the foreign key. Is there a workaround for this issue?
I am new to IICS, so how do I resolve this?
Any help will be appreciated. Thanks in advance!
The rule is: the source with the smaller row count should be the master, because during execution the master source is cached in memory for the join.
Having said that, you can use the third group, with fewer rows, as the master for both joins, like below. For a normal join the logic remains the same, but performance will improve if you choose a master with fewer rows and lower granularity.
Sq_gr1 (d) ---\
Sq_gr3 (m) --> jnr1 --(m)--> jnr2 ----->
Sq_gr2 (d) ------------------/
An outer join will take time proportional to the row count.

Pager for query with many relations and fields

I have a quite complex view with two queries (a view within a view): one selects users with related data, and the other selects orders with related data. Both of them have filters. Now I have an issue, and I am looking for a proper, decent solution with good performance, because the queries involve a lot of data and relationships.
Assume I have:
Query 1 - selects user data, with some left joins to other tables; the conditions depend on the provided parameters.
Query 2 - selects the orders of the users from Query 1, with many joins; the conditions depend on parameters.
I display the data from the two queries in one view: users, their data, their orders, and some order data. Now I want to implement a pager, but it has to display the proper number of users given the filters from both Query 1 and Query 2. The issue is that I can't simply limit either query on its own, because the other one has filters as well, so some of those users may not actually end up displayed once the other query's filters apply.
So I guess there are two ways. One is to run the queries in a loop and collect data until I get the proper number of results.
Another way is to merge the two queries into one, but then I get many rows per user, so I can't set a page limit and get results for only a specific number of users, for example 30, because the results will be like user 1 => order 1, user 1 => order 2. Is there any way to get a specific number of unique results keyed on user id or something similar?
Let me know if you have any questions.
Sample data would make this clearer; I can't understand the whole requirement from the question as written. Would you be able to create some sample data and share it with us? In any case, if you are dealing with a lot of data, avoid loops, as they will just make performance worse.
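If you merge the two queries, one common pattern is to page over the distinct user ids first and only then join the order rows back on (a sketch; table and column names, and the filter placeholders, are assumptions):

SELECT u.*, o.*
FROM (
    SELECT DISTINCT u.id
    FROM users AS u
    JOIN orders AS o ON o.user_id = u.id
    WHERE 1 = 1  -- filters from Query 1 and Query 2 go here
    ORDER BY u.id
    LIMIT 30 OFFSET 0  -- page 1: 30 unique users
) AS page
JOIN users AS u ON u.id = page.id
JOIN orders AS o ON o.user_id = u.id
ORDER BY u.id;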

Implement filters with counters

What I want to achieve:
I am developing a website with a catalog of products.
This is the normalized model (simplified) of the entities related to my question: some product features exist (like size and type in this example), and each has a predefined set of values (e.g. sizes 1, 2 and 3 exist, and type may be 1, 2 or 3; these sets do not have to be equal, this is just an example).
The relationship between Product and each feature is many-to-many: different values of one feature do not exclude each other.
My task is to build a form which allows the user to filter search results based on the features of products.
Multiple checked values of one feature are combined with AND logic, so if sizes One and Three are checked, I need all products which have both sizes (they may also have any other sizes, that doesn't matter, but the selected ones must be present).
The number near each feature value is the number of products which would be returned if the user checked that value right now, i.e. the number of products satisfying the filter "current active filter + this one value applied".
When the user checks or unchecks any value, the counters must be updated for the new current filter.
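To make the counter semantics concrete, here is one way to compute all counters for a single feature in one query (a sketch; product_size is the link table used in the queries below, and sizes 1 and 3 are taken as the currently checked values):

SELECT ps.size_id,
       COUNT(*) AS counter  -- products matching "current filter + this value"
FROM (
    -- products matching the current filter (sizes 1 AND 3)
    SELECT product_id
    FROM product_size
    WHERE size_id IN (1, 3)
    GROUP BY product_id
    HAVING COUNT(DISTINCT size_id) = 2
) AS matched
JOIN product_size AS ps ON ps.product_id = matched.product_id
GROUP BY ps.size_id;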
Problem:
The real use case is ~200k products and ~6 features with ~5-15 values each.
My COUNT queries (especially with a decent number of selected options) are too slow, and to render the form I need as many of these counts as there are values across all filters; in total that gives an unacceptable response time.
What I have tried:
Query to retrieve results:
select p.*
from products p
join product_size ps on ps.product_id = p.id
where ps.size_id in (1, 2, 3, 5)
group by p.id
having count(p.id) = 4;  -- keep only products that have all 4 requested sizes
(This selects the products which have sizes 1, 2, 3 and 5 at the same time.)
It completes in ~0.360 sec on 120k products, and in almost the same time with COUNT wrapped around it. This query also does not allow more than one feature (though I could place the values of all features in one table).
Another query to retrieve the same set:
SELECT ps1.product_id
FROM product_size AS ps1
JOIN (SELECT id FROM size AS s1 WHERE id IN (1, 2, 3, 5)) AS t
  ON ps1.size_id = t.id
GROUP BY ps1.product_id
HAVING COUNT(ps1.size_id) =
       (SELECT COUNT(id) FROM (SELECT id FROM size AS s2 WHERE id IN (1, 2, 3, 5)) AS t2);
It completes in ~0.230 sec (the same when wrapped in COUNT) and does not allow multiple features either.
It is a modified version of a query I found here: https://www.simple-talk.com/sql/t-sql-programming/divided-we-stand-the-sql-of-relational-division/ (the second query in the "Division with a Remainder" part).
Alternative schema:
A denormalized model, where the value of each feature is a boolean column in the products table.
The query here is obvious:
select * from products
where `size_1` = 1 and `size_2` = 1
and `size_3` = 1 and `size_5` = 1;
It is weird and harder to maintain in the application code, but it completes in ~0.056 sec when COUNT-ing.
None of these methods is acceptable per se, because multiplied ~30 times (to populate all the counters in the form) each gives an inadequate response time.
Caching and precomputing
The data in the DB is updated only a few times a day (maybe even just twice), so I could probably precompute the counts for all filter combinations whenever the data is updated (I haven't measured the time needed, to be honest). But that won't work either: the search form has fields with arbitrary values (like min/max price and text search on the product name), which I can't precompute for.
Load counters in form dynamically
Render the form, but fetch the numbers through AJAX, so the user sees the page first and then, after quite a long wait, the numbers. This is my last thought, but it seems like poor quality of service to me (it may be worse than no counters at all).
I am stuck. Any hints? Maybe I am not seeing some bigger picture? I would be very glad of any advice.
UPDATE: if we forget about the counters, what is the effective, commonly used way (query) to just retrieve results with such filters (or what am I doing wrong)? It is equivalent to the "find posts with all requested tags" model. I suspect it can be faster than my 0.230 sec (query #2), considering the (small?) number of rows for MySQL.
You can:
Create one table which stores all possible combinations (product_id <> size_id <> type_id).
Update this table whenever the admin changes a product in the backend (assuming there is backend management).
In the frontend, use this table for the filters instead of the product tables, and extract the product ids once the filter query is fired.
Once you have the list of product ids for the result, fetch the actual data using those ids.
I have used this before and it worked for me; you can create the table first and run a query against it to check the response time.
Hope this helps.
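A sketch of the combination table and a filter query against it (all names assumed):

CREATE TABLE product_filter (
    product_id INT NOT NULL,
    size_id    INT NOT NULL,
    type_id    INT NOT NULL,
    PRIMARY KEY (product_id, size_id, type_id)
);

-- products having sizes 1 AND 3, and type 2
SELECT product_id
FROM product_filter
WHERE size_id IN (1, 3) AND type_id = 2
GROUP BY product_id
HAVING COUNT(DISTINCT size_id) = 2;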

MySQL Query: Return all rows with a certain value in one column when value in another column matches specific criteria

This may be a little difficult to answer given that I'm still learning to write queries and I'm not able to view the database at the moment, but I'll give it a shot.
The database I'm trying to acquire information from contains a large table (TransactionLineItems) that essentially functions as a store transaction log. This table currently contains about 5 million rows and several columns describing products which are included in each transaction (TLI_ReceiptAlias, TLI_ScanCode, TLI_Quantity and TLI_UnitPrice). This table has a foreign key which is paired with a primary key in another table (Transactions), and this table contains transaction numbers (TRN_ReceiptNumber). When I join these two tables, the query returns one row for every item we've ever sold, and each row has a receipt number. 16 rows might have the same receipt number, meaning that all of these items were sold in a single transaction. Below that might be 12 more rows, each sharing another receipt number. All transactions are broken down into multiple rows like this.
I'm attempting to build a query which returns all rows sharing a single receipt number where at least one row with that receipt number meets certain criteria in another column. For example, three separate types of gift cards all have values in the TLI_ScanCode column that begin with "740000." I want the query to return rows with values beginning with these six digits in the TLI_ScanCode column, but I would also like to return all rows which share a receipt number with any of the rows which meet the given scan code criteria. Essentially, I need the query to return all rows for every receipt number which is also paired in at least one row with a gift card-related scan code.
I attempted to use a subquery to return a column of all receipt numbers paired with gift card scan codes, using "WHERE A.TRN_ReceiptAlias IN (subquery..." to return only those rows with a receipt number which matched one of the receipt numbers returned by the subquery. This appeared to run without issue for five minutes before the server ground to a halt for another twenty while it processed the query. The query appeared to complete successfully, but given that I was working with IT to restore normal store operations during this time I failed to obtain the results of the query (apart from the associated shame and embarrassment).
I'd like to know if there is a way to write a query to obtain this information without causing the server to hang. I'm assuming that either: a) it wasn't very smart to use a subquery in this manner on such a large table, or b) I don't know enough about SQL to obtain the information I need. I'm assuming the answer is both A and B, but I'd very much like to learn how to do this the right way. Any help would be greatly appreciated. Thanks!
SELECT *
FROM a AS a1
JOIN b
  ON b.id = a1.id
JOIN a AS a2
  ON a2.id = b.id
WHERE b.some_criteria = 'something';
Include an index on (b.id, b.some_criteria).
You aren't the first person, nor will you be the last to bring down your system with an inefficient query.
The most important lesson is that "Decision Support" and "Analytics" really don't co-exist with a transaction system. You really want to pull the data into a datamart or datawarehouse or some other database that isn't your transaction database, so that you don't take the business offline.
In terms of understanding why your initial query was so inefficient, you want to familiarize yourself with the EXPLAIN EXTENDED syntax, which returns plan information that should help you debug the query and work on making it perform acceptably. If you update your question with the actual explain-plan output, that would help in determining what the issue is.
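For example, reusing the generic self join above (on newer MySQL versions plain EXPLAIN already includes the extended information):

EXPLAIN EXTENDED
SELECT *
FROM a AS a1
JOIN b ON b.id = a1.id
JOIN a AS a2 ON a2.id = b.id
WHERE b.some_criteria = 'something';
SHOW WARNINGS;  -- prints the query as the optimizer rewrote it, plus notes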
Just from the outline you provided, it does sound like a self join would make sense rather than the subquery.
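A sketch of that self join using the names from the question (the TransactionId foreign-key column linking the two copies is an assumption):

-- every line item on any receipt that contains at least one gift-card line
SELECT DISTINCT allrows.*
FROM TransactionLineItems AS gift
JOIN TransactionLineItems AS allrows
    ON allrows.TransactionId = gift.TransactionId  -- assumed FK column name
WHERE gift.TLI_ScanCode LIKE '740000%';

The DISTINCT guards against duplicate rows when a single receipt contains several gift-card lines.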

What is more efficient (speed/memory): a join or multiple selects

I have the following tables:
users
userId|name
items
itemId|userId|description
What I want to achieve: I want to read all users and their items from the database (a user can have multiple items), and store all this data in a structure like the following:
User {
id
name
array<Item>
}
where Item is
Item {
itemId
userId
description
}
My first option would be to run a SELECT * FROM users, partially fill an array with users, and after that, for each user, run SELECT * FROM items WHERE userId = wantedId to complete the array of items.
Is this approach correct, or should I use a join for this?
One reason I don't want to use a join is the amount of redundant data it returns:
userId1|name1|ItemId11|description11
userId1|name1|ItemId12|description12
userId1|name1|ItemId13|description13
userId1|name1|ItemId14|description14
userId2|name2|ItemId21|description21
userId2|name2|ItemId22|description22
userId2|name2|ItemId23|description23
userId2|name2|ItemId24|description24
by redundant I mean: userId1,name1 and userId2,name2
Is my reason justified?
LATER EDIT: I added speed/memory to the title to clarify what I mean by efficiency.
You're trading off network roundtrips for bytes on the wire and in RAM. Network latency is usually the bigger problem, since memory is cheap and networks have gotten faster. It gets worse as the size of the first result set grows - Google for "(n+1) query problem".
I'd prefer the JOIN. Don't write it using SELECT *; that's a bad idea in almost every case. You should spell out precisely what columns you want.
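For the schema in the question, that would be something like:

SELECT u.userId, u.name, i.itemId, i.description
FROM users AS u
LEFT JOIN items AS i ON i.userId = u.userId
ORDER BY u.userId;

The LEFT JOIN keeps users that have no items; the client then groups consecutive rows by userId to build the User/Item structures.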
A join is the best-performing way: it reduces overhead, and the related indexes can be used. You can test it yourself, but I'm sure that joins are faster and better optimized than multiple selects.
The answer is: it depends.
Multiple SELECT:
If you end up issuing lots of queries to populate the descriptions, you have to take into account that you'll end up with a lot of round trips to the database.
Using a JOIN:
Yes, you'll be returning more data, but you've only got one round trip.
You've mentioned that you'll partially fill an array with users. Do you know in advance how many users you want to fill? In that case I would use the following (I'm using Oracle here):
select *
from item a,
     (select *
      from (select *
            from user
            order by user_id)
      where rownum <= 10) b  -- first 10 users only
where a.user_id = b.user_id
order by a.user_id
That would return all the items for the first 10 users only (that way most of the work is done in the database itself, rather than getting all the users back and discarding all but the first ten...).