I have a quite complex view built from two queries (a view in a view): one selects users with related data, the other selects orders with related data. Both of them have filters, and now I have an issue and am looking for a clean, decent solution with good performance, because the queries involve a lot of data and relationships.
Assume I have:
Query 1 - Selects user data, with some left joins to other tables; conditions depend on the provided parameters.
Query 2 - Selects orders depending on the users from Query 1, with many joins; conditions depend on the parameters.
I display the data from the two queries in one view: users, their data, orders, and some order data. Now I want to implement a pager, but it has to display the proper number of users given the filters from both Query 1 and Query 2. The issue is that I can't simply limit either query on its own, because the other query's filters may exclude some of the users that would otherwise be selected.
So I guess there are two ways. One is to run the queries in a loop and collect data until I get the proper number of results.
The other is to merge the two queries into one, but then I get many rows per user, so I can't set a page limit and get results for a specific number of users, for example 30. The results come back like user 1 => order 1, user 1 => order 2, and so on. Is there any way to get a specific number of unique results based on user ID or something similar?
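One hedged sketch of the merged approach (all table, column, and filter names below are placeholders, not the real schema): page on distinct user IDs first, then fetch the detail rows for exactly that page.

SELECT u.id
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.active = 1          -- stands in for the Query 1 filters
  AND o.status = 'paid'     -- stands in for the Query 2 filters
GROUP BY u.id               -- one row per user despite multiple orders
ORDER BY u.id
LIMIT 30 OFFSET 0;          -- a page of 30 unique users

-- Then fetch the users plus their orders for just that page.
SELECT u.*, o.*
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.status = 'paid'     -- keep the Query 2 filters here too
  AND u.id IN (1, 2, 3);    -- the IDs returned by the first query

This way both filter sets are applied before the LIMIT, so the pager counts exactly the users that will actually be displayed.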
Let me know if you have any questions.
Sample data would make more sense; I am unable to understand the whole requirement from your question. Could you create some sample data and share it with us? If you are dealing with a lot of data, avoid loops, as they will just make performance worse.
I would like to achieve something like you see on Facebook:
- Posting status
- Comment status
- Like status (likes on comments not implemented yet)
My table structure is like this:
Posts      Users      Comments   Likes
-------    -------    --------   -------
ID         ID         ID         ID
UserID     Username   PostID     PostID
Content               UserID     UserID
Date                  Content
                      Date
So at this time, when someone accesses the main page, the system shows the 10 latest posts. My query uses LEFT JOINs on these tables.
If, for example, there are 10 posts without any comments or likes, the query returns 10 records.
But each comment or like makes the query return an additional row, with NULL values in the columns that don't apply.
In the end, simply wanting to retrieve 10 posts, my query will return at least 50 rows (if each post has some comments and likes).
I was wondering if that will cause problems in the future, and whether I should instead use multiple queries and parse all the results into an array, like:
1. Select the 10 latest posts
2. Save the IDs into an array and all data into a global array
3. Parse the array and build a prepared query for the comments, something like:
SELECT * FROM COMMENTS WHERE PostID IN (1, 2, 3, 4, 5, 6,...)
4. Save the result into the global array
5. Repeat for the likes table
I hope my explanation was clear enough :) Thank you
Doing one 50-row query reduces the overhead of communicating with the server; on the other hand, it adds processing after the rows are retrieved.
It really depends on the overall solution.
However, unless the application is performance critical with the server being the bottleneck, I would go with 10 result sets, one per post, probably using some class/widget/object to display each post on the page.
I'm not an expert, but if I understand correctly, your options are:
A) A single mega-query that returns a lot of NULLs and repeated values.
[Note: By "all" I mean, all you are interested in]
B) Three queries: One for all posts, one for all comments, and one for all likes (all joined with the users table), and then you can process them into objects or structs or dictionaries with whatever language you are using to query the database.
I would go with the second (B) because it is easier, the extra queries seem a benign cost, and it is probably the more flexible design as well.
What I would prefer NOT to do is one query per post. That would probably become a problem sooner rather than later, and certainly much sooner than A or B.
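As a rough sketch of option B, using the table and column names from the question (the IN lists are placeholders for the IDs collected from the first query):

-- 1) The ten newest posts, with their authors.
SELECT p.ID, p.Content, p.Date, u.Username
FROM Posts p
JOIN Users u ON u.ID = p.UserID
ORDER BY p.Date DESC
LIMIT 10;

-- 2) All comments for those posts.
SELECT c.PostID, c.Content, c.Date, u.Username
FROM Comments c
JOIN Users u ON u.ID = c.UserID
WHERE c.PostID IN (1, 2, 3);   -- placeholder IDs from query 1

-- 3) All likes for those posts.
SELECT l.PostID, u.Username
FROM Likes l
JOIN Users u ON u.ID = l.UserID
WHERE l.PostID IN (1, 2, 3);   -- placeholder IDs from query 1

Each result set stays flat and small, and the application stitches them together by PostID.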
This may be a little difficult to answer given that I'm still learning to write queries and I'm not able to view the database at the moment, but I'll give it a shot.
The database I'm trying to acquire information from contains a large table (TransactionLineItems) that essentially functions as a store transaction log. This table currently contains about 5 million rows and several columns describing products which are included in each transaction (TLI_ReceiptAlias, TLI_ScanCode, TLI_Quantity and TLI_UnitPrice). This table has a foreign key which is paired with a primary key in another table (Transactions), and this table contains transaction numbers (TRN_ReceiptNumber). When I join these two tables, the query returns one row for every item we've ever sold, and each row has a receipt number. 16 rows might have the same receipt number, meaning that all of these items were sold in a single transaction. Below that might be 12 more rows, each sharing another receipt number. All transactions are broken down into multiple rows like this.
I'm attempting to build a query which returns all rows sharing a single receipt number where at least one row with that receipt number meets certain criteria in another column. For example, three separate types of gift cards all have values in the TLI_ScanCode column that begin with "740000." I want the query to return rows with values beginning with these six digits in the TLI_ScanCode column, but I would also like to return all rows which share a receipt number with any of the rows which meet the given scan code criteria. Essentially, I need the query to return all rows for every receipt number which is also paired in at least one row with a gift card-related scan code.
I attempted to use a subquery to return a column of all receipt numbers paired with gift card scan codes, using "WHERE A.TRN_ReceiptAlias IN (subquery..." to return only those rows with a receipt number which matched one of the receipt numbers returned by the subquery. This appeared to run without issue for five minutes before the server ground to a halt for another twenty while it processed the query. The query appeared to complete successfully, but given that I was working with IT to restore normal store operations during this time I failed to obtain the results of the query (apart from the associated shame and embarrassment).
I'd like to know if there is a way to write a query to obtain this information without causing the server to hang. I'm assuming that either: a) it wasn't very smart to use a subquery in this manner on such a large table, or b) I don't know enough about SQL to obtain the information I need. I'm assuming the answer is both A and B, but I'd very much like to learn how to do this the right way. Any help would be greatly appreciated. Thanks!
SELECT *
FROM a AS a1
JOIN b
  ON b.id = a1.id
JOIN a AS a2
  ON a2.id = b.id
WHERE b.some_criteria = 'something';
Include an index on (b.id, b.some_criteria).
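A hypothetical statement for that index, using the generic names from the sketch above:

CREATE INDEX idx_b_id_criteria ON b (id, some_criteria);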
You aren't the first person, nor will you be the last to bring down your system with an inefficient query.
The most important lesson is that "Decision Support" and "Analytics" really don't co-exist with a transaction system. You really want to pull the data into a data mart or data warehouse or some other database that isn't your transaction database, so that you don't take the business offline.
In terms of understanding why your initial query was so inefficient, you want to familiarize yourself with the EXPLAIN EXTENDED syntax, which returns plan information that should help you debug your query and work on making it perform acceptably. If you update your question with the actual explain-plan output, that would help in determining what the issue is.
Just from the outline you provided, it does sound like a self join would make sense rather than the subquery.
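As a hedged sketch of that self join against the tables named in the question (the foreign-key column name TransactionID is an assumption, since the question doesn't give it):

-- Every line item on any receipt that contains a gift-card line.
SELECT DISTINCT tli.*
FROM TransactionLineItems AS gift
JOIN TransactionLineItems AS tli
  ON tli.TransactionID = gift.TransactionID   -- assumed FK column name
WHERE gift.TLI_ScanCode LIKE '740000%';

DISTINCT guards against duplicates when a receipt holds several gift-card lines, and since the LIKE pattern has no leading wildcard, an index on TLI_ScanCode can still be used.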
Apologies if this is redundant, and it probably is; I had a look but couldn't find a question here that matched what I wanted to know.
Basically we have a table with roughly 50,000 rows, and it's expected to grow much bigger than that. We need to allow admin users to add custom data to an item based on its category, and users can pick which of the admin-defined fields they want to add info to.
Initially I had gone with an item_categories_fields table which pairs up entries from item_fields with item_categories, so admins can add custom fields and reuse them across categories for consistency. item_fields has a relationship to item_field_values, which links values with fields; that is how we handled things in .NET. The project is using CakePHP though, and we're just learning as we go, so it can get a bit annoying at times.
I'm now considering instead adding an item_custom_fields table that is essentially the item_id plus a text field that stores XML-ish formatted data, just for the values of the custom fields.
There are no problems if I want to fetch an item by its id, as the required data is stored in the items table, but what if I wanted to search based on a custom field? Would a
SELECT * FROM item_custom_fields
WHERE custom_data LIKE '%<material>Plastic</material>%'
(user-input issues aside) be practical if I wanted to fetch items made of plastic in this case? How slow would that be?
Thanks.
Edit: I was afraid of that; realistically this one table will be around 400k rows at launch. Thanks, guys.
Any LIKE pattern that starts with % cannot use an index on the column, so the query will scan the whole table to find the result.
The response time for that depends highly on your machine and the size of the table, but it definitely won't be efficient in any shape or form.
Your previous/existing solution (if well indexed) should be quite a bit faster.
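For comparison, a hypothetical query against the normalized design from the question (column names are assumed); with an index on item_field_values (item_field_id, value), this can seek on the index instead of scanning text blobs:

SELECT i.*
FROM items i
JOIN item_field_values v ON v.item_id = i.id
JOIN item_fields f       ON f.id = v.item_field_id
WHERE f.name = 'material'    -- assumed field-name column
  AND v.value = 'Plastic';   -- assumed value column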
I have a large MySQL table (approx 2 million rows). I want to run searches on it that may match up to 25k rows (the returned results will be limited, e.g. 25 per page). What I wanted to do was rank these results on certain criteria and use that to order them.
The solution I have so far is to create a script that goes through the table and assigns a score to each row based on my criteria. Each result would be given points depending on how it compares with my ideal result. I could then order by that score when executing a select, instead of calculating it on the fly.
I was then thinking that I want other users of the system to be able to set up their own custom scoring criteria. My first thought was to create a separate table containing the first table's row ID, a user ID, and the rank. But that table could get very large (2 million rows for each user), so I am considering alternatives. So far my options are:
1) Use a separate ranking table
2) Use user-specific ranking tables
3) Calculate on the fly
Does anyone have any experience with a similar problem? The results will be searched in real time by users, so my primary concern is to make this part of the process as fast as possible.
Many Thanks
I prefer a separate table, as you can generate a new table in the background while re-calculating and then rename the temp table to the real table.
This may also help you bypass locking problems later...
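A minimal sketch of that swap in MySQL, with hypothetical table names (RENAME TABLE performs the multiple renames atomically):

-- Rebuild the scores in a fresh table, then swap it in atomically.
CREATE TABLE user_scores_new LIKE user_scores;

INSERT INTO user_scores_new (item_id, user_id, score)
SELECT id, 1, 0 FROM items;   -- placeholder scoring logic

RENAME TABLE user_scores TO user_scores_old,
             user_scores_new TO user_scores;
DROP TABLE user_scores_old;

Readers keep hitting the old table until the rename, so re-scoring never blocks searches.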
I'm having an issue with a certain requirement in one of my homework assignments. I am required to take a list of students and print out all of the students with credit hours of 12 or more. The credit hours are stored in a separate table and referenced through a third table.
Basically: a Students table, a Classes table with hours, and an Enrolled table matching student IDs to course IDs.
I used a SUM aggregate grouped by first name from the tables, and that all works great, but I don't quite understand how to filter out the people with fewer than 12 hours, since SQL doesn't know how many hours each person is taking until the query is done.
My query string looks like this:
'SELECT Students.Fname, SUM(Classes.Crhrs) AS Credits
FROM Students, Classes, Enrolled
WHERE Students.ID = Enrolled.StudentID AND Classes.ID = Enrolled.CourseID
GROUP BY Students.Fname;'
It works fine and shows the grid in the Delphi project, but I don't know where to go from here to filter the results, since each query run replaces the previous one.
Since it's a homework exercise, I'm going to give a very short answer: look up the documentation for HAVING.
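For reference, a generic illustration of HAVING on a made-up table (deliberately not the homework schema):

-- HAVING filters groups after aggregation; WHERE cannot see SUM(...).
SELECT customer_id, SUM(amount) AS total
FROM payments
GROUP BY customer_id
HAVING SUM(amount) >= 100;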
Besides getting the desired result directly from SQL as Martijn suggested, Delphi datasets also have ways to filter data on the client side. Check the Filter property and the OnFilterRecord event.
Anyway, remember it is usually better to apply the best "filter" on the database side using proper SQL, and then use client-side "filters" only to allow different views of an already obtained data set without re-querying the same data, thus saving some database resources and bandwidth (as long as the data on the server didn't change in the meantime...).