I am asking this question in the hope there is a more efficient (faster) way to pull and insert data in the the tables I am working with.
The basic structure of the data table is
ID Doc_ID Field Value
1 10 Title abc
2 10 Abstract xyz
3 10 Author Bob
4 11 Publisher Bookworms
5 11 Title zzz
6 11 Abstract bbb
7 12 Title aaa
8 12 Sale No
In other words the data tables are row based, each row contain a document id and the corresponding field value. Not all documents have the same number of fields defined. Indeed books may differ radically from magazines.
The data table is 10,000,000 rows typically a document has 100 fields associated with it.
So the performance problem I am finding is pulling a report with reference to 50+ different fields, for example if I have a query list in an order_table the query could be like
select ord.number as 'Order ID', d1.value as 'Title', d2.value as 'Author' .......
from order_table ord
LEFT JOIN data_table as d1 on d1.Doc_ID=ord.Doc_ID and d1.Field='Title'
LEFT JOIN data_table as d2 on d2.Doc_ID=ord.Doc_ID and d2.Field='Author'
........
LEFT JOIN data_table as d50 on d50.Doc_ID=ord.Doc_ID and d50.Field='Qty'
Using LEFT JOINS as there is no guarantee that the field is defined for that document.
Given there may be some WHERE parameters to limit the list to items (in stock for example or below a price) it is a slow query. Indexes don't really much.
Without being able to change the data model, what is the best way to pull volumes of information out?
Related
I have a database in which I perform an audit on a set of required documents, for several locations of those documents.
So I have a table named Locations and a table named Documents, which are correlated through a 2 x 2 relationship.
Every document can have multiple versions. In my query, I want to see only the most recent version of each document, so the max(Id).
Now, every version can be 'audited' (checked) multiple times, for example 2 times each year. Each Audit/check is stored in a record, and I want to show only the most recent audit for each document, so Max(ID).
This is my Selection Query:
SELECT [~Locations].Location, [+DocuProperties].Category, [~Documents].[Document name], Max([DocuVersion].Id) AS MaxDocuID, Max([Audit].Id) AS MaxAuditID, [Audit].Conclusion
FROM ([~Documents] INNER JOIN ([~Locations] INNER JOIN ([+DocuLocation] INNER JOIN [+DocuProperties] ON [+DocuLocation].Id = [+DocuProperties].DocuLocation) ON [~Locations].Id = [+DocuLocation].Location) ON [~Locations].Id = [+DocuLocation].DocuName) INNER JOIN (DocuVersion INNER JOIN 2Audit ON [DocuVersion].Id = [Audit].DocuVersion) ON [+DocuProperties].Id = [DocuVersion].DocuLocation
GROUP BY [~Locations].Location, [+Docuproperties].Category, [~Documents].[Document name], [Audit].Conclusion
However: I do not wish to Group on Audit Conclusion, I wish to show the Audit conclusion that corresponds to the Max(Id) of that Audit.
So for every most recent Audit, I want to show the Conclusion. This conclusion I want to show for each Document, grouped byCategory and grouped byLocation.
I know I need to build a nested subquery of some form, but I just can't get any code to work.
I hope anybody can help.
The basic idea is like this:
Table 1
DocuProperties
Id Location Category
1 15 1
2 15 1
3 14 2
(every location can have multiple document properties a.k.a. objects)
Table2
DocuVersion
Id DocuProperty DocumentEndDate
1 1 01-01-2022
2 1 20-07-2023
3 2 31-07-2023 etc.
4 3 01-10-2023
(every DocuProperties can have multiple versions, I have to check If they are still valid, but also on some other criteria ).
Table 3
Audit
Id DocuVersion Conclusion
1 1 Not Valid
2 1 Not Valid
3 2 Valid
4 4 Valid
(every version can be audited multiple times. Every audit can have a different conclusion)
Which I would like to translate into the following:
LASTAudit (a.k.a. the most recent audit of the most recent version of the most recent property)
Location DocutPropertyId DocuVersionId AuditId Conclusion
15 2 2 2 Not Valid
14 3 4 4 Valid
The ID’s were easy to get right, as those were just Max(Id) functions. The problem was to get the Conclusion corresponding to that audit of that version of that object.
SELECT column_1 FROM table_1,table_2;
When I ran this on my database it returned huge number of rows with duplicate column_1 values. I could not understand why I got these results. Please explain what this query does.
it gives you a cross product from table 1 and table 2
In more layman's terms, it means that for each record in Table A, you get every record from Table B (all possible combinations).
TableA with 3 records and Table B with 3 records gives 9 total records in the result:
TableA-1/B-1
TableA-1/B-2
TableA-1/B-3
TableA-2/B-1
TableA-2/B-2
TableA-2/B-3
TableA-3/B-1
TableA-3/B-2
TableA-3/B-3
Often used as a basis for Cartesian Queries (which themselves are the means to generate, say, a list of future dates based on a recurrence schedule: give me all possible results for the next 6 months, then restrict that set to those whose factor matches my day of the week)
This is 'valid' way of cross joining two tables; it is not the preferred way though. Cross Join would be much clearer. An on condition would then be helpful to limit results,
Imagine that i have 3 friends named Jhon, Ana, Nick; then i have in the other table 2 are T-shirts a red and a yellow and i wanna know witch is from.
So in the query being tableA:Friends and tableB:Tshirts returns:
1|JHON | t-shirt_YELLOW
2|JHON | t-shirt_RED
3|ANA | t-shirt_YELLOW
4|ANA | t-shirt_RED
5|NICK | t-shirt_YELLOW
6|NICK | t-shirt_RED
As you see this join has no relational logic between friends and Tshirts so by evaluating all the posible combination generates what you call duplicates.
I have the following situation. I have a table with all info of article. I will like to compare the same column with it self. because I have multiple type of article. Single product and Master product. the only way that I have to differences it, is by SKU. for example.
ID | SKU
1 | 11111
2 | 11112
3 | 11113
4 | 11113-5
5 | 11113-8
6 | 11114
7 | 11115
8 | 11115-1-W
9 | 11115-2
10 | 11116
I only want to list or / and count only the sku that are full unique. follow th example the sku that are unique and no have variant are (ID = 1, 2, 6 and 10) I will want to create a query where if 11113 are again on the column not cout it. so in total I will be 4 unique sku and not "6 (on total)". Please let me know. if this are possible.
Assuming the length of master SKUs are 5 characters, try this:
select a.*
from mytable a
left join mytable b on b.sku like concat(a.sku, '%')
where length(a.sku) = 5
and b.sku is null
This query joins master SKUs to child ones, but filters out successful joins - leaving only solitary master SKUs.
You can do this by grouping and counting the unique rows.
First, we will need to take your table and add a new column, MasterSKU. This will be the first five characters of the SKU column. Once we have the MasterSKU, we can then GROUP BY it. This will bundle together all of the rows having the same MasterSKU. Once we are grouping we get access to aggregate functions like COUNT(). We will use that function to count the number of rows for each MasterSKU. Then, we will filter out any rows that have a COUNT() over 1. That will leave you with only the unique rows remaining.
Take that unique list and LEFT JOIN it back into your original table to grab the IDs.
SELECT ID, A.MasterSKU
FROM (
SELECT
MasterSKU = SUBSTRING(SKU,1,5),
MasterSKUCount = COUNT(*)
FROM MyTable
GROUP BY SUBSTRING(SKU,1,5)
HAVING COUNT(*) = 1
) AS A
LEFT JOIN (
SELECT
ID,
MasterSKU = SUBSTRING(SKU,1,5)
FROM MyTable
) AS B
ON A.MasterSKU = B.MasterSKU
Now one thing I noticed from you example. The original SKU column really looks like three columns in one. We have multiple values being joined with hypens.
11115-1-W
There may be a reason for it, but most likely this violates first normal form and will make the database hard to query. It's part of the reason why such a complicated query is needed. If the SKU column really represents multiple things then we may want to consider breaking it out into MasterSKU, Version, and Color or whatever each hyphen represents.
I'm trying to create a report in SSRS where I have a matrix, which has gender as the column headings and specifically defined agegroups as the rows. The report is sorted by date (ie, the records being displayed are filtered by the modifedAt value). My problem is that i wish for all of the age group categories to be displayed, even if the dataset does not return any data for that row.
So, for example, if i set the date to be a date where there are no db rows where there are Age5-16 children in - I still want to display the category name, but just have the cells related to that row to display '0'. Instead, the report just drops the whole row because, obviously the query returns no data.
Is the solution to have a separate dataset that brings back the entire list of categories and then somehow fit them together? I'm stuck here so any help is appreciated!
I can think of a few ways to do this:
DataSet level
Instead of just returning the relevant data in the underlying data in the DataSet, include all the categories you want to display in all cases.
e.g. For a database query it might be the difference between an inner and left join, i.e. going from something like:
select *
from AgeGroup
inner join MyData on ...
to:
select *
from AgeGroup
left join MyData on ...
So the report always has all the age groups to display. Where there are NULL values, just display 0.
I think this is the best option if you have control over the DataSet - you won't have to update your report at all, with luck the actual DataSet changes should be minimal, there is still only one DataSet call, and it's by far the simplest to maintain.
Hard code groups into the report
Here you include a table header row for each group you want to display, so these are always displayed in all cases.
Here you have some sort of conditional expression to display the values, e.g. For each group row it will be tailored to that group:
=Sum(IIf(Fields!AgeGroup.Value = "5-16", Fields!Amount.Value, Nothing)
This is not too flexible and will need updates as you change groups, and doesn't have as many options for layout. There is still only one DataSet call, so that is a plus.
Subreports
You can have a parent DataSet that displays one row for each age group, then embed a subreport in each row that displays the data you want for that row.
This allows you flexibility in layout but it will add complexity to the report(s) and will mean that you make a lot of DataSet calls that could be avoided with other options.
I know this is old, but I wanted to elaborate on Ian's section 1 above using joins at the dataset level. (His answer was super helpful to me for a report I'm working on.)
per op:
Is the solution to have a separate dataset that brings back the entire list of categories and then somehow fit them together?
That is how I've handled it successfully, but you can do so without actually creating a separate dataset by using common table expressions (or temp tables, of course).
For these example tables:
AGE_Table
ID Group Group_Desc Toys
1 A 00-10 Teddy Bear
2 B 11-20 Video Game
3 C 21-30 Sports Car
4 D 31-40 Mansion
5 E 41-50 Jewelry
People_Table (filtered for whatever date)
ID Name Age Gender Age_Group
1 Ariel 07 F A
2 Brandon 23 M C
3 Chelsea 27 F C
4 Derek 06 M A
You want to see 2 results for the 00-10 row, 2 for the 21-30 row, and then still see rows for the other age groups even if there aren't any results.
We want to create a dataset with all the different age groupings and then join on it. Behold a solution using common table expressions:
with CTE_Age AS
(SELECT Distinct Age_Group from AGE_Table)
SELECT ID, Name, Age, Gender, CTE_Age.Age_Group FROM People_Table
RIGHT JOIN CTE_Age ON
People_Table.Age_Group = CTE_Age.Age_Group
This will return:
ID Name Age Gender Age_Group
1 Ariel 7 F A
4 Derek 6 M A
NULL NULL NULL NULL B
2 Brandon 23 M C
3 Chelsea 27 F C
NULL NULL NULL NULL D
NULL NULL NULL NULL E
Once you have that in your dataset, you can change NULL values to 0 on the report builder side -- I think in 2008R2 the default is just blank.
I need to create a join query for hierarchical data across 2 tables. These tables can have unlimited amounts of data and their structures are as follows:
group_id group_name group_order
1 group 1 2
2 group 2 1
field_id field_name parent_group field_order
1 field 1 1 1
2 field 2 2 2
3 field 3 2 1
I am currently able to get the correct format of data using 2 select queries with the second query inside a loop created from the results of the first query on the groups table.
The structure of the data I require from the result is as follows:
-group 2
- field 3
- field 2
- group 1
- field 1
Is it possible to get these results from one mysql query? I have read through the mysql document on hierarchical data by I am confused about how to incorporate the join.
Thanks for looking
You shouldn't need to think about it in terms of hierarchical data, you should just be able to select your fields and join on your group information. Try something like:
SELECT *
FROM Fields AS F
INNER JOIN Groups AS G
ON G.group_id = F.parent_group
ORDER BY group_order, field_order
Then you will get each fields as a row with the applicable group, also in the correct group order. Your loop should be able to handle the display you need.
one method
something that may convince you change your db schema