I have a table in which a certain column's data is sometimes missing, in which cases has just a dash.
But since there's another column which is unique, the missing data can be deduced from other rows.
In other words, I have something like this:
Unique model
Maker
Profit
abcd1234
-
56
zgh675
Company Y
40
abcd1234
Company X
3
zgh675
-
10
abcd1234
Company X
1
Which query can I use to automatically reach the following (the list of Makers is dynamic but every model can only go to one of them):
Unique model
Maker
Profit
abcd1234
Company X
60
zgh675
Company Y
50
?
You may aggregate by unique model and then take the MAX value of the maker, along with the sum of the profit:
SELECT model, MAX(maker) AS maker, SUM(profit) AS profit
FROM yourTable
GROUP BY model;
This approach should work assuming that each model only has one unique non NULL value.
I have numbers from 1 to 36. What I am trying to do is put all these numbers into three groups and works out all various permutations of groups.
Each group must contain 12 numbers, from 1 to 36
A number cannot appear in more than one group, per permutation
Here is an example....
Permutation 1
Group 1: 1,2,3,4,5,6,7,8,9,10,11,12
Group 2: 13,14,15,16,17,18,19,20,21,22,23,24
Group 3: 25,26,27,28,29,30,31,32,33,34,35,36
Permutation 2
Group 1: 1,2,3,4,5,6,7,8,9,10,11,13
Group 2: 12,14,15,16,17,18,19,20,21,22,23,24
Group 3: 25,26,27,28,29,30,31,32,33,34,35,36
Permutation 3
Group 1: 1,2,3,4,5,6,7,8,9,10,11,14
Group 2: 12,11,15,16,17,18,19,20,21,22,23,24
Group 3: 25,26,27,28,29,30,31,32,33,34,35,36
Those are three example, I would expect there to be millions/billions more
The analysis that follows assumes the order of groups matters - that is, if the numbers were 1, 2, 3 then the grouping [{1},{2},{3}] is distinct from the grouping [{3},{2},{1}] (indeed, there are six distinct groupings when taking from this set of numbers).
In your case, how do we proceed? Well, we must first choose the first group. There are 36 choose 12 ways to do this, or (36!)/[(12!)(24!)] = 1,251,677,700 ways. We must then choose the second group. There are 24 choose 12 ways to do this, or (24!)/[(12!)(12!)] = 2,704,156 ways. Since the second choice is already conditioned upon the first we may get the total number of ways of taking the three groups by multiplying the numbers; the total number of ways to choose three equal groups of 12 from a pool of 36 is 3,384,731,762,521,200. If you represented numbers using 8-bit bytes then to store every list would take at least ~3 pentabytes (well, I guess times the size of the list, which would be 36 bytes, so more like ~108 pentabytes). This is a lot of data and will take some time to generate and no small amount of disk space to store, so be aware of this.
To actually implement this is not so terrible. However, I think you are going to have undue difficulty implementing this in SQL, if it's possible at all. Pure SQL does not have operations that return more than n^2 entries (for a simple cross join) and so getting such huge numbers of results would require a large number of joins. Moreover, it does not strike me as possible to generalize the procedure since pure SQL has no ability to do general recursion and therefore cannot do a variable number of joins.
You could use a procedural language to generate the groupings and then write the groupings into a database. I don't know whether this is what you are after.
n = 36
group1[1...12] = []
group2[1...12] = []
group3[1...12] = []
function Choose(input[1...n], m, minIndex, group)
if minIndex + m > n + 1 then
return
if m = 0 then
if group = group1 then
Choose(input[1...n], 12, 1, group2)
else if group = group2 then
group3[1...12] = input[1...12]
print group1, group2, group3
for i = i to n do
group[12 - m + 1] = input[i]
Choose(input[1 ... i - 1].input[i + 1 ... n], m - 1, i, group)
When you call this like Choose([1...36], 12, 1, group1) what it does is fill in group1 with all possible ordered subsequences of length 12. At that point, m = 0 and group = group1, so the call Choose([?], 12, 1, group2) is made (for every possible choice of group1, hence the ?). That will choose all remaining ordered subsequences of length 12 for group2, at which point again m = 0 and now group = group2. We may now safely assign group3 to the remaining entries (there is only one way to choose group3 after choosing group1 and group2).
We take ordered subsequences only by propagating the index at which to begin looking on the recursive call (minIdx). We take ordered subsequences to avoid getting permutations of the same set of 12 items (since order doesn't matter within a group).
Each recursive call to Choose in the loop passes input with one element removed: precisely that element that just got added to the group under consideration.
We check for minIndex + m > n + 1 and stop the recursion early because, in this case, we have skipped too many items in the input to be able to ever fill up the current group with 12 items (while choosing the subsequence to be ordered).
You will notice I have hard-coded the assumption of 12/36/3 groups right into the logic of the program. This was done for brevity and clarity, not because you can't make parameterize it in the input size N and the number of groups k to form. To do this, you'd need to create an array of groups (k groups of size N/k each), then call Choose with N/k instead of 12 and use a select/switch case statement instead of if/then/else to determine whether to Choose again or print. But those details can be left as an exercise.
I have a chart with a count of X per year per category, i.e.
X ##### # 2012
X ###### # 2013
Y #############
Y ##
Z ###########
Z #################
How can I apply a sort to the 2013 values (but keep the grouping within X, Y, Z) - i.e. in this case, Z at the top, X mid and Y at the bottom, based on 2013 values.
Ok, the trick is to name the Category Group and then use a sort expression on the Category Group. If you want to sort by values in 2013, use
=Sum(IIf(year(Fields!TradeDate.Value) = 2013,
CDbl(Fields![The field being summed].Value), CDbl(0)),
"[The Category Grouping Name]")
The Cdbl is required to ensure Sum is working on the same data types.
I bet if you add row groups to your report By Year and By -Whatever XYZ Is- then you can sort your output by year while keeping the xyz grouping.
I am trying to calculate the cost of the (most efficient) block nested loop join in terms of NDPR (number of disk page reads). Suppose you have a query of the form:
SELECT COUNT(*)
FROM county JOIN mcd
ON count.state_code = mcd.state_code
AND county.fips_code = mcd.fips_code
WHERE county.state_code = #NO
where #NO is substituted for a state code on each execution of the query.
I know that I can derive the NPDR using: NPDR(R x S) = |Pages(R)| + Pages(R) / B - 2 . |P ages(S)|
(where the smaller table is used as the outer in order to produce less page reads. Ergo:
R = county, S = mcd).
I also know that Page size = 2048 bytes
Pointer = 8 byte
Num. rows in mcd table = 35298
Num. rows in county table = 3141
Free memory buffer pages B = 100
Pages(X) = (rowsize)(numrows) / pagesize
What I am trying to figure out is how the "WHERE county.state_code = #NO" affects my cost?
Thanks for your time.
First a couple of observations regarding the formula you wrote:
I'm not sure why it you write "B - 2" instead of "B - 1". From a theoretical perspective, you need a single buffer page to read in relation S (you can do it by reading one page at a time).
Make sure you use all the brackets. I would write the formula as:
NPDR(R x S) = |Pages(R)| + |Pages(R)| / (B-2) * |Pages(S)|
The all numbers in the formula would need to be rounded up (but this is nitpicking).
The explanation for the generic BNLJ formula:
You read in as many tuples from the smaller relation (R) as you can keep in memory (B-1 or B-2 pages worth of tuples).
For each group of B-2 pages worth of tuples, you then have to read the whole S relation ( |Pages(S)|) to perform the join for that particular range of relation R.
At the end of the join, relation R is read exactly one time and relation S is read as many times as we filled the memory buffer, namely |Pages(R)| / (B-2) times.
Now the answer:
In your example a selection criteria is applied to relation R (table Country in this case). This is the WHERE county.state_code = #NO part of the query. Therefore, the generic formula does not apply directly.
When reading from relation R (i.e., table Country in your example), we can discard all the non-qualifying tuples that do not match the selection criteria. Assuming that there are 50 states in the USA and that all states have the same number of counties, only 2% of the tuples in table Country qualify on average and need to be stored in memory. This reduces the number of iteration of the inner loop of the join (i.e., the number of times we need to scan relation S / table mcs). The 2% number is obviously just the expected average and will change depending on the actual given state.
The formula for your problem therefore becomes:
NPDR(R x S) = |Pages(County)| + |Pages(County)| / (B - 2) * |Counties in state #NO| / |Rows in table County| * |Pages(Mcd)|