I'm trying to generate a result set from a table whose effective unique/primary key is billyear, billmonth and type, along with cost and consumption columns. So there could be three entries with identical billyear and billmonth, but the type could be one of three values: E, W or NG.
I need to create a result set that has just one row per billyear and billmonth entry.
(
select month as billmonth, year as billyear, cost_estimate as eleccost, consumption_estimate as eleccons from tblbillforecast where buildingid=19 and type='E'
)
UNION (
select month as billmonth, year as billyear, cost_estimate as gascost, consumption_estimate as gascons from tblbillforecast where buildingid=19 and type='NG'
)
UNION (
select month as billmonth, year as billyear, cost_estimate as watercost, consumption_estimate as watercons from tblbillforecast where buildingid=19 and type='W'
)
This generates a result set with only billmonth, billyear, eleccost and eleccons columns. I've tried all kinds of solutions but the above example is the simplest to show where it's going wrong.
Additionally it still has 3 rows per billmonth/billyear unique combination instead of merging to one.
UPDATE:
Sample data
SELECT month AS billmonth,
year AS billyear,
SUM(CASE type WHEN 'E' THEN cost_estimate END) AS eleccost,
SUM(CASE type WHEN 'NG' THEN cost_estimate END) AS gascost,
SUM(CASE type WHEN 'W' THEN cost_estimate END) AS watercost
FROM tblbillforecast
WHERE buildingid=19
GROUP BY billmonth, billyear;
Result:
Expected result, eg:
year | month | eleccost | gascost | watercost
2018 | 1 | 32800 | 4460 | 4750
This is behaving correctly. An SQL query result set has one name per column, and this name applies to all the rows. So if you try to rename the column in the second or subsequent queries of the UNION, those new names are ignored. The name of the column is determined only by the first query of the UNION.
Additionally it still has 3 rows per billmonth/billyear unique combination instead of merging to one.
That's also correct behavior, according to the query you tried. UNION does not merge multiple rows into one, it only appends sets of rows.
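Both points can be seen in a minimal, self-contained example (the literal values here are invented):

SELECT 1 AS billmonth, 100 AS eleccost
UNION
SELECT 1 AS billmonth, 200 AS gascost;
-- The result has columns billmonth and eleccost only (the names come from
-- the first SELECT; the alias gascost is ignored), and it has two rows,
-- (1, 100) and (1, 200). Nothing is merged into one row.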
As Akina hinted in the comments above, you may use multiple columns:
SELECT month AS billmonth,
year AS billyear,
SUM(CASE type WHEN 'E' THEN cost_estimate END) AS eleccost,
SUM(CASE type WHEN 'NG' THEN cost_estimate END) AS gascost,
SUM(CASE type WHEN 'W' THEN cost_estimate END) AS watercost
FROM tblbillforecast
WHERE buildingid=19
GROUP BY billmonth, billyear;
This uses GROUP BY to "merge" rows together, so you get one row in the result per month/year.
A quick bit of guidance on various data shaping operations in SQL:
JOIN - makes resultsets wider (more columns) by bringing tables/resultsets together side by side, generating output rows that have all the columns of both inputs
SELECT - typically makes resultsets narrower by letting you specify which columns you're interested in and which you are not; an available column you don't mention disappears, so you output fewer columns
UNION - makes resultsets taller (more rows) by bringing resultsets together and outputting one on top of the other. Because columns always have a fixed data type and a single name, the queries must produce the same number, types and order of columns
WHERE - makes resultsets shorter (fewer rows) by letting you specify truth-based filters that exclude rows
It's not hard and fast; you can use SELECT to create more columns too, but in a rudimentary sense these concepts hold true - JOIN to widen, UNION for taller, SELECT for narrower and WHERE for shorter; a sketch of all four in one statement follows below. All the work you do with SQL is a data shaping exercise: you're either paring a rectangular block of data down or extending it, in either a vertical or horizontal direction (or a mix). And then you can cut and shape again, and again.
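As a quick sketch of all four directions in one statement (the tables ta, tb, tc and their columns are hypothetical):

SELECT a.id, a.x, b.y        -- narrower: only the columns we mention survive
FROM ta a
JOIN tb b ON b.id = a.id     -- wider: ta's and tb's columns side by side
WHERE a.x > 0                -- shorter: rows failing the filter are dropped
UNION ALL
SELECT id, x, y FROM tc      -- taller: tc's rows are appended underneath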
I'm not going to get into grouping because that mixes rows up, and isn't something you tried in the question. The reason for writing this out was purely that you attempted a UNION (height-increasing) operation when you actually wanted to widen, which, regardless of how it is done (a JOIN, or as per Bill's answer a SELECT+GROUP, which is valid but relies on the "mixes rows up" aspect of grouping), specifically isn't done with a UNION. UNION only makes stuff taller.
To give an example of how it might be done in an alternative way to Bill's approach, this task of yours has one huge table that is "too tall" - it uses 3 rows where 1 would do, if only it were a bit wider. That is to say if only there were 3 columns for electric/gas/water then we wouldn't need 3 rows with 1 utility in each.
Of course, we have this "one utility per row" because it is very flexible. Database tables don't have varying numbers of columns, but they DO have varying numbers of rows. If a new bill type came along tomorrow - internet - no table changes would be needed to accommodate it; add a new type I, and away you go, adding another row. We now store 4 rows of 1 utility where 1 row with 4 columns would do, but crucially we didn't have to change the table structure. We could have infinitely many kinds of bills and not need infinite columns, because we can already have infinite rows.
So you want to reshape your data from 4-rows-by-1-column to 1-row-by-4-columns. It could be solved as:
narrow the table to just year, month, building, cost AND shorten it to just electricity
separately narrow the table to just year, month, building, cost AND shorten it to just gas
separately narrow the table to just year, month, building, cost AND shorten it to just water
join (widening) all these newly created result sets, then narrow to remove the repeated year, month and building columns
That would look like:
SELECT e.year, e.month, e.building, e.cost AS eleccost, g.cost AS gascost, w.cost AS watercost
FROM
(SELECT year,month,building,cost FROM t WHERE type = 'E') e
JOIN
(SELECT year,month,building,cost FROM t WHERE type = 'NG') g
ON
e.year = g.year AND e.month = g.month AND e.building = g.building
JOIN
(SELECT year,month,building,cost FROM t WHERE type = 'W') w
ON
e.year = w.year AND e.month = w.month AND e.building = w.building
WHERE
e.building = 19
You can see clearly the 3 narrowing-and-shortening operations that pick out "just the gas", "just the electric", and "just the water" - they're the (SELECT year,month,building,cost FROM t WHERE type = 'NG') and so on, and that's what reduces the height of the original table, making each derived set a third of the original height. If we had 999 rows x 5 cols in the big table, it goes to 3 sets of 333 rows x 4 cols each (type is dropped in the narrowing).
You can see that we then JOIN these together to widen the results - our 3 sets of 333 x 4 widen to 333 x 12 when JOINed.
Then we went from 333 x 12 down to 333 x 6 when SELECTed, to ditch the repeated columns.
It's likely not perfect (I'd perhaps LEFT JOIN all 3 onto a 4th set that is just the common columns, in case some utilities aren't present for a particular month; a sketch follows below), and perhaps some people will come along complaining that it's less performant because it hits the table 3 times. All that is accessory to the point I'm making about SQL being an exercise in reshaping data: tables are the starting blocks of data and you cut them up narrower and shorter, then stick them together side by side or on top of each other, and that becomes your new data block that's maybe wider, taller, or both. In any case it's definitely a different shape to what you started with.
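For completeness, a hedged sketch of that LEFT JOIN variant (same hypothetical table t as above); a month with no gas bill then yields a NULL gascost rather than losing the whole row:

SELECT base.year, base.month, base.building,
       e.cost AS eleccost, g.cost AS gascost, w.cost AS watercost
FROM (SELECT DISTINCT year, month, building FROM t) base
LEFT JOIN (SELECT year, month, building, cost FROM t WHERE type = 'E') e
  ON e.year = base.year AND e.month = base.month AND e.building = base.building
LEFT JOIN (SELECT year, month, building, cost FROM t WHERE type = 'NG') g
  ON g.year = base.year AND g.month = base.month AND g.building = base.building
LEFT JOIN (SELECT year, month, building, cost FROM t WHERE type = 'W') w
  ON w.year = base.year AND w.month = base.month AND w.building = base.building
WHERE base.building = 19;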
Go with Bill's conditional aggregation (though this JOIN way would be fine if there is one row per building/year/month), but take away a stronger notion of the direction in which these common operations (SELECT/JOIN/WHERE/UNION) reshape your data.
Footnote about Bill's conditional aggregation (I know I said I wouldn't talk about it, but it might make more sense to now). If you have:
Type, Cost
E, 123
NG, 456
W, 789
And you do a
SELECT
CASE WHEN Type = 'E' THEN Cost END as CostE,
CASE WHEN Type = 'NG' THEN Cost END as CostNG,
CASE WHEN Type = 'W' THEN Cost END as CostW
...
It spreads the data out over more columns - the data has "gone from vertical to diagonal"
CostE, CostNG, CostW
123, NULL, NULL
NULL, 456, NULL
NULL, NULL, 789
But it's still too tall. If you then run a GROUP BY, which mixes rows up, and ask for e.g. just the MAX of each column, then all the NULLs disappear (aggregates like MAX ignore NULLs, so the single non-null value in each column survives) and the rows collapse, mixing together, into one:
CostE, CostNG, CostW
123, 456, 789
The data has pivoted round from being vertical, to being horizontal - another data shaping. It was pulled wider, and squashed flatter
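Put together over that little Type/Cost sample (the table name bills is invented for the illustration), the footnote becomes:

SELECT MAX(CASE WHEN Type = 'E'  THEN Cost END) AS CostE,
       MAX(CASE WHEN Type = 'NG' THEN Cost END) AS CostNG,
       MAX(CASE WHEN Type = 'W'  THEN Cost END) AS CostW
FROM bills;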
I have a csv file
It contains a set of values, and I am trying to rank these values as in Excel (standard competition ranking: items that compare equal receive the same ranking number, and then a gap is left in the ranking numbers. The number of ranking numbers left out in this gap is one less than the number of items that compared equal. Equivalently, each item's ranking number is 1 plus the number of items ranked above it. This ranking strategy is frequently adopted for competitions, as it means that if two (or more) competitors tie for a position in the ranking, the position of all those ranked below them is unaffected; i.e., a competitor only comes second if exactly one person scores better than them, third if exactly two people score better than them, fourth if exactly three people score better than them, etc.).
Thus if A ranks ahead of B and C (which compare equal) which are both ranked ahead of D, then A gets ranking number 1 ("first"), B gets ranking number 2 ("joint second"), C also gets ranking number 2 ("joint second") and D gets ranking number 4 ("fourth").
Set ranking[0] to 1.
For each index i in the score-list:
if score[i] equals score[i-1], they should have the same ranking:
ranking[i] = ranking[i-1]
else the ranking should equal the current index:
ranking[i] = i + 1
(+1 due to 0-based indices and 1-based ranking)
and I tried some code to do it:
extensions [csv]
globals [data]
turtles-own [score rank]

to setup
  clear-all
  ;; csv:from-file reads the whole file at once,
  ;; so no file-open / file-at-end? handling is needed
  set data csv:from-file "test1.csv"
  ;; one turtle per row; item 0 pulls the value out of the row's list
  foreach data [ row -> create-turtles 1 [ set score item 0 row ] ]
  reset-ticks
end

to ranking
  ;; sort turtles by score, best (highest) first
  let rank-list sort-on [ (- score) ] turtles
  let i 0
  foreach rank-list [ t ->
    ask t [
      ;; competition ranking: a tie copies the previous turtle's rank,
      ;; otherwise the rank is the list position + 1
      ifelse i > 0 and score = [score] of item (i - 1) rank-list
        [ set rank [rank] of item (i - 1) rank-list ]
        [ set rank i + 1 ]
    ]
    set i i + 1
  ]
end
but I did not succeed with NetLogo (6.1).
I want to display all duplicate records from my table; the rows are like this:
uid planet degree
1 1 104
1 2 109
1 3 206
2 1 40
2 2 76
2 3 302
I have many different OR conditions with different combinations in the subquery, and I want to count every one of them that matches, but it only displays the first match for each planet and degree.
Query:
SELECT DISTINCT
p.uid,
(SELECT COUNT(*)
FROM Params AS p2
WHERE p2.uid = p.uid
AND(
(p2.planet = 1 AND p2.degree BETWEEN 320 - 10 AND 320 + 10) OR
(p2.planet = 7 AND p2.degree BETWEEN 316 - 10 AND 316 + 10)
...Some more OR statements...
)
) AS counts
FROM Params AS p
HAVING counts > 0
ORDER BY p.uid DESC
any solution folks?
updated
So, the problem most people have with their counting-joined-subquery-group queries is that the base query isn't right, and the following may seem like complete overkill for this question ;o)
base data
In this particular example, what you would want as a data basis is at first this:
(uidA, planetA, uidB, planetB) for every combination of player A and player B planets. That one is quite simple (l is for left, r is for right):
SELECT l.uid, l.planet, r.uid, r.planet
FROM params l, params r
First step done.
filter data
Now you want to determine whether - for one row, meaning one pair of planets - the planets collide (or almost collide). This is where the WHERE comes in.
WHERE ABS(l.degree-r.degree) < 10
would, for example, only leave those pairs of planets with a difference in degrees of less than 10. More complex conditions are possible (your crazy conditional ...), for example if the planets have different diameters you may add additional terms. However, my advice would be to put some of the additional data you currently have in your query into tables.
For example, if all players' 1st planets have the same size, you could have a table with (planet_id, size). If every planet can have a different size, add the size to the params table as a column.
then your WHERE clause could be like:
WHERE ABS(l.degree - r.degree) < l.size + r.size
If, for example, two big planets with sizes 5 and 10 should be at least 15 degrees apart, this query would find all those planets that aren't.
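If you go the lookup-table route instead, here is a hedged sketch (the planet_size(planet_id, size) table and its contents are hypothetical):

SELECT l.uid, l.planet, r.uid, r.planet
FROM params l
JOIN planet_size ls ON ls.planet_id = l.planet
CROSS JOIN params r
JOIN planet_size rs ON rs.planet_id = r.planet
WHERE ABS(l.degree - r.degree) < ls.size + rs.size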
We assume that you have a sensible conditional, so at this point we have a list of (uidA, planetA, uidB, planetB) planet pairs that are colliding or close to colliding (whatever semantics you chose). The next step is to get the data you're actually interested in:
limit uidA to a specific user_id (the currently logged in user for example)
add l.uid = <uid> to your WHERE.
count, for every planet A, how many planets B exist that threaten collision
add GROUP BY l.uid, l.planet,
replace r.uid, r.planet with count(*) as counts in your SELECT clause
then you can even filter: HAVING counts > 1 (HAVING is the WHERE for after you have GROUPed)
and of course, you can
filter out certain players B that may not have planetary interactions with player A
add to your WHERE
r.uid NOT IN (1)
find only self collisions
WHERE l.uid = r.uid
find only non-self collisions
WHERE l.uid <> r.uid
find only collisions with one specific planet
WHERE l.planet = 1
conclusion
A structured approach where you start from the correct base data, then filter it appropriately and then group it, is usually the best approach. If some of the concepts are unclear to you, please read up on them online; there are manuals everywhere.
The final query could look something like this:
SELECT l.uid, l.planet, count(*) as counts
FROM params l, params r
WHERE [ collision-condition ]
GROUP BY l.uid, l.planet
HAVING counts > 0
If you want to collide a non-planet object, you might want to make a "virtual table": so instead of FROM params l, params r you do (with possibly different fields; I just assume you add a size field that is somehow used):
FROM params l, (SELECT 240 as degree, 2 as planet, 5 as size) r
or for multiple objects:
FROM params l, (SELECT 240 as degree, 2 as planet, 5 as size
UNION
SELECT 250 as degree, 3 as planet, 10 as size
UNION ...) r
In Business Objects XI Web Intelligence, the Rank function returns dense results. For example, when ranking by "Amount" I want to return only the top ten records. However, three records tie for 5th place on "Amount". The result is a total of 12 records: one each for places 1 to 4 and 6 to 10, and 3 records for 5th place.
Desired result is a "sparse" top ten that drops the two lowest ranked records (places 9 and 10).
I tried to do this and rank customers by amount.
I have 2 objects: [Amount] and [Customernumber].
[Customernumber] is numeric.
I created a new variable:
[varForSorting]=[Amount]*10000000+ToNumber([Customernumber])
Then I rank by the new variable [varForSorting].
Customers with the same Amount will then be ranked in order of Customer number. I hope this helps.
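For example (invented numbers): two customers who both have Amount 500 and customer numbers 123 and 124 get varForSorting values of 500*10000000+123 = 5000000123 and 500*10000000+124 = 5000000124, so the rank over [varForSorting] breaks the tie while preserving the ordering by Amount.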
Here is an example of how I solved it for a change in Account Count over time. This approach allows you to break dense-rank ties using other measures in your data provider. Basically, you use multiple measures in one rank and decide which measure to rank by first, second, etc.:
Step 1: Determine the change amount
v_Account_Count_Delta_Amount
=([v_Account_Count_After] - [v_Account_Count_Before])
Step 2: Rank the change amounts (this is where ties and dense rank cause multiple rows to be returned)
v_Account_Count_Delta_Amount_Rank
=NoFilter(Rank([v_Account_Count_Delta_Amount]))
Step 3: Compute the tie breaking rank using other measures
v_MonthToDateMeasuresRank
=NoFilter(Rank([Month To Date Sva]+ [Bank Share Balance] + [Total Commitment]))
Step 4: Compute a combined rank that is now free from ties and weight your ranks however you choose
v_Account_Count_Combined_Rank
=Rank([v_Account_Count_Delta_Amount_Rank]* 1000000 + [v_MonthToDateMeasuresRank];Bottom)
Step 5: Filter your data block for v_Account_Count_Combined_Rank <= 10
Ultimately, depending on your data, it could still result in a tie unless you take the additional step of ranking by some other unique attribute that you can turn into a number (see Maria Ruchko's answer for that bit of magic using Customer Number). I tried to do that with RowIndex() and LineNumber() but could not get usable results. My measures, when added together, happen never to tie, so this works for my specific data blob.
I am trying to calculate the cost of the (most efficient) block nested loop join in terms of NPDR (number of disk page reads). Suppose you have a query of the form:
SELECT COUNT(*)
FROM county JOIN mcd
ON county.state_code = mcd.state_code
AND county.fips_code = mcd.fips_code
WHERE county.state_code = #NO
where #NO is substituted for a state code on each execution of the query.
I know that I can derive the NPDR using: NPDR(R x S) = |Pages(R)| + Pages(R) / B - 2 * |Pages(S)|
(where the smaller table is used as the outer in order to produce fewer page reads; ergo R = county, S = mcd).
I also know that Page size = 2048 bytes
Pointer = 8 byte
Num. rows in mcd table = 35298
Num. rows in county table = 3141
Free memory buffer pages B = 100
Pages(X) = (rowsize)(numrows) / pagesize
What I am trying to figure out is how the "WHERE county.state_code = #NO" affects my cost?
Thanks for your time.
First a couple of observations regarding the formula you wrote:
I'm not sure why you write "B - 2" instead of "B - 1". From a theoretical perspective, you need only a single buffer page to read in relation S (you can do it by reading one page at a time).
Make sure you use all the brackets. I would write the formula as:
NPDR(R x S) = |Pages(R)| + |Pages(R)| / (B-2) * |Pages(S)|
All the numbers in the formula would need to be rounded up (but this is nitpicking).
The explanation for the generic BNLJ formula:
You read in as many tuples from the smaller relation (R) as you can keep in memory (B-1 or B-2 pages' worth of tuples).
For each group of B-2 pages' worth of tuples, you then have to read the whole S relation (|Pages(S)| pages) to perform the join for that particular portion of relation R.
At the end of the join, relation R is read exactly one time and relation S is read as many times as we filled the memory buffer, namely |Pages(R)| / (B-2) times.
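To make the counting concrete, here is a hedged toy example (all page counts invented): with |Pages(R)| = 99, |Pages(S)| = 1104 and B = 100, each batch holds B - 2 = 98 pages of R, so R is read once and S is scanned ceil(99 / 98) = 2 times, giving NPDR = 99 + 2 * 1104 = 2307.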
Now the answer:
In your example a selection criterion is applied to relation R (table county in this case). This is the WHERE county.state_code = #NO part of the query. Therefore, the generic formula does not apply directly.
When reading from relation R (i.e., table county in your example), we can discard all the non-qualifying tuples that do not match the selection criterion. Assuming that there are 50 states in the USA and that all states have the same number of counties, only 2% of the tuples in table county qualify on average and need to be stored in memory. This reduces the number of iterations of the inner loop of the join (i.e., the number of times we need to scan relation S / table mcd). The 2% figure is obviously just the expected average and will change depending on the actual given state.
The formula for your problem therefore becomes:
NPDR(R x S) = |Pages(County)| + |Pages(County)| / (B - 2) * |Counties in state #NO| / |Rows in table County| * |Pages(Mcd)|
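A hedged worked example (the 64-byte row size and the uniform 50-state distribution are invented; substitute your real figures): Pages(county) = ceil(3141 * 64 / 2048) = 99 and Pages(mcd) = ceil(35298 * 64 / 2048) = 1104. Only about 99 / 50, i.e. 2 pages' worth of county tuples qualify, which fits into the B - 2 = 98 buffer pages in a single batch, so mcd is scanned once: NPDR = 99 + 1 * 1104 = 1203 page reads, versus 99 + ceil(99 / 98) * 1104 = 2307 without the WHERE filter.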