SQL query for operation between rows under some condition - mysql

It's a somewhat complicated query, as it contains some conditions.
I have tables like this:
Table DC: one row per northing/easting pair.
Columns: Id (GUID), Northing (standard value), Easting (standard value)
Table DCR: multiple rows for each row in DC. Each row holds the data of one pass over that exact location.
Columns: Id (GUID), VibStatus (0/1), DrivingDir (Forward/Reverse/Neutral), CompactionValue (positive integer), UtcDate (timestamp)
Table DCMappings: the mapping between the two tables.
Columns: DCId, DCRId
The output I need should contain these fields:
ResultTable: DCId, DCRId, Northing, Easting, VibStatus, DrivingDir, CompValue, Position, CompProgress
Here, Position is the row's position in chronological order (sorted by UtcDate) within its DC.Id group (see the query at the end).
CompProgress has some conditions, which is what makes this complicated.
CompProgress is the percentage increase or decrease in CompValue compared with the previous row in the same driving direction, with rows in ascending (chronological) UtcDate order. Only rows with VibStatus set to ON (1) should be considered, and the comparison is done separately for each DCId.
Each row in DC has multiple rows in DCR. So if row 1 in DC has 10 rows in DCR, CompProgress should be calculated from those 10 rows alone, then from the DCR rows of row 2 in DC, and so on.
I have written the following query to extract the needed fields, except for the calculation of CompProgress. Please help me with this.
SELECT dc."Id" AS "DCId", dcr."Id" AS "DCRId", dc."Northing", dc."Easting", dcr."VibStatus", dcr."DrivingDir", dcr."CompValue",
ROW_NUMBER() OVER (PARTITION BY dcm."DCId" ORDER BY dcr."UtcDate") AS "passNo"
FROM "DCR" dcr
LEFT JOIN "DCMappings" dcm ON dcr."Id" = dcm."DCRId"
LEFT JOIN "DC" dc ON dc."Id" = dcm."DCId"
I need CompProgress to be evaluated in this query.
Sorry for the long text, but it was necessary to make others understand what is needed.
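A minimal sketch of one way to compute CompProgress, assuming a database with window functions (MySQL 8.0+ has LAG(); the sketch runs the SQL in SQLite through Python's sqlite3 module). The idea: keep only VibStatus = 1 rows, then use LAG() partitioned by DCId and DrivingDir and ordered by UtcDate to fetch the previous pass in the same direction on the same spot. The sample rows and the integer Ids (in place of GUIDs) are hypothetical, and the column is called CompValue as in the query above.

```python
import sqlite3

# Hypothetical sample data: integer Ids stand in for the GUIDs,
# and only the columns needed for CompProgress are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DCR (Id INTEGER, VibStatus INTEGER, DrivingDir TEXT,
                  CompValue INTEGER, UtcDate TEXT);
CREATE TABLE DCMappings (DCId INTEGER, DCRId INTEGER);
INSERT INTO DCR VALUES
  (1, 1, 'Forward', 100, '2020-01-01 10:00'),
  (2, 1, 'Reverse',  80, '2020-01-01 10:05'),
  (3, 0, 'Forward', 500, '2020-01-01 10:10'),
  (4, 1, 'Forward', 120, '2020-01-01 10:15');
INSERT INTO DCMappings VALUES (1, 1), (1, 2), (1, 3), (1, 4);
""")
# Pass 3 has vibration off, so it is ignored for CompProgress.

query = """
SELECT DCId, DCRId, CompValue,
       100.0 * (CompValue - prev) / prev AS CompProgress  -- NULL when no previous pass
FROM (
  SELECT m.DCId, r.Id AS DCRId, r.CompValue, r.UtcDate,
         -- previous vibrating pass on the same spot in the same direction
         LAG(r.CompValue) OVER (PARTITION BY m.DCId, r.DrivingDir
                                ORDER BY r.UtcDate) AS prev
  FROM DCR r
  JOIN DCMappings m ON m.DCRId = r.Id
  WHERE r.VibStatus = 1
) t
ORDER BY DCId, UtcDate
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The first vibrating pass in each direction has no predecessor, so its CompProgress is NULL. If Position must still count every pass, including non-vibrating ones, one option is to drop the WHERE filter, add VibStatus to the LAG partition, and only report CompProgress when VibStatus = 1; ROW_NUMBER() over the unfiltered rows then still gives Position.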


MYSQL Alternative to UNION for same table reusing same columns selected as new name

I'm trying to generate a result set from a table whose effective unique/primary key is (billyear, billmonth, type), with cost and consumption columns alongside. So there can be 3 entries with the same bill year and bill month, differing only in type, which is one of three values: E, W or NG.
I need to create a result set that has just one row per billyear and billmonth entry.
(
select month as billmonth, year as billyear, cost_estimate as eleccost, consumption_estimate as eleccons from tblbillforecast where buildingid=19 and type='E'
)
UNION (
select month as billmonth, year as billyear, cost_estimate as gascost, consumption_estimate as gascons from tblbillforecast where buildingid=19 and type='NG'
)
UNION (
select month as billmonth, year as billyear, cost_estimate as watercost, consumption_estimate as watercons from tblbillforecast where buildingid=19 and type='W'
)
This generates a result set with only billmonth, billyear, eleccost and eleccons columns. I've tried all kinds of solutions but the above example is the simplest to show where it's going wrong.
Additionally it still has 3 rows per billmonth/billyear unique combination instead of merging to one.
UPDATE:
Sample data
SELECT month AS billmonth,
year AS billyear,
SUM(CASE type WHEN 'E' THEN cost_estimate END) AS eleccost,
SUM(CASE type WHEN 'NG' THEN cost_estimate END) AS gascost,
SUM(CASE type WHEN 'W' THEN cost_estimate END) AS watercost
FROM tblbillforecast
WHERE buildingid=19
GROUP BY billmonth, billyear;
Result:
Expected result, eg:
year | month | eleccost | gascost | watercost
2018 | 1 | 32800 | 4460 | 4750
This is behaving correctly. An SQL query result set has one name per column, and this name applies to all the rows. So if you try to rename the column in the second or subsequent queries of the UNION, those new names are ignored. The name of the column is determined only by the first query of the UNION.
Additionally it still has 3 rows per billmonth/billyear unique combination instead of merging to one.
That's also correct behavior, according to the query you tried. UNION does not merge multiple rows into one, it only appends sets of rows.
As Akina hinted in the comments above, you may use multiple columns:
SELECT month AS billmonth,
year AS billyear,
SUM(CASE type WHEN 'E' THEN cost_estimate END) AS eleccost,
SUM(CASE type WHEN 'NG' THEN cost_estimate END) AS gascost,
SUM(CASE type WHEN 'W' THEN cost_estimate END) AS watercost
FROM tblbillforecast
WHERE buildingid=19
GROUP BY billmonth, billyear;
This uses GROUP BY to "merge" rows together, so you get one row in the result per month/year.
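A runnable sketch of the conditional aggregation above, using SQLite through Python's sqlite3 module (the SQL itself is the same as in MySQL). The January costs mirror the expected result in the question; the consumption figures and the February row are made up. The consumption columns follow the same pattern as the cost columns.

```python
import sqlite3

# Hypothetical rows shaped like tblbillforecast (one row per utility type).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tblbillforecast (
    buildingid INTEGER, year INTEGER, month INTEGER, type TEXT,
    cost_estimate INTEGER, consumption_estimate INTEGER)""")
conn.executemany(
    "INSERT INTO tblbillforecast VALUES (?, ?, ?, ?, ?, ?)",
    [(19, 2018, 1, 'E', 32800, 100),
     (19, 2018, 1, 'NG', 4460, 200),
     (19, 2018, 1, 'W', 4750, 300),
     (19, 2018, 2, 'E', 31000, 90)])  # February: electricity only

rows = conn.execute("""
    SELECT month AS billmonth, year AS billyear,
           SUM(CASE type WHEN 'E'  THEN cost_estimate END)        AS eleccost,
           SUM(CASE type WHEN 'E'  THEN consumption_estimate END) AS eleccons,
           SUM(CASE type WHEN 'NG' THEN cost_estimate END)        AS gascost,
           SUM(CASE type WHEN 'W'  THEN cost_estimate END)        AS watercost
    FROM tblbillforecast
    WHERE buildingid = 19
    GROUP BY year, month   -- one output row per month/year
    ORDER BY year, month
""").fetchall()
print(rows)
```

A month with no row for a given type gets NULL in that column, because SUM over nothing but NULLs is NULL.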
A quick bit of guidance on various data shaping operations in SQL:
JOIN - makes resultsets wider (more columns) by bringing together tables/resultsets in a side-by-side fashion generating output rows that have all the columns of the two input column sets
SELECT - typically makes resultsets narrower by allowing you to specify which columns you're interested in and which you are not; by not mentioning an available column it disappears meaning you output fewer columns
UNION - makes resultsets taller (more rows) by bringing together resultsets and outputting one on top of the other. Because each column has a fixed data type and one name, the resultsets must have the same number of columns, with compatible types, in the same order
WHERE - makes resultsets shorter (fewer rows) by allowing you to specify truth based filters that exclude rows
It's not hard and fast; you can use select to create more columns too, but just in a very rudimentary sense these concepts hold true - JOIN to widen, UNION for taller, SELECT for narrower and WHERE for shorter. All the work you do with SQL is a data shaping exercise; you're either paring a rectangular block of data down or extending it, and in either a vertical or horizontal direction (or a mix).
I'm not going to get into grouping, because that mixes rows up and isn't something you tried in the question. I wrote this out purely because you attempted a UNION (height-increasing) operation when you actually wanted a widening one. However the widening is done (a JOIN, or, as in Bill's answer, a SELECT+GROUP BY, which is valid but relies on the "mixes rows up" aspect of grouping), it specifically isn't done with a UNION. UNION only makes stuff taller.
To give an example of how it might be done in an alternative way to Bill's approach, this task of yours has one huge table that is "too tall" - it uses 3 rows where 1 would do, if only it were a bit wider. That is to say if only there were 3 columns for electric/gas/water then we wouldn't need 3 rows with 1 utility in each.
Of course, we have this "one utility per row" because it is very flexible. Database tables don't have varying numbers of columns, but they DO have varying numbers of rows. If a new bill type came along tomorrow - internet - no table changes would be needed to accommodate it; add a new type I, and away you go, adding another row. We now store 4 rows of 1 utility where 1 row with 4 columns would do, but crucially we didn't have to change the table structure. We could have infinitely many kinds of bills and not need infinitely many columns, because we can already have infinitely many rows.
So you want to reshape your data from 4-rows-by-1-column to 1-row-by-4-columns. It could be solved as follows:
narrow the table to just year,month,building,type,cost AND shorten it to just electricity
separately narrow the table to just year,month,building,type,cost AND shorten it to just gas
separately narrow the table to just year,month,building,type,cost AND shorten it to just water
join (widening) all these newly created result sets, then narrow to remove the repeated year,month,building,type columns
That would look like:
SELECT e.year, e.month, e.building, e.cost AS eleccost, g.cost AS gascost, w.cost AS watercost
FROM
(SELECT year,month,building,cost FROM t WHERE type = 'E') e
JOIN
(SELECT year,month,building,cost FROM t WHERE type = 'NG') g
ON
e.year = g.year AND e.month = g.month AND e.building = g.building
JOIN
(SELECT year,month,building,cost FROM t WHERE type = 'W') w
ON
e.year = w.year AND e.month = w.month AND e.building = w.building
WHERE
e.building = 19
You can clearly see the 3 narrowing-and-shortening operations that pick out "just the gas", "just the electric", and "just the water" - they're subqueries such as (SELECT year,month,building,cost FROM t WHERE type = 'NG'), and they reduce the height of the original table, making it three times shorter in each case. If we had 999 rows x 5 columns in the big table, it goes to 3 sets of 333 rows x 5 columns each.
You can see that we then JOIN these together to widen the results: our example's 3 sets of 333 x 5 widen to 333 x 15 when JOINed.
Then the SELECT takes it from 333 x 15 down to 333 x 7 by ditching the repeated columns.
It's likely not perfect (I'd perhaps LEFT JOIN all 3 onto a 4th set that holds just the common columns, in case some utilities aren't present for a particular month), and perhaps some people will come along complaining that it's less performant because it hits the table 3 times. All that is incidental to the point I'm making about SQL being an exercise in reshaping data: tables are the starting blocks of data, and you cut them up narrower and shorter, then stick them together side by side or on top of each other, and that becomes your new data block, maybe wider, taller, or both. In any case it's definitely a different shape from what you started with. And then you can cut and shape again, and again.
Go with Bill's conditional aggregation (though this way would be fine if there is one row per building/year/month/type), but take away a stronger notion of the direction in which these common operations (SELECT/JOIN/WHERE/UNION) reshape your data.
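The LEFT JOIN variant mentioned above could look like this sketch, run in SQLite via Python's sqlite3. The table t and its columns are the answer's simplified names, and the rows are hypothetical. The derived table k supplies every year/month/building combination, so a month that lacks gas or water still appears (with NULLs) instead of being dropped by an inner join.

```python
import sqlite3

# Hypothetical table t using the answer's simplified column names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (year INTEGER, month INTEGER, building INTEGER, "
             "type TEXT, cost INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", [
    (2018, 1, 19, 'E', 32800), (2018, 1, 19, 'NG', 4460), (2018, 1, 19, 'W', 4750),
    (2018, 2, 19, 'E', 31000),  # no gas or water rows for February
])

rows = conn.execute("""
    SELECT k.year, k.month, k.building,
           e.cost AS eleccost, g.cost AS gascost, w.cost AS watercost
    FROM (SELECT DISTINCT year, month, building FROM t) k  -- the common columns
    LEFT JOIN (SELECT year, month, building, cost FROM t WHERE type = 'E') e
           ON e.year = k.year AND e.month = k.month AND e.building = k.building
    LEFT JOIN (SELECT year, month, building, cost FROM t WHERE type = 'NG') g
           ON g.year = k.year AND g.month = k.month AND g.building = k.building
    LEFT JOIN (SELECT year, month, building, cost FROM t WHERE type = 'W') w
           ON w.year = k.year AND w.month = k.month AND w.building = k.building
    WHERE k.building = 19
    ORDER BY k.year, k.month
""").fetchall()
print(rows)
```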
Footnote about Bill's conditional aggregation (I know I said I wouldn't talk about it but it might make more sense to now). If you have:
Type, Cost
E, 123
NG, 456
W, 789
And you do a
SELECT
CASE WHEN Type = 'E' THEN Cost END as CostE,
CASE WHEN Type = 'NG' THEN Cost END as CostNG,
CASE WHEN Type = 'W' THEN Cost END as CostW
...
It spreads the data out over more columns - the data has "gone from vertical to diagonal"
CostE, CostNG, CostW
123, NULL, NULL
NULL, 456, NULL
NULL, NULL, 789
But it's still too tall. If you then run a GROUP BY, which mixes rows up, and ask for e.g. just the MAX from each column, then all the NULLs disappear (aggregate functions such as MAX ignore NULLs, and each column has a non-NULL value somewhere) and the rows collapse, mixing together, into one:
CostE, CostNG, CostW
123, 456, 789
The data has pivoted round from being vertical to being horizontal - another data shaping. It was pulled wider and squashed flatter.
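Both stages of the footnote can be reproduced in a short sketch, run in SQLite via Python's sqlite3 (the table name bills is hypothetical). There is no grouping key here, so the aggregate treats the whole set as one group; in the real query, GROUP BY month/year plays that role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bills (Type TEXT, Cost INTEGER)")
conn.executemany("INSERT INTO bills VALUES (?, ?)",
                 [('E', 123), ('NG', 456), ('W', 789)])

spread = """
    SELECT CASE WHEN Type = 'E'  THEN Cost END AS CostE,
           CASE WHEN Type = 'NG' THEN Cost END AS CostNG,
           CASE WHEN Type = 'W'  THEN Cost END AS CostW
    FROM bills
"""
# Step 1: spread over more columns ("diagonal"), still three rows.
diagonal = conn.execute(spread + " ORDER BY Type").fetchall()

# Step 2: MAX ignores NULLs, so aggregating collapses the rows into one.
collapsed = conn.execute(
    f"SELECT MAX(CostE), MAX(CostNG), MAX(CostW) FROM ({spread}) AS s").fetchall()
print(diagonal)
print(collapsed)
```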

Selecting multiple same value rows out of the same column

I have a table where one of the columns is named mid. It has a lot of values, some of which repeat. There's also a column named chashrate, which has a different value for each mid row, and a column named pid, which shows the id of each row.
I've tried pulling out rows with specific values using HAVING, but I can only do one value at a time, or multiple values that don't match each other.
$miner = $pdo->query("SELECT * FROM data WHERE pid='6'")->fetchall();
What I need to do is collect all rows with the same mid value for pid=6: for example, for all rows with mid = 8 and pid = 6, collect their chashrate and sum it up. So, for example, I would get mid(8)=17394, mid(6)=28424, etc.
Here's a photo of the table: https://i.imgur.com/9xX6sYm.png
The same colored rows need to be selected and their chashrate values summed up.
Try using SUM to sum the chashrate values and GROUP BY to group them by mid.
SELECT mid
, SUM(`chashrate`) AS total
FROM `data`
WHERE pid = 6
GROUP BY mid;
For the given data on the image, this query will output the following result:
mid | total
6 | 981
8 | 374
You seem to want aggregation:
select mid, sum(chashrate) as sum_chashrate
from data
where pid = 6
group by pid, mid;
This will return multiple rows, one for each mid value.
You can do this for multiple pids -- or even all of them, by removing or changing the where clause.
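A sketch of the GROUP BY approach from both answers, run in SQLite through Python's sqlite3. The real table contents are only in the image, so the rows below are hypothetical. The second query drops the WHERE clause and groups by pid as well, as suggested above.

```python
import sqlite3

# Hypothetical rows for the `data` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (pid INTEGER, mid INTEGER, chashrate INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", [
    (6, 8, 10), (6, 8, 20), (6, 6, 5), (6, 6, 7), (7, 8, 100)])

# One total per mid, for pid 6 only.
per_mid = conn.execute("""
    SELECT mid, SUM(chashrate) AS total
    FROM data
    WHERE pid = 6
    GROUP BY mid
    ORDER BY mid""").fetchall()

# Without the WHERE, grouping by pid as well gives totals for every pid.
per_pid_mid = conn.execute("""
    SELECT pid, mid, SUM(chashrate) AS total
    FROM data
    GROUP BY pid, mid
    ORDER BY pid, mid""").fetchall()
print(per_mid)
print(per_pid_mid)
```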

Count number of row results for a row group

How do I group/filter rows and then get the total number of rows for each column? I am going to diagram what the result should be. I don't want to show the actual data, just the count per column.
The output should look like this:
      | Column A    | Column B    | Column C
Row A | 235 records | 300 records | 15 records
Row B | 1 record    | 80 records  | 900 records
Each column represents a count on the same field, but filtered.
So:
Column A is really COUNT(MyColumn) WHERE MyColumn = 'A'
Column B is really COUNT(MyColumn) WHERE MyColumn = 'B'
To summarize, each row is a grouping plus a filter, and each column is a count of the number of rows contained in that grouping. No row data needs to be displayed.
You can do this in a table by putting the following expression in the group header row:
=SUM(IIF(Fields!MyColumn.Value = "A", 1, 0))
However, this type of summary report is what a matrix is designed to do. Use the Row field as the row group, the column field as the column group and a Count expression in the intersection and it will do it all for you.
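If the counting is done in the dataset query instead of in the report, the same IIF idea becomes conditional aggregation in SQL. A sketch, run in SQLite via Python's sqlite3, where the table name records, the grouping column RowField and the sample rows are all hypothetical; MyColumn is the field from the question.

```python
import sqlite3

# Hypothetical data: RowField picks the report row, MyColumn the report column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (RowField TEXT, MyColumn TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?)", [
    ('Row A', 'A'), ('Row A', 'A'), ('Row A', 'B'), ('Row A', 'C'),
    ('Row B', 'B'), ('Row B', 'C'), ('Row B', 'C')])

# Each CASE adds 1 only for rows matching that column's filter.
rows = conn.execute("""
    SELECT RowField,
           SUM(CASE WHEN MyColumn = 'A' THEN 1 ELSE 0 END) AS ColumnA,
           SUM(CASE WHEN MyColumn = 'B' THEN 1 ELSE 0 END) AS ColumnB,
           SUM(CASE WHEN MyColumn = 'C' THEN 1 ELSE 0 END) AS ColumnC
    FROM records
    GROUP BY RowField
    ORDER BY RowField""").fetchall()
print(rows)
```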

Can SQL query do this?

I have a table "audit" with a "description" column, a "record_id" column and a "record_date" column. I want to select only those records where the description matches one of two possible strings (say, LIKE "NEW%" OR LIKE "ARCH%") and where the two matching records share the same record_id. I then need to calculate the difference in days between their record_date values.
For instance, my table may contain:
id | description | record_id | record_date
1 | New Sub | 1000 | 04/14/13
2 | Mod | 1000 | 04/14/13
3 | Archived | 1000 | 04/15/13
4 | New Sub | 1001 | 04/13/13
I would want to select only rows 1 and 3 and then calculate the number of days between 4/15 and 4/14 to determine how long it took to go from New to Archived for that record (1000). Both a New and an Archived entry must be present for any record for it to be counted (I don't care about ones that haven't been archived). Does this make sense and is it possible to calculate this in a SQL query? I don't know much beyond basic SQL.
I am using MySQL Workbench to do this.
The following is untested, but it should work assuming that any given record_id shows up at most once with "New Sub" and at most once with "Archived":
select n.id as new_id
,a.id as archive_id
,record_id
,n.record_date as new_date
,a.record_date as archive_date
,DateDiff(a.record_date, n.record_date) as days_between
from audit n
join audit a using(record_id)
where n.description = 'New Sub'
and a.description = 'Archived';
I changed from OR to AND because I thought you wanted only the number of days between records that were actually archived.
My test was in SQL Server, so the syntax might need to be tweaked slightly for yours (especially the DATEDIFF function: MySQL's DATEDIFF(expr1, expr2) takes no unit argument and returns expr1 - expr2 in days). You can select from the same table twice, one side grabbing the 'new' rows and the other the 'archived' rows, then link them by record_id:
SELECT
newsub.id,
newsub.description,
newsub.record_date,
arc.id,
arc.description,
arc.record_date,
DATEDIFF(day, newsub.record_date, arc.record_date) AS DaysBetween
FROM
foo1 arc
, foo1 newsub
WHERE
(newsub.description LIKE 'NEW%')
AND
(arc.description LIKE 'ARC%')
AND
(newsub.record_id = arc.record_id)
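A sketch of the self-join approach on the sample rows from the question, run in SQLite through Python's sqlite3. SQLite has no DATEDIFF, so the day difference uses julianday() instead, and the dates are assumed to be stored as ISO strings (YYYY-MM-DD); in MySQL, DATEDIFF(a.record_date, n.record_date) would take the place of the julianday arithmetic. The inner join drops record 1001 because it has no archived row.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO strings (an assumption),
# because SQLite's date functions expect that format.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit (id INTEGER, description TEXT, "
             "record_id INTEGER, record_date TEXT)")
conn.executemany("INSERT INTO audit VALUES (?, ?, ?, ?)", [
    (1, 'New Sub', 1000, '2013-04-14'),
    (2, 'Mod', 1000, '2013-04-14'),
    (3, 'Archived', 1000, '2013-04-15'),
    (4, 'New Sub', 1001, '2013-04-13')])

# SQLite's LIKE is case-insensitive for ASCII, so 'New Sub' matches 'NEW%'.
rows = conn.execute("""
    SELECT n.record_id,
           CAST(julianday(a.record_date) - julianday(n.record_date) AS INTEGER)
               AS days_between
    FROM audit n
    JOIN audit a ON a.record_id = n.record_id   -- inner join: both rows must exist
    WHERE n.description LIKE 'NEW%'
      AND a.description LIKE 'ARCH%'""").fetchall()
print(rows)
```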

How can I return a row for each date, even when there is no data for that date (in which case the row should be filled with zero's)?

I hope I will be able to make my problem clear.
I have a table called tweets from which I want to extract information for each date in the daterange table. This table holds 142 dates, of which 102 have the property trading (a day on which the market was open) set to 1 (trading=1).
The query below extracts information from the tweets table for 20 companies (identified by sp100_id). The expected result set therefore contains 20 x 102 = 2,040 rows. However, only 1,987 rows are returned, because for some date/company combinations the tweets table holds no data. I need these "empty days" to be included in the result set as well. I thought I could accomplish this by using COALESCE(X, 0), returning 0 when there is no data, but the result is the same: 1,987 rows.
Based on this information and the query below, does anybody know how I can get it to return 102 rows (1 row for each daterange._date with trading=1) for each sp100_id in the tweets table?
SELECT
sp100.sp100_id,
daterange._date,
COALESCE(SUM(IF(tweets.classify1=2, tweets.`retweet_count`, 0)),0) AS `pos-retweet`,
COALESCE(SUM(IF(tweets.classify1=2, tweets.`user-quality`, 0)),0) AS `pos-quality`,
COALESCE(SUM(IF(tweets.classify1=2, tweets.`follow`, 0)),0) AS `pos-follow`,
COALESCE(SUM(IF(tweets.classify1=3, tweets.`retweet_count`, 0)),0) AS `neg-retweet`,
COALESCE(SUM(IF(tweets.classify1=3, tweets.`user-quality`, 0)),0) AS `neg-quality`,
COALESCE(SUM(IF(tweets.classify1=3, tweets.`follow`, 0)),0) AS `neg-follow`
FROM
sp100
CROSS JOIN
daterange
LEFT JOIN
tweets
ON tweets.nyse_date = daterange._date
AND tweets.sp100_id = sp100.sp100_id
WHERE sp100.sp100_id BETWEEN 1 AND 20 AND tweets.type != 1 AND daterange.trading = 1
GROUP BY
sp100.sp100_id, daterange._date
In any other case, I would provide you with a SQLFiddle, but it would be a lot of work to export a proper portion of the tables used to SQLFiddle while the solution might be clear to some real SQL guru anyway :-)
The problem comes from requiring that tweets.type != 1 in your WHERE clause.
For the dates that have no associated tweets, the outer join will result in all tweets columns, including tweets.type, being NULL. As documented under Working with NULL Values:
Because the result of any arithmetic comparison with NULL is also NULL, you cannot obtain any meaningful results from such comparisons.
In MySQL, 0 or NULL means false and anything else means true. The default truth value from a boolean operation is 1.
Therefore such records are filtered out by your WHERE clause.
As @Martin Smith commented, you can move this filter criterion into the ON clause of your outer join (so that the test is performed only against actual tweets records rather than simulated NULL ones).
Alternatively, you could rewrite the filter to handle NULL. For example, using the NULL-safe equality operator:
NOT tweets.type <=> 1
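A small sketch contrasting the placements of the type filter, run in SQLite via Python's sqlite3 with hypothetical rows and a cut-down tweets table (2 companies, 2 trading days). SQLite has no <=> operator; its IS NOT is the NULL-safe counterpart of NOT ... <=> 1. Note that a day whose only tweets have type 1 still disappears with the NULL-safe WHERE variant, while the ON variant keeps it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sp100 (sp100_id INTEGER);
CREATE TABLE daterange (_date TEXT, trading INTEGER);
CREATE TABLE tweets (sp100_id INTEGER, nyse_date TEXT, type INTEGER,
                     retweet_count INTEGER);
INSERT INTO sp100 VALUES (1), (2);
INSERT INTO daterange VALUES ('2013-01-02', 1), ('2013-01-03', 1), ('2013-01-05', 0);
INSERT INTO tweets VALUES (1, '2013-01-02', 0, 5), (1, '2013-01-03', 1, 7);
""")

def count_rows(extra_on, extra_where):
    """Return how many company/date rows survive the given filter placement."""
    sql = f"""
        SELECT s.sp100_id, d._date, COALESCE(SUM(tw.retweet_count), 0)
        FROM sp100 s
        CROSS JOIN daterange d
        LEFT JOIN tweets tw
               ON tw.nyse_date = d._date AND tw.sp100_id = s.sp100_id {extra_on}
        WHERE d.trading = 1 {extra_where}
        GROUP BY s.sp100_id, d._date"""
    return len(conn.execute(sql).fetchall())

in_where = count_rows("", "AND tw.type != 1")     # NULL != 1 is NULL: empty days vanish
null_safe = count_rows("", "AND tw.type IS NOT 1")  # keeps NULL rows
in_on = count_rows("AND tw.type != 1", "")        # filter only touches real tweets
print(in_where, null_safe, in_on)
```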
As an aside, I usually don't bother with a daterange table and instead omit dates for which there is no data from the resultset: instead, I handle missing dates within my application code.
You need a calendar table filled with every day. I know it might sound silly, but this solution solves a lot of problems for you. The same approach can also be used with integers (integer tables).
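If no calendar table exists yet, one can be generated with a recursive CTE. A sketch, run in SQLite through Python's sqlite3 with an arbitrary date range; MySQL 8.0+ also supports WITH RECURSIVE, where the step would presumably be written as d + INTERVAL 1 DAY starting from a DATE literal, instead of SQLite's date(d, '+1 day').

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Recursive CTE: start at the first day and add one day until the last one.
dates = [r[0] for r in conn.execute("""
    WITH RECURSIVE calendar(d) AS (
        SELECT '2013-01-01'
        UNION ALL
        SELECT date(d, '+1 day') FROM calendar WHERE d < '2013-01-05'
    )
    SELECT d FROM calendar""")]
print(dates)
```

The generated dates can then be stored in a table (or used directly) as the left side of a LEFT JOIN, so every day appears even when no data exists for it.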