I'm trying to write a SQL query for a data quality report that presents data quality failed values from multiple columns into one column. Please see the below example
FACT TABLE
Ac_Nm INAmt Ast Rcs
123 100 5000 NA
456 200 -200 Yes
789 -300 1000 No
DESIRED OUTPUT (POPULATE VAL COLUMN)
Ac_Nm Is_Clm Val
123 RCS NA
456 Ast -200
789 InAmt -300
How do I write a SQL query to populate the Val column? I've got the rest of the data quality report query written.
In the above example I have a fact table where data quality issues have been identified in various columns (negative values, 'NA' values where there should be a Yes/No response, etc). I'd like to know how to write a dynamic SQL query that returns that failed value from the Fact Table depending on the account number and the column name. In the first row the desired output lists the account number(123) with the issue column name (RCS) containing the value at issue, and the Val column listing the value causing the issue (NA). I just need to know how to write a SQL query to populate the Val column depending on the account num and issue column.
You could do it using case statements, assuming only one column is going to have a "bad" value, as follows:
SELECT Ac_Nm,
CASE WHEN INAmt < 0 THEN 'INAmt'
WHEN Ast < 0 THEN 'Ast'
WHEN Rcs = 'N/A' THEN 'RCS'
ELSE NULL END AS Is_Clm,
CASE WHEN INAmt < 0 THEN CONVERT(INAmt, char)
WHEN Ast < 0 THEN CONVERT(Ast, char)
WHEN Rcs = 'N/A' THEN Rcs
ELSE NULL END AS Val
FROM fact_table;
Then to filter out the NULL values, wrap the query in a subquery and select from it. If you need a hand doing that, give me a shout.
Related
I was doing a query with MySQL to save all objects returned, but I'd like identify these objects based in statements of the block WHERE, that is, if determined object to satisfy the specific characteristic I'd like create one column and in this column I assignment the value 0 or 1 in the row corresponding the object if it satisfy or not satisfy these characteristic.
This is my script:
SELECT
s.id, al.ID, al.j, al.k, al.r, gal.i
FROM
datas as al
WHERE
AND s.id = al.ID
AND al.j between 1 and 1
AND al.k BETWEEN 15 and 16
AND al.r BETWEEN 67 and 72
The script above is working perfectly and I can to save all objects which it return.
So, I'd like to know if is there a way add in the query above, on block WHERE, the following statement,
( Flags & (dbo.environment('cool') +
dbo.environment('ok') -
dbo.environment('source')) ) = 25
and ((al_pp x al_pp1)-0.5/3=11
and determined the objects that satisfy or not these condition with 0 or 1 in a new column created in Table saved.
I read some tutorials about this and saw some attempts with IF, CASE, ADD COLUMN or WHEN, but none of these solved.
Thanks in advance
MySQL has if function, see here
So you can simply use it in your query:
SELECT IF(( Flags & (dbo.fPhotoFlags('SATURATED') +
dbo.fPhotoFlags('BRIGHT') +
dbo.fPhotoFlags('EDGE')) ) = 0
and petroRad_r < 18
and ((colc_u - colc_g) - (psfMag_u - psfMag_g)) < -0.4
, 1 --// VALUE IF TRUE
, 0 --// VALUE IF FALSE
) as conditional_column, ... rest of your query
Here is a simple mdx query to MS OLAP cube, which outputs sale step stats for 3 cities with ranking of each sale stage, it works fine:
WITH
MEMBER [Measures].[rank] AS
case [Sales_step].currentmember.member_caption
when 'Contacts' then 1
when 'Clients' then 2
when 'Funded' then 3
else 0 end
SELECT {[Measures].[rank],
[Measures].[qnt]} ON COLUMNS,
NON EMPTY
crossjoin({[City].CHILDREN},
{[Sales_step].CHILDREN}) ON ROWS
FROM ( SELECT ( STRTOSET(#[Sales_step], CONSTRAINED) ) ON COLUMNS
FROM [SALES_PIPE])
The output is:
Now I want to build totals for each city without separate sale steps, but showing maximum archived sales stage rank only. The result must be:
I tried the following code to do that:
WITH
MEMBER [Measures].[rank max] AS
case [Sales_step].currentmember.member_caption
when 'Contacts' then 1
when 'Clients' then 2
when 'Funded' then 3
else 0 end
SELECT {[Measures].[rank max],
[Measures].[qnt]} ON COLUMNS,
NON EMPTY [City].CHILDREN ON ROWS
FROM ( SELECT ( STRTOSET(#[Sales_step], CONSTRAINED) ) ON COLUMNS
FROM [SALES_PIPE])
It does not generate error, but returns null values for calculated member [Measures].[rank max]:
It works only when I pass one value to #[Sales_step] parameter. While I need a multivalued param run. When I changed this snippet in case clause:
case [Sales_step].currentmember.member_caption
to:
case strtomember(#[Sales_step]).member_caption
it throwed an error "The STRTOMEMBER function expects a member expression for the 1 argument. A tuple set expression was used". Errors fire both for one- and multy-param when I use this too:
case strtoset(#[Sales_step]).currentmember.member_caption
How do I need to modify calculated member [Measures].[rank max] to get desired result with maximum rank for passed #[Sales_step] multivalue param?
I wonder if something like this works:
WITH
SET [S] AS
NonEmpty(
EXISTING [Sales_step].CHILDREN,
([City].CURRENTMEMBER, [Measures].[qnt]) //<<I think that [City].CURRENTMEMBER is probably redundant in this tuple
)
MEMBER [Mx] AS
CASE
WHEN INTERSECT([S], {[Sales_step].[Funded]}).COUNT = 1 THEN 3
WHEN INTERSECT([S], {[Sales_step].[Clients]}).COUNT = 1 THEN 2
WHEN INTERSECT([S], {[Sales_step].[Contacts]}).COUNT = 1 THEN 1
END
...
...
What SQLite statement do I need to get the column name WHERE there is a value?
COLUMN NAME: ALPHA BRAVO CHARLIE DELTA ECHO
ROW VALUE: 0 1 0 1 1
All I want in my return is: Bravo, Delta, Echo.
Your request is not entirely clear, but you appear to be asking for a SELECT statement that will return not data but rather columns names, and not a predictable number of values but rather a number values that depend on the data in the table.
For instance,
A B C D E
0 1 0 1 1
would return (B,D,E) whereas
A B C D E
1 0 1 0 0
would return (A, C).
If that's what you're asking, this is not something that SQL does. SQL retrieves data from the table and an SQL result set always has the same number of columns per row.
To accomplish your goal, you would have to retrieve all columns that might have a value in the table and then, in your program code, check for the value in each column and accrue a list of column names that had values.
Also, consider what happens when there is more than one row to examine and the distribution of values differ. In other words, what's the expected result if the data looks like this:
A B C D E
- - - - -
0 1 0 1 1
1 0 1 0 0
[Also, note that all the columns in your example have values, some 0, some 1. What you really want is a list of column names where the column contains a value of 1.]
Finally, consider that your inability to easily get the results you need from your data might indicate a flaw in the data model you're using. For instance, if you were to structure your data like this:
TagName TagValue
------- --------
Alpha 0
Bravo 1
Charlie 0
Delta 1
Echo 1
you could then obtain your results with SELECT TagName FROM Tags WHERE TagValue = 1.
Furthermore, if 0 and 1 are really the only two possible values (indicating boolean "presence" or "absence" of the tag) then you could remove the TagValue column and the rows for Alpha and Charlie entirely (you'd INSERT a row into the table to add tag and DELETE a row to remove it).
A design along these lines seems to model your data more accurately and allows you to entire new tags to the system without having to issue an ALTER TABLE command.
http://sqlfiddle.com/#!9/1407e/1
SELECT CONCAT(IF(ALPHA,'ALPHA,',''),
IF(BRAVO,'BRAVO,',''),
IF(CHARLIE,'CHARLIE,',''),
IF(DELTA,'DELTA,',''),
IF(ECHO,'ECHO',''))
FROM table1
Picture a table with fields (Id, Valid, Value)
Valid = boolean
Value = number from 0 to 100
What I want is a report that counts the number of records where (valid = 0), and then gives me the total number of cases where (value < 70) and the number of cases where (value >= 70).
The problem is that the "value" field could be empty on some of the records and I only want the records where the value field is not empty.
I know that the second value (value>=70) is going to be calculated, but the problem is that I can't simply do (total number of records - number of records where value < 70), because there's the problem with the records where "value" is null...
And then I want to create graphic with these values, to see the percentage of records below and above 70.
"The problem is that the "value" field could be empty on some of the records and I only want the records where the value field is not empty."
Use a WHERE clause to exclude rows whose "value" field is Null.
Here is sample data for tblMetraton.
Id Valid Valu
1 -1 2
2 0 4
3 -1 6
4 0
5 0 90
I used Valu for the field name because Value is a reserved word.
You can use the Count() function in a query and take advantage of the fact it only counts non-Null values. So use Count() with an IIf() expression which returns a non-Null value (I used 1) for the condition you want to match and Null otherwise.
SELECT
Count(IIf(Valid=0,1,Null)) AS valid_false,
Count(IIf(Valu<70,1,Null)) AS below_70,
Count(IIf([Valu]>=70,1,Null)) AS at_least70
FROM tblMetraton AS m
WHERE (((m.[Valu]) Is Not Null));
Based on my sample data in tblMetraton, that query gives me this result set.
valid_false below_70 at_least70
2 3 1
If my sample data does not address the variability you're dealing with, and/or if my query results don't match your requirements, please show us you own sample data and the results you want based on that sample.
I have a table called ORDEREXECUTIONS that stores all orders that have been executed. It's a multi currency application hence the table has two columns CURRENCY1_ID and CURRENCY2_ID.
To get a list of all orders for a specific currency pair (e.g. EUR/USD) I need to lines to get the totals:
v = Orderexecution.where("is_master=1 and currency1_id=? and currency2_id=? and created_at>=?",c1,c2,Time.now()-24.hours).sum("quantity").to_d
v+= Orderexecution.where("is_master=1 and currency1_id=? and currency2_id=? and created_at>=?",c2,c1,Time.now()-24.hours).sum("unitprice*quantity").to_d
Note that my SUM() formula is different depending on the the sequence of the currencies.
e.g. If I want the total ordered quantities of the currency pair USD it then executes (assuming currency ID for USD is 1 and EUR is 2.
v = Orderexecution.where("is_master=1 and currency1_id=? and currency2_id=? and created_at>=?",1,2,Time.now()-24.hours).sum("quantity").to_d
v+= Orderexecution.where("is_master=1 and currency1_id=? and currency2_id=? and created_at>=?",2,1,Time.now()-24.hours).sum("unitprice*quantity").to_d
How do I write this in RoR so that it triggers only one single SQL statement to MySQL?
I guess this would do:
v = Orderexecution.where("is_master=1
and ( (currency1_id, currency2_id) = (?,?)
or (currency1_id, currency2_id) = (?,?)
)
and created_at>=?"
,c1, c2, c2, c1, Time.now()-24.hours
)
.sum("CASE WHEN currency1_id=?
THEN quantity
ELSE unitprice*quantity
END"
,c1
)
.to_d
So you could do
SELECT SUM(IF(currency1_id = 1 and currency2_id = 2, quantity,0)) as quantity,
SUM(IF(currency2_id = 1 and currency1_id = 2, unitprice * quantity,0)) as unitprice _quantity from order_expressions
WHERE created_at > ? and (currency1_id = 1 or currency1_id = 2)
If you plug that into find_by_sql you should get one object back, with 2 attributes, quantity and unitprice_quantity (they won't show up in the output of inspect in the console but they should be there if you inspect the attributes hash or call the accessor methods directly)
But depending on your indexes that might actually be slower because it might not be able to use indexes as efficiently. The seemly redundant condition on currency1_id means that this would be able to use an index on [currency1_id, created_at]. Do benchmark before and after - sometimes 2 fast queries are better than one slow one!