How to get the NA rate by column in Tableau Desktop? - duplicates

I am trying to do something simple in Tableau: get the percentage of null values for each column of my dataset.
But each time I put my dimensions on Columns, Tableau displays every possible value of the dimension, and I cannot reproduce the equivalent of Python's dataframe.isna().sum()/len(dataframe).

By default Tableau visualizes all possible values.
Assuming you have this kind of data:
ID Category
1 A
2 B
3 C
4 D
5 E
6 F
7
8
9
10
You can get your Rate with a simple Calculated field:
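For reference, the quantity being asked for (missing values divided by row count, per column) can be sketched in plain Python on the sample data above. This is not the Tableau calculated field, only an illustration of the computation; the helper name null_rate is hypothetical.

```python
# Sample rows mirroring the ID/Category table above; None marks a missing value.
categories = ["A", "B", "C", "D", "E", "F", None, None, None, None]
rows = [{"ID": i, "Category": c} for i, c in enumerate(categories, start=1)]


def null_rate(rows, column):
    """Return the fraction of rows whose value in `column` is missing."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


# One rate per column, analogous to dataframe.isna().sum()/len(dataframe).
rates = {col: null_rate(rows, col) for col in ("ID", "Category")}
print(rates)
```

On the ten sample rows, four have no Category, so the rate for that column is 4 out of 10.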

How to perform a many-to-many or (at least) an outer join in SPSS

I usually use R for my data analysis, but these days I have to use SPSS. I expected data manipulation to be a bit more difficult this way, but after my first day I am close to surrendering, and I would really appreciate some help.
My problem is the following:
I have two data sets, each with an ID number. Neither data set has unique IDs (one of them should have unique IDs, but it contains a duplicated row).
In a perfect world I would keep this duplicated row and simply perform a many-to-many join. But I accept that I might have to delete this "bad" row (in dataset A) and perform a 1:many join (join dataset B to dataset A, which contains the unique IDs).
If I run the join (and accept that a 1:many join does not seem to be possible, only a many:1 join), I lose IDs. If I join dataset A to dataset B, I lose all cases that are not part of dataset B. But I would really like to keep the IDs from both data sets, as in a full join.
Do you know if there is a (reasonably) simple solution to my problem?
Example:
dataset A:
ID VAL1
1 A
1 B
2 D
3 K
4 A
dataset B:
ID VAL2
1 g
2 k
4 a
5 c
5 d
5 a
2 x
expected result (best solution):
ID VAL1 VAL2
1 A g
1 B g
2 D k
3 K NA
4 A a
2 D x
expected result (second best solution):
ID VAL1 VAL2
1 A g
2 D k
3 K NA
4 A a
5 NA c
5 NA d
5 NA a
2 D x
what I get (worst solution):
ID VAL1 VAL2
1 A g
2 D k
4 A a
5 NA c
5 NA d
5 NA a
2 D x
From your example it looks like what you need is a full many-to-many join, based on the IDs existing in dataset A. You can get this by creating a full Cartesian product of the two datasets, using dataset A as the first (left) dataset.
The following syntax assumes you have the STATS CARTPROD extension command installed. If you don't, you can see here about installing it.
First I'll recreate your example to demonstrate on:
dataset close all.
data list list/id1 vl1 (2F3) .
begin data
1 232
1 433
2 456
3 246
4 468
end data.
dataset name aaa.
data list list/id2 vl2 (2F3) .
begin data
1 111
2 222
4 333
5 444
5 555
5 666
2 777
3 888
end data.
dataset name bbb.
Now the actual work is fairly simple:
DATASET ACTIVATE aaa.
STATS CARTPROD VAR1=id1 vl1 INPUT2=bbb VAR2=id2 vl2
/SAVE OUTFILE="C:\somepath\yourcartesianproduct.sav".
* The new dataset now contains all possible combinations of rows in the two datasets.
* we will select only the relevant combinations, where the two ID's match.
select if id1=id2.
exe.
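
To make the target result concrete, here is a minimal Python sketch of a many-to-many join that keeps every row of dataset A, using the data from the question. It is not SPSS syntax and does not reproduce STATS CARTPROD; the function name many_to_many_left_join is hypothetical, and unmatched rows get None where the question shows NA.

```python
from collections import defaultdict

# Data from the question: (ID, VAL1) and (ID, VAL2).
dataset_a = [(1, "A"), (1, "B"), (2, "D"), (3, "K"), (4, "A")]
dataset_b = [(1, "g"), (2, "k"), (4, "a"), (5, "c"), (5, "d"), (5, "a"), (2, "x")]


def many_to_many_left_join(left, right):
    """Pair every left row with every right row that shares its ID.

    Left rows without any match are kept, with None in place of VAL2.
    """
    right_by_id = defaultdict(list)
    for row_id, val2 in right:
        right_by_id[row_id].append(val2)

    joined = []
    for row_id, val1 in left:
        # .get() avoids creating empty entries for unmatched IDs.
        for val2 in right_by_id.get(row_id, [None]):
            joined.append((row_id, val1, val2))
    return joined


result = many_to_many_left_join(dataset_a, dataset_b)
for row in result:
    print(row)
```

The output contains the same six rows as the "best solution" table in the question, although in a different order, because rows are emitted in the order of dataset A.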

How to compare values stored in the same table with QlikView?

Being new to QlikView, I'm a little confused about what I should do in SQL and what Qlik provides out of the box.
Let's suppose I have a table similar to this:
id Status type value quantity dat_s Area
1 Activo A 10 10 20171001 Norte
2 Activo B 20 20 20171001 Norte
3 Activo C 15 15 20171001 Sul
4 Fechado A 5 5 20171101 Norte
5 Activo B 20 20 20171101 Norte
6 Activo D 5 5 20171101 Sul
7 Activo D 5 5 20170901 Sul
I'd like to compare the table with itself, but only the rows from selected dates. Let's say date A = 20171001 and date B = 20171101 (these should be user-defined via an input field or similar). The comparison I'd like to do is, for example:
Type CountDateA ValDateA CountDateB ValDateB valuediff
A 1 100 1 25 -75
B 1 400 1 400 0
C 1 225 0 0 -225
D 0 0 1 25 25
or
Area ValDateA ValDateB valuediff
Norte 500 425 -75
Sul 225 25 -200
I was planning to duplicate the table and use different field names for the same data, leaving half empty, but I hope there is a more elegant way.
Thanks all.
I just needed to load the table, and then the calculation for the columns would be:
Sum({<Status={'Activo'}, dat_s={20171001}>} quantity*value)
I am still quite confused by your problem. QlikView's power lies (in a few words) in building charts or tables that update automatically depending on the selected filters. In your example, I guess, that filter would be the date (or dates) the user selects. Hence, you wouldn't need to define columns like ValDateA, ValDateB, etc. In your case, however, it seems that you want to compare EXACTLY two dates, so you could define those columns, each of them depending on a different date picker. That being said, I'll show you how I would approach your problem, although I'm not really sure I understood it well:
I assume you loaded your data correctly, so the first data table is in memory (with the fields id, Status, type, value, quantity, dat_s, Area). Note: be careful and consistent with capitalization.
Create a table chart with dimension "type" (which will automatically filter each row's expressions) and with these expressions:
1. Count(distinct {<Status={"Activo"}, dat_s={"$(vDate1)"}>} id) //how many rows in active state for date1 (vDate1 is a variable assigned to the first date picker)
2. Sum({<Status={"Activo"}, dat_s={"$(vDate1)"}>} value*quantity)
3. Same as expression 1 but using $(vDate2)
4. Same as expression 2 but using $(vDate2)
5. In QlikView you can just write Column(4) - Column(2); in Qlik Sense you would need to write the whole expressions 2 and 4 again and subtract the sums.
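As a cross-check of what those expressions should produce, here is a minimal Python sketch that computes the per-type comparison on the sample rows from the question. It is not Qlik syntax; the function name compare_dates and the optional status filter are assumptions made for illustration.

```python
from collections import defaultdict

# Columns: id, Status, type, value, quantity, dat_s, Area (rows from the question).
rows = [
    (1, "Activo", "A", 10, 10, 20171001, "Norte"),
    (2, "Activo", "B", 20, 20, 20171001, "Norte"),
    (3, "Activo", "C", 15, 15, 20171001, "Sul"),
    (4, "Fechado", "A", 5, 5, 20171101, "Norte"),
    (5, "Activo", "B", 20, 20, 20171101, "Norte"),
    (6, "Activo", "D", 5, 5, 20171101, "Sul"),
    (7, "Activo", "D", 5, 5, 20170901, "Sul"),
]


def compare_dates(rows, date_a, date_b, status=None):
    """Per type, return (count_a, val_a, count_b, val_b, val_b - val_a)."""
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for _id, row_status, row_type, value, quantity, dat_s, _area in rows:
        if status is not None and row_status != status:
            continue
        if dat_s == date_a:
            offset = 0  # slots 0 and 1 hold count and value for date A
        elif dat_s == date_b:
            offset = 2  # slots 2 and 3 hold count and value for date B
        else:
            continue
        totals[row_type][offset] += 1
        totals[row_type][offset + 1] += value * quantity
    return {
        row_type: (ca, va, cb, vb, vb - va)
        for row_type, (ca, va, cb, vb) in sorted(totals.items())
    }


comparison = compare_dates(rows, 20171001, 20171101)
for row_type, values in comparison.items():
    print(row_type, *values)
```

Without a status filter this reproduces the by-type table in the question. Passing status="Activo", as the set expressions above do, drops the "Fechado" row, so type A has no rows on date B.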

How to add dynamic range to database (store the ranges in a table)

Table (CostTitle)
Id costTitle
1 A
2 B
3 C
4 D
5 E
6 F
A refers to numbers between 0-99
B refers to numbers between 100-199
C refers to numbers between 200-299
D refers to numbers between 300-399
E refers to numbers between 400-499
F refers to numbers between 500-599
costCode will be based on the number range that its costTitle refers to
Table (CostCode)
Id costTitle costCode costProductTitle
1 A 12 productX
2 B 111 productY
3 B 142 productZ
4 C 201 productK
5 F 511 productL
6 F 582 productM
I am trying to add a product and dynamically assign its cost code.
Thanks in advance
I suppose you want to store the ranges in a table. In that case you need a BEFORE INSERT trigger that sets NEW.costTitle. Triggers are explained here:
http://dev.mysql.com/doc/refman/5.6/en/create-trigger.html
MariaDB offers an alternative: virtual (computed) columns. However, because of the limits of this feature, you cannot store the ranges in a different table. You will need to hardcode the ranges in the virtual column definition, which doesn't seem like a great idea to me (but you decide, of course).
https://mariadb.com/kb/en/mariadb/virtual-computed-columns/
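The trigger idea can be tried out quickly with Python's built-in sqlite3 module. This is only a sketch under stated assumptions, not MySQL syntax: the CostRange table and its lo/hi columns are hypothetical names, and because SQLite triggers cannot assign to NEW, it uses an AFTER INSERT trigger with an UPDATE instead of the BEFORE INSERT trigger you would write in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE CostRange (costTitle TEXT PRIMARY KEY, lo INTEGER, hi INTEGER);
    CREATE TABLE CostCode (
        Id INTEGER PRIMARY KEY,
        costTitle TEXT,
        costCode INTEGER NOT NULL,
        costProductTitle TEXT
    );
    CREATE TRIGGER cost_code_set_title AFTER INSERT ON CostCode
    BEGIN
        UPDATE CostCode
        SET costTitle = (
            SELECT r.costTitle FROM CostRange AS r
            WHERE NEW.costCode BETWEEN r.lo AND r.hi
        )
        WHERE Id = NEW.Id;
    END;
    """
)

# Ranges from the question: A = 0-99, B = 100-199, ..., F = 500-599.
conn.executemany(
    "INSERT INTO CostRange (costTitle, lo, hi) VALUES (?, ?, ?)",
    [(title, i * 100, i * 100 + 99) for i, title in enumerate("ABCDEF")],
)

# Insert products with only a cost code; the trigger fills in costTitle.
conn.executemany(
    "INSERT INTO CostCode (costCode, costProductTitle) VALUES (?, ?)",
    [(12, "productX"), (111, "productY"), (582, "productM")],
)

assigned = conn.execute(
    "SELECT costCode, costTitle FROM CostCode ORDER BY Id"
).fetchall()
print(assigned)
```

Keeping the ranges in their own table means a new range only needs a new row in CostRange, with no change to the trigger.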

KDB: apply dyadic function across two lists

Consider a function F[x;y] that generates a table. I also have two lists; xList:[x1;x2;x3] and yList:[y1;y2;y3]. What is the best way to do a simple comma join of F[x1;y1],F[x1;y2],F[x1;y3],F[x2;y1],..., thereby producing one large table?
You have asked for the cross product of your argument lists, so the correct answer is
raze F ./: xList cross yList
Depending on what you are doing, you might want to look into having your function operate on the entire list of x and the entire list of y and return a table, rather than on each pair and then return a list of tables which has to get razed. The performance impact can be considerable, for example see below
q)g:{x?y} //your core operation
q)//this takes each pair of x,y, performs an operation and returns a table for each
q)//which must then be flattened with raze
q)fm:{flip `x`y`res!(x;y; enlist g[x;y])}
q)//this takes all x, y at once and returns one table
q)f:{flip `x`y`res!(x;y;g'[x;y])}
q)//let's set a seed to compare answers
q)\S 1
q)\ts do[10000;rm:raze fm'[x;y]]
76 2400j
q)\S 1
q)\ts do[10000;r:f[x;y]]
22 2176j
q)rm~r
1b
Set up our example
q)f:{([] total:enlist x+y; x:enlist x; y:enlist y)}
q)x:1 2 3
q)y:4 5 6
Demonstrate F[x1;y1]
q)f[1;4]
total x y
---------
5 1 4
q)f[2;5]
total x y
---------
7 2 5
Use the multi-valent apply operator together with each' to apply to each pair of arguments.
q)raze .'[f;flip (x;y)]
total x y
---------
5 1 4
7 2 5
9 3 6
Another way to achieve it using each-both :
x: 1 2 3
y: 4 5 6
f:{x+y}
f2:{ a:flip x cross y ; f'[a 0;a 1] }
f2[x;y]
5j, 6j, 7j, 6j, 7j, 8j, 7j, 8j, 9j
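For readers more at home outside q, the cross-product approach (apply F to every pair, then flatten into one table) can be sketched in Python. The function F below is a hypothetical stand-in that returns a one-row table as a list of dicts, mirroring the f used in the answers above.

```python
from itertools import product


def F(x, y):
    """Hypothetical table-generating function: returns a list of row dicts."""
    return [{"total": x + y, "x": x, "y": y}]


x_list = [1, 2, 3]
y_list = [4, 5, 6]

# product() yields (x1, y1), (x1, y2), ..., like `xList cross yList` in q;
# the nested comprehension flattens the per-pair tables, like `raze`.
big_table = [row for x, y in product(x_list, y_list) for row in F(x, y)]

for row in big_table:
    print(row)
```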

How to apply a formula for removing data noise in R?

I am working on NGSim Traffic data, having 18 columns and 1180598 rows in a text file. I want to smooth the position data, in the column 'Local Y'. I know there are built-in functions for data smoothing in R but none of them seem to match with the formula I am required to apply. The data in text file looks something like this:
Index VehicleID Total_Frames Local Y
1 2 5 35.381
2 2 5 39.381
3 2 5 43.381
4 2 5 47.38
5 2 5 51.381
6 4 8 504.828
7 4 8 508.325
8 4 8 512.841
9 4 8 516.338
10 4 8 520.854
11 4 8 524.592
12 4 8 528.682
13 4 8 532.901
14 5 7 39.154
15 5 7 43.153
16 5 7 47.154
17 5 7 51.154
18 5 7 55.153
19 5 7 59.154
20 5 7 63.154
The columns above are just an example taken from the original file. Here you can see 3 vehicles, with vehicle IDs 2, 4 and 5, but in fact there are 2169 vehicles with different IDs. The column Total_Frames tells us how many times each vehicle ID is repeated: in the table above, vehicle ID 2 appears 5 times, hence '5' in the Total_Frames column. The following is the formula I am required to apply to remove noise from (smooth) the column 'Local Y':
Smoothed Local Y(i) = [ sum from k=i-D to i+D of Local Y(k) * exp(-abs(i-k)/delta) ] / [ sum from k=i-D to i+D of exp(-abs(i-k)/delta) ]
where,
i = index #
delta = 5
D = 15
I have tried the built-in functions I know of, but they don't smooth the data as required. My question is: is there any built-in function in R that can smooth the data according to the given formula, or that can take this formula as an argument? I need to apply the formula to every value in Local Y that has 15 values before it and 15 values after it (i-D and i+D) for the same vehicle ID. Can anyone give me an idea of how to approach the problem? Thanks in advance.
You can place your formula in a function and then use R's apply function to apply it to the elements in the "Local Y" column of the data frame.
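A minimal sketch of that idea, written in Python rather than R, might look as follows. It makes two assumptions that are not stated in the thread: the rows of each vehicle are contiguous in the file, and points without D neighbours on both sides within the same vehicle are left unsmoothed. The function names are hypothetical.

```python
import math


def smooth_positions(local_y, delta=5.0, D=15):
    """Smooth one vehicle's positions with the exponential kernel from the question.

    Points without D neighbours on both sides are returned unchanged (an assumption).
    """
    weights = [math.exp(-abs(offset) / delta) for offset in range(-D, D + 1)]
    norm = sum(weights)
    smoothed = list(local_y)
    for i in range(D, len(local_y) - D):
        window = local_y[i - D : i + D + 1]
        smoothed[i] = sum(w * y for w, y in zip(weights, window)) / norm
    return smoothed


def smooth_by_vehicle(vehicle_ids, local_y, delta=5.0, D=15):
    """Apply smooth_positions to each contiguous run of rows sharing a vehicle ID."""
    result = []
    start = 0
    for end in range(1, len(vehicle_ids) + 1):
        if end == len(vehicle_ids) or vehicle_ids[end] != vehicle_ids[start]:
            result.extend(smooth_positions(local_y[start:end], delta, D))
            start = end
    return result


# Small demonstration with D=1 so that the short sample has interior points.
ids = [2, 2, 2, 4, 4, 4]
y = [0.0, 1.0, 5.0, 10.0, 10.0, 10.0]
print(smooth_by_vehicle(ids, y, delta=5.0, D=1))
```

Because the window never crosses a vehicle boundary, the positions of one vehicle do not leak into the smoothed trajectory of the next.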