I am just starting to use MySQL to handle data that is currently in R dataframe objects. I was hoping for a simple round-trip to and from SQL that would recreate an R dataframe exactly:
library("compare",pos=2)
library("RMySQL",pos=2)
conR <- dbConnect(MySQL(),
user = '...',
password = '...',
host = '...',
dbname='r2014')
a3 <- data.frame(x=5:1,y=letters[1:5],z=ordered(c("NEVER","ALWAYS","NEVER","SOMETIMES","NEVER"),levels=c("NEVER","SOMETIMES","ALWAYS")))
a3
dbWriteTable(conn = conR, name = 'a3', value = a3)
a4 <- dbReadTable(conn = conR, name = 'a3')
compare(a3,a4)$detailedResult
a3$z
a4$z
The result shows that factors come back as strings (columns y and z), and that the level order of ordered factors is lost (column z):
> a3
x y z
1 5 a NEVER
2 4 b ALWAYS
3 3 c NEVER
4 2 d SOMETIMES
5 1 e NEVER
> compare(a3,a4)$detailedResult
x y z
TRUE FALSE FALSE
> a3$z
[1] NEVER ALWAYS NEVER SOMETIMES NEVER
Levels: NEVER < SOMETIMES < ALWAYS
> a4$z
[1] "NEVER" "ALWAYS" "NEVER" "SOMETIMES" "NEVER"
> a3$y
[1] a b c d e
Levels: a b c d e
> a4$y
[1] "a" "b" "c" "d" "e"
Is there some way to specify the information in the ordered factors in the creation of the table a3 in the database?
I would change the code to:
dbWriteTable(conn = conR, name = 'a3', value = a3, row.names=TRUE)
a4 <- dbReadTable(conn = conR, name = 'a3', row.names=TRUE)
The row.names of a data.frame are ordered by default. Once they are stored in an SQL column, a SELECT query can use ORDER BY row_names to fetch the rows in that order.
The row.names argument of dbReadTable() can be set to NA in case the SQL table does not contain a row_names column.[2]
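As a minimal sketch of the ORDER BY idea, the snippet below uses Python's built-in sqlite3 module instead of RMySQL, purely so that it is self-contained. The table layout and the INTEGER type of row_names are assumptions for illustration. It shows rows inserted in arbitrary order coming back in row-name order; it concerns row order only and does not store the level order of column z.

```python
import sqlite3

# Illustrative data mirroring a3, deliberately shuffled.
# Each tuple is (row_names, x, y, z); storing row_names as INTEGER is an assumption.
rows = [
    (3, 3, "c", "NEVER"),
    (1, 5, "a", "NEVER"),
    (5, 1, "e", "NEVER"),
    (2, 4, "b", "ALWAYS"),
    (4, 2, "d", "SOMETIMES"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a3 (row_names INTEGER, x INTEGER, y TEXT, z TEXT)")
conn.executemany("INSERT INTO a3 VALUES (?, ?, ?, ?)", rows)

# Without ORDER BY the row order is not guaranteed; with it, the
# original data.frame order is recovered from the stored row names.
ordered = conn.execute("SELECT x, y, z FROM a3 ORDER BY row_names").fetchall()
for row in ordered:
    print(row)
conn.close()
```

If row names are stored as text, they sort lexicographically ("10" before "2"), so a numeric column or a cast may be needed for larger tables.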
[1] REF: DBI::dbWriteTable
The interpretation of rownames depends on the ‘row.names’
argument, see ‘sqlRownamesToColumn()’ for details:
• If ‘FALSE’ or ‘NULL’, row names are ignored.
• If ‘TRUE’, row names are converted to a column named
"row_names", even if the input data frame only has natural
row names from 1 to ‘nrow(...)’.
• If ‘NA’, a column named "row_names" is created if the data
has custom row names, no extra column is created in the case
of natural row names.
• If a string, this specifies the name of the column in the
remote table that contains the row names, even if the input
data frame only has natural row names.
[2] REF: DBI::dbReadTable
The presence of rownames depends on the ‘row.names’ argument, see
‘sqlColumnToRownames()’ for details:
• If ‘FALSE’ or ‘NULL’, the returned data frame doesn't have
row names.
• If ‘TRUE’, a column named "row_names" is converted to row
names.
• If ‘NA’, a column named "row_names" is converted to row names
if it exists, otherwise no translation occurs.
• If a string, this specifies the name of the column in the
remote table that contains the row names.
How can I count or read only the actual entries of a column, as distinct from non-zero entries?
In other words, if I have the file:
4000,1,5221,0
4001,0,5222,1
4002,3,,,
Column 4 has 2 actual entries, one of which vanishes. I can count entries like so:
R = csvread("bugtest.csv");
for i = 1:4
VanishingColEntries(i) = numel (find (R(:,i) ==0));
NonVanishingColEntries(i) = nnz(R(:,i));
endfor
VanishingColEntries
NonVanishingColEntries
yielding:
octave:2> nument
VanishingColEntries =
0 1 1 2
NonVanishingColEntries =
3 2 2 1
But I don't know how to extract the number of "actual" entries, that is, the sum of non-zero and explicitly zero entries!
csvread is only for numeric data. If csvread encounters an entry which is not strictly numeric, it checks whether the string starts with a number and uses that as the result (e.g. 1direction, 2pac, 7up will result in 1, 2, 7). 'Empty' entries are effectively treated as an empty string, which is parsed as the number 0. However, some special strings, such as nan and inf, are parsed specially.
If you can / are happy to preprocess your csv file, then you can replace all empty entries with the string nan (without quotes). csvread will then treat this string specially and replace it with an actual nan value in the resulting numerical matrix. You can then use this with isnan to count the number of nan / non-nan entries as follows:
R = csvread( 'bugtest.csv' );
% Count nan / non-nan entries per column (summing down the rows)
MissingColEntries = sum( isnan( R ), 1 )
ActualColEntries = sum( ~isnan( R ), 1 )
If you do not have the luxury of preprocessing your csv file (or you simply want to process it programmatically throughout, without the need for human intervention), then you can use the csv2cell function from the io package instead, and process the resulting cell to get what you want, e.g.
pkg load io
C = csv2cell( 'bugtest.csv' )
% Convert cells with empty strings to nan
for i = 1 : numel(C), if ischar(C{i}), C{i} = nan; endif, endfor
% Convert numeric cell array (nan is a valid number) to a matrix
R = cell2mat( C );
You can then use isnan as before to get your result.
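For comparison, the same counting logic can be sketched outside Octave. The Python snippet below is only an illustration: the function name count_entries and the explicit ncols argument are assumptions, and the sample text is the file from the question. It treats empty fields as missing rather than as zeros, and reports per column both the number of actual entries and the number of explicit zeros.

```python
import csv
import io

def count_entries(text, ncols):
    """Return (actual, zeros) per column; empty fields are not counted as entries."""
    actual = [0] * ncols
    zeros = [0] * ncols
    for record in csv.reader(io.StringIO(text)):
        # Only the first ncols fields are considered (the last sample row has a trailing comma)
        for i, field in enumerate(record[:ncols]):
            field = field.strip()
            if field == "":
                continue  # missing entry: neither zero nor non-zero
            actual[i] += 1
            if float(field) == 0:
                zeros[i] += 1
    return actual, zeros

sample = "4000,1,5221,0\n4001,0,5222,1\n4002,3,,,\n"
actual, zeros = count_entries(sample, ncols=4)
print("actual entries per column:", actual)
print("explicit zeros per column:", zeros)
```

The number of non-zero entries per column is then simply the difference between the two lists.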
I have a list of MySQL tables with the following names. They are reached through the connection reportconn.
TableName
A1_1
A1_2
A1_3
A1_4
A1_5
A1_6
A1_7
A1_8
Each of these tables contains data for one e-mail campaign blast.
The structure of each table is as follows. There are 8 such tables, one for each e-mail campaign.
C1 C2 C3
Y X Z
Y2 X2 Z2
I want the count of unique C2 values for each table (A1_1, A1_2, A1_3, etc.).
I have used the following code:
C2count<-list()
for (I in Tablenames){
sql2 <- paste("select count(DISTINCT BINARY C2) from ", TableName) ## SQL Query
C2count<-rbind(C2count,dbGetQuery(reportconn, sql2))}
I am getting just a single list of values. Please help me.
Your sql2 is pasting in TableName instead of I. I is the loop variable: it takes each name in Tablenames in turn, so it is what changes on each iteration. Hope this helps.
C2count <- list()
for (I in Tablenames) {
  # Build the query for the current table name
  sql2 <- paste("select count(DISTINCT BINARY C2) from ", I)
  # Append this table's count to the results
  C2count <- rbind(C2count, dbGetQuery(reportconn, sql2))
}
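The same loop pattern can be sketched in Python with the built-in sqlite3 module, used here only so the example runs without a MySQL server. The function name distinct_counts and the sample rows are assumptions for illustration. SQLite compares text case-sensitively by default, so no BINARY keyword is needed in this sketch.

```python
import sqlite3

def distinct_counts(conn, table_names, column="C2"):
    """Return {table name: number of distinct values in `column`}."""
    counts = {}
    for name in table_names:
        # The loop variable, not a fixed name, goes into the query text.
        # Identifiers cannot be bound as parameters, so they are quoted instead.
        query = f'SELECT COUNT(DISTINCT "{column}") FROM "{name}"'
        counts[name] = conn.execute(query).fetchone()[0]
    return counts

conn = sqlite3.connect(":memory:")
# Hypothetical sample data for two of the campaign tables
samples = {"A1_1": ["X", "X2", "X"], "A1_2": ["X", "x", "X"]}
for name, values in samples.items():
    conn.execute(f'CREATE TABLE "{name}" (C1 TEXT, C2 TEXT, C3 TEXT)')
    conn.executemany(f'INSERT INTO "{name}" VALUES (?, ?, ?)',
                     [("Y", v, "Z") for v in values])

result = distinct_counts(conn, list(samples))
print(result)
conn.close()
```

Keying the results by table name makes it easy to see which count belongs to which campaign.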
I have a simple COUNTIF task in Excel that is proving rather difficult to replicate in Tableau...
This is the data:
ID Metric Scope DynamicCalc
1 A1 TRUE X
1 B1 FALSE X
2 B1 TRUE X
2 A1 FALSE X
2 C1 FALSE X
The column 'DynamicCalc' should have the following values when Metric=A1 is selected: TRUE,TRUE,FALSE,FALSE,FALSE. If, say, B1 is selected, it would be FALSE,FALSE,TRUE,TRUE,TRUE. So basically I want to assign TRUE to DynamicCalc on all rows for an ID if the Scope column is TRUE for the selected Metric on at least one row of that ID.
An LOD expression can be used to get your desired result.
Try a calculated field like the one below:
{FIXED [ID],[Metric]:MAX(if [Scope]='TRUE' then 'True' else 'False' end)}
When the selection is B1: (screenshot not included)
I know this is late, but as SO community (bot) has made it active again, I propose a slightly different approach. The selection should be through parameter.
After creating a parameter on the Metric field, create a calculated field, say Dynamic calc, like this:
{FIXED ID : MAX({FIXED ID, [Metric]: MAX(If [Metric] = [Metric Parameter] THEN [Scope] END)})}
Add this field to the view and your desired result is complete.
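To make the intended logic concrete outside Tableau, here is a small Python sketch of what the nested LOD expression is meant to compute. The function name dynamic_calc is an assumption, and the rows are the sample data from the question. For each ID it checks whether the row for the selected metric has Scope TRUE, then spreads that answer to every row of the ID.

```python
def dynamic_calc(rows, selected_metric):
    """rows: list of (id, metric, scope). Returns one boolean per row."""
    # IDs for which the selected metric is in scope on at least one row
    in_scope_ids = {
        id_ for id_, metric, scope in rows
        if metric == selected_metric and scope
    }
    # Every row of such an ID gets True, all other rows get False
    return [id_ in in_scope_ids for id_, _, _ in rows]

data = [
    (1, "A1", True),
    (1, "B1", False),
    (2, "B1", True),
    (2, "A1", False),
    (2, "C1", False),
]

for metric in ("A1", "B1"):
    print(metric, dynamic_calc(data, metric))
```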
Here's my problem: midstream in my data flow, we have some values in one column that we want to swap for other values based on a lookup table.
For example, if I had a rowset like this:
Key Value
1 A
2 B
3 A
4 C
5 D
6 B
... ...
If I had a lookup table in a SQL Server DB that looked like this:
Value1 Value2
C Y
D Z
Then I would want my package to swap only those values so the resulting data flow would look like this:
Key Value
1 A
2 B
3 A
4 Y
5 Z
6 B
... ...
What components would produce the simplest solution?
You could use a Lookup component and then:
Set it up to ignore failure
Values that do not match will return null for the lookup value
Use a Derived Column expression to take the lookup value where the lookup succeeded and keep the original value otherwise:
ISNULL(Value2) ? Value : Value2
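The effect of the lookup plus derived column can be sketched in plain Python. This is only an illustration of the data transformation, not SSIS code; the function name swap_values is an assumption, and a missing dictionary key stands in for a failed lookup returning null.

```python
def swap_values(rows, lookup):
    """rows: list of (key, value). Replace value only where lookup has a match."""
    swapped = []
    for key, value in rows:
        looked_up = lookup.get(value)  # None plays the role of NULL on a failed lookup
        # Same decision as ISNULL(Value2) ? Value : Value2
        swapped.append((key, value if looked_up is None else looked_up))
    return swapped

rowset = [(1, "A"), (2, "B"), (3, "A"), (4, "C"), (5, "D"), (6, "B")]
lookup_table = {"C": "Y", "D": "Z"}

result = swap_values(rowset, lookup_table)
print(result)
```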
What SQLite statement do I need to get the column name WHERE there is a value?
COLUMN NAME: ALPHA BRAVO CHARLIE DELTA ECHO
ROW VALUE: 0 1 0 1 1
All I want in my return is: Bravo, Delta, Echo.
Your request is not entirely clear, but you appear to be asking for a SELECT statement that returns not data but column names, and not a predictable number of values but a number of values that depends on the data in the table.
For instance,
A B C D E
0 1 0 1 1
would return (B,D,E) whereas
A B C D E
1 0 1 0 0
would return (A, C).
If that's what you're asking, this is not something that SQL does. SQL retrieves data from the table and an SQL result set always has the same number of columns per row.
To accomplish your goal, you would have to retrieve all columns that might have a value in the table and then, in your program code, check for the value in each column and accrue a list of column names that had values.
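A minimal sketch of that approach, using Python's built-in sqlite3 module: the table name flags is hypothetical, and the column names are taken from the question. The query fetches every column, and the program then collects the names of the columns whose value is 1, row by row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the row from the question
conn.execute(
    "CREATE TABLE flags (ALPHA INTEGER, BRAVO INTEGER, CHARLIE INTEGER, "
    "DELTA INTEGER, ECHO INTEGER)"
)
conn.execute("INSERT INTO flags VALUES (0, 1, 0, 1, 1)")

cursor = conn.execute("SELECT * FROM flags")
# cursor.description holds one entry per result column; item 0 is the column name
column_names = [d[0] for d in cursor.description]

# For each row, keep the names of the columns whose value is 1
set_columns = [
    [name for name, value in zip(column_names, row) if value == 1]
    for row in cursor
]
print(set_columns)
conn.close()
```

Because the result is one list per row, the sketch also handles tables with more than one row, where each row may yield a different set of names.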
Also, consider what happens when there is more than one row to examine and the distribution of values differs. In other words, what's the expected result if the data looks like this:
A B C D E
- - - - -
0 1 0 1 1
1 0 1 0 0
[Also, note that all the columns in your example have values, some 0, some 1. What you really want is a list of column names where the column contains a value of 1.]
Finally, consider that your inability to easily get the results you need from your data might indicate a flaw in the data model you're using. For instance, if you were to structure your data like this:
TagName TagValue
------- --------
Alpha 0
Bravo 1
Charlie 0
Delta 1
Echo 1
you could then obtain your results with SELECT TagName FROM Tags WHERE TagValue = 1.
Furthermore, if 0 and 1 are really the only two possible values (indicating boolean "presence" or "absence" of the tag), then you could remove the TagValue column and the rows for Alpha and Charlie entirely (you'd INSERT a row into the table to add a tag and DELETE a row to remove it).
A design along these lines seems to model your data more accurately and allows you to add new tags to the system without having to issue an ALTER TABLE command.
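Both variants of this design can be tried out with Python's built-in sqlite3 module. The sketch below is illustrative only: the table name PresentTags and the extra tag Foxtrot are assumptions that do not appear in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Variant 1: one row per tag, with an explicit value column
conn.execute("CREATE TABLE Tags (TagName TEXT PRIMARY KEY, TagValue INTEGER)")
conn.executemany(
    "INSERT INTO Tags VALUES (?, ?)",
    [("Alpha", 0), ("Bravo", 1), ("Charlie", 0), ("Delta", 1), ("Echo", 1)],
)
active = [row[0] for row in conn.execute(
    "SELECT TagName FROM Tags WHERE TagValue = 1 ORDER BY TagName")]
print(active)

# Variant 2: presence-only table; a row exists only while the tag is set
conn.execute("CREATE TABLE PresentTags (TagName TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO PresentTags VALUES (?)",
                 [("Bravo",), ("Delta",), ("Echo",)])
conn.execute("INSERT INTO PresentTags VALUES ('Foxtrot')")       # add a new tag
conn.execute("DELETE FROM PresentTags WHERE TagName = 'Delta'")  # remove a tag
present = [row[0] for row in conn.execute(
    "SELECT TagName FROM PresentTags ORDER BY TagName")]
print(present)
conn.close()
```

Adding Foxtrot needed only an INSERT, with no change to the table definition.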
http://sqlfiddle.com/#!9/1407e/1
SELECT CONCAT(IF(ALPHA,'ALPHA,',''),
IF(BRAVO,'BRAVO,',''),
IF(CHARLIE,'CHARLIE,',''),
IF(DELTA,'DELTA,',''),
IF(ECHO,'ECHO',''))
FROM table1
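CONCAT and IF are MySQL-style functions and may not be available in every SQLite version. As a hedged sketch, the same idea can be written with CASE expressions and the || concatenation operator, which SQLite supports, with rtrim removing the trailing comma. The snippet runs the query through Python's built-in sqlite3 module; the sample row is the one from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (ALPHA INTEGER, BRAVO INTEGER, CHARLIE INTEGER, "
    "DELTA INTEGER, ECHO INTEGER)"
)
conn.execute("INSERT INTO table1 VALUES (0, 1, 0, 1, 1)")

# Each CASE contributes 'NAME,' when the column is non-zero, else an empty string;
# rtrim(..., ',') strips the comma left after the last contributed name.
query = """
SELECT rtrim(
    CASE WHEN ALPHA   THEN 'ALPHA,'   ELSE '' END ||
    CASE WHEN BRAVO   THEN 'BRAVO,'   ELSE '' END ||
    CASE WHEN CHARLIE THEN 'CHARLIE,' ELSE '' END ||
    CASE WHEN DELTA   THEN 'DELTA,'   ELSE '' END ||
    CASE WHEN ECHO    THEN 'ECHO,'    ELSE '' END,
    ',')
FROM table1
"""
names = conn.execute(query).fetchone()[0]
print(names)
conn.close()
```

The list of columns still has to be written out by hand, so a new column means editing the query.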