What is the color palette used by igraph? - igraph

My reproducible example is the following:
get.vertex.attribute(g)
$name
[1] "LV" "Ve" "Ca" "Ai" "BN" "EN" "Or" "So" "SG" "Bo" "AX" "Sa" "To" "Pe" "Da" "He" "VI" "Ke" "Va" "At" "Ac" "Mi"
[23] "Cr" "Le" "Pu" "Re" "Te" "C." "N." "Y." "M." "D." "F." "L." "P." "S." "B." "J." "I." "A." "H." "R." "E." "O."
$color
[1] 1 1 1 1 1 2 3 1 1 3 1 3 3 3 1 4 3 5 3 1 1 6 2 6 1 3 3 1 1 1 1 3 1 2 3 1 5 1 2 3 3 4 3 6
In my case, the following code:
library("igraph")
vertices<-data.frame("name" = unique(unlist(relations)))
g = graph.data.frame(relations, directed=F, vertices=vertices)
vertices$group = edge.betweenness.community(g)$membership
V(g)$color <- vertices$group
plot(g,layout=layout.auto,vertex.size=6, vertex.label.cex = 0.8)
gives this graph:
where the color 1 seems to be orange, 2 is light blue, etc...
yet
palette()
[1] "black" "red" "green3" "blue" "cyan" "magenta" "yellow" "gray"
>
So what is the color palette used by igraph?
I am curious because I would like to use it in another package that only takes color names as input and doesn't seem to recognize the V(g)$color vector as valid input (i.e., it outputs only black).

The short answer is categorical_pal(8).
Full Story
If you look at the help page ?igraph.plotting and search for "palette", you will find:
palette
The color palette to use for vertex color. The default is
categorical_pal, which is a color-blind friendly categorical palette.
See its manual page for details and other palettes.
The help page ?categorical_pal says:
This is a color blind friendly palette from
http://jfly.iam.u-tokyo.ac.jp/color. It has 8 colors.
We can make a quick demonstration of this.
library(igraph)
x = 1:8
y = rep(1,8)
plot(x,y, pch=20, cex=10, col=categorical_pal(8), xlim=c(0.5,8.5))

Related

Undefined columns selected using panelvar package

Has anyone used panelvar in R?
I'm currently using the R package panelvar, and I'm getting this error:
Error in `[.data.frame`(data, , c(colnames(data)[panel_identifier], required_vars)) :
undefined columns selected
And my syntax currently is:
model1<-pvargmm(
dependent_vars = c("Change.."),
lags = 2,
exog_vars = c("Price"),
transformation = "fd",
data = base1,
panel_identifier = c("id", "t"),
steps = c("twostep"),
system_instruments = FALSE,
max_instr_dependent_vars = 99,
min_instr_dependent_vars = 2L,
collapse = FALSE)
I don't know why my panel_identifier is not working. My call is very similar to the example given in the panelvar package, yet it fails. I want to point out that base1 is a data.frame. Any ideas? My data is structured like this:
head(base1)
id t country DDMMYY month month_text day Date_txt year Price Open
1 1 1296 China 1-4-2020 4 Apr 1 Apr 01 2020 12588.24 12614.82
2 1 1295 China 31-3-2020 3 Mar 31 Mar 31 2020 12614.82 12597.61
High Low Vol. Change..
1 12775.83 12570.32 NA -0.0021
2 12737.28 12583.05 NA 0.0014
thanks in advance !
Check the documentation of the package and the SSRN paper. For me, it helped to make sure that the formats of all input columns match those of the package example (you can check this with the str(base1) command). For example, they write:
library(panelvar)
data("Dahlberg")
ex1_dahlberg_data <-
pvargmm(dependent_vars = .......
When I look at it I get
>str(Dahlberg)
'data.frame': 2385 obs. of 5 variables:
$ id : Factor w/ 265 levels "114","115","120",..: 1 1 1 1 1 1 1 1 1 2 ...
$ year : Factor w/ 9 levels "1979","1980",..: 1 2 3 4 5 6 7 8 9 1 ...
$ expenditures: num 0.023 0.0266 0.0273 0.0289 0.0226 ...
$ revenues : num 0.0182 0.0209 0.0211 0.0234 0.018 ...
$ grants : num 0.00544 0.00573 0.00566 0.00589 0.00559 ...
For example, the input data must be a plain data.frame (in my case it carried additional classes, such as tibble or data.table). I resolved it by calling as.data.frame() on it.

Alternative to extract function when working with raster objects

I wonder how to sum the pixel values of a raster (val_r) for each category of another raster (cat_r). In other words, does an alternative to the function "extract" exist when working with raster objects? Thank you very much!
# sample raster with categories
cat_r<-raster(ncol=3,nrow=3, xmn=-10, xmx=10, ymn=-10, ymx=10)
cat_r[]<-c(1,2,1,3,4,3,4,4,4 ) #4 categories: 1, 2, 3 and 4
#sample raster with pixel values
val_r <-raster(ncol=3,nrow=3, xmn=-10, xmx=10, ymn=-10, ymx=10)
val_r[]<-c(1,0,1,5,2,5,2,2,2)
#extract function doesn't work for
extract(val_r, cat_r, fun=sum)
#I should find the following values: category 1: 2, cat 2: 0, cat 3: 10, cat 4: 8
You can use the zonal method:
library(raster)
cat_r <- raster(ncol=3,nrow=3, xmn=-10, xmx=10, ymn=-10, ymx=10, vals=c(1,2,1,3,4,3,4,4,4 ))
val_r <- setValues(cat_r, c(1,0,1,5,2,5,2,2,2))
zonal(val_r, cat_r, "sum")
# zone sum
#[1,] 1 2
#[2,] 2 0
#[3,] 3 10
#[4,] 4 8
This is equivalent to
s <- stack(cat_r, val_r)
v <- values(s)
tapply(v[,2], v[,1], sum)
# 1 2 3 4
# 2 0 10 8

SQL Query - Conditional Values in a User-defined Column

Hi Stack Overflow Community,
I am researching how to create a query that conditionally assigns values in a user-defined column based upon values in another column. I didn't know if this was entirely possible, as I couldn't find any references on this. I know that it's possible to create a user-defined column by just entering in something like 'Yellow' As Color, but these are limited to static values.
I have provided an example of the output below. In the end result, the values of the user-defined column would be strings.
X(Column from Table) Color(User-Defined Column)
1 if X = 1, Color = 'Brown'
2 if X = 2, Color = 'Blue'
3 if X = 3, Color = 'Red'
4 if X = 4, Color = 'Orange'
5 if X = 5, Color = 'Purple'
X Color
1 Brown
2 Blue
3 Red
4 Orange
5 Purple
Any input would be greatly appreciated, and thank you in advance!
Daniel
For a small number of possible values, I think CASE is the most appropriate.
SELECT X,
       CASE
           WHEN X = 1 THEN 'Brown'
           WHEN X = 2 THEN 'Blue'
           WHEN X = 3 THEN 'Red'
           WHEN X = 4 THEN 'Orange'
           WHEN X = 5 THEN 'Purple'
           ELSE 'No color'
       END AS Color
FROM your_table; -- placeholder: replace with your table name

how to select/add a column to pandas dataframe based on a non trivial function of other columns

This is a followup question for this one: how to select/add a column to pandas dataframe based on a function of other columns?
I have a data frame and I want to select the rows that match some criterion. The criterion is a function of the values of other columns and some additional values.
Here is a toy example:
>> df = pd.DataFrame({'A': [1,2,3,4,5,6,7,8,9],
'B': [randint(1,9) for x in xrange(9)],
'C': [4,10,3,5,4,5,3,7,1]})
>>
A B C
0 1 6 4
1 2 8 10
2 3 8 3
3 4 4 5
4 5 2 4
5 6 1 5
6 7 1 3
7 8 2 7
8 9 8 1
I want to select all rows for which some non-trivial function returns True, e.g. f(a,c,L), where L is a list of lists and f returns True iff a and c are not part of the same sublist.
That is, if L = [[1,2,3],[4,2,10],[8,7,5,6,9]] I want to get:
A B C
0 1 6 4
3 4 4 5
4 5 2 4
6 7 1 3
8 9 8 1
Thanks!
Here is a VERY VERY hacky and non-elegant solution. As another disclaimer, since your question doesn't state what you want to do if a number in the column is in none of the sub lists this code doesn't handle that in any real way besides any default functionality within isin().
import pandas as pd
df = pd.DataFrame({'A': [1,2,3,4,5,6,7,8,9],
'B': [6,8,8,4,2,1,1,2,8],
'C': [4,10,3,5,4,5,3,7,1]})
L = [[1,2,3],[4,2,10],[8,7,5,6,9]]
df['passed1'] = df['A'].isin(L[0])
df['passed2'] = df['C'].isin(L[0])
df['1&2'] = (df['passed1'] ^ df['passed2'])
df['passed4'] = df['A'].isin(L[1])
df['passed5'] = df['C'].isin(L[1])
df['4&5'] = (df['passed4'] ^ df['passed5'])
df['passed7'] = df['A'].isin(L[2])
df['passed8'] = df['C'].isin(L[2])
df['7&8'] = (df['passed7'] ^ df['passed8'])
df['PASSED'] = df['1&2'] & df['4&5'] ^ df['7&8']
del df['passed1'], df['passed2'], df['1&2'], df['passed4'], df['passed5'], df['4&5'], df['passed7'], df['passed8'], df['7&8']
df = df[df['PASSED'] == True]
del df['PASSED']
With an output that looks like:
A B C
0 1 6 4
3 4 4 5
4 5 2 4
6 7 1 3
8 9 8 1
I implemented this rather quickly hence the utter and complete ugliness of this code, but I believe you can refactor it any way you would like (e.g. iterate over the original set of lists with for sub_list in L, improve variable names, come up with a better solution, etc).
Hope this helps. Oh, and did I mention this was hacky and not very good code? Because it is.
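As a possible refactoring of the approach above, here is a minimal sketch of the predicate f from the question, written as a plain function. The name in_same_sublist is hypothetical, and the sketch assumes "not part of the same sublist" means that no single sublist contains both values. It uses plain lists for the A and C columns so that it runs without pandas.

```python
def in_same_sublist(a, c, groups):
    """Return True if a and c both appear in at least one sublist of groups."""
    return any(a in group and c in group for group in groups)


L = [[1, 2, 3], [4, 2, 10], [8, 7, 5, 6, 9]]
A = [1, 2, 3, 4, 5, 6, 7, 8, 9]      # column A from the toy example
C = [4, 10, 3, 5, 4, 5, 3, 7, 1]     # column C from the toy example

# Boolean mask: True where the A and C values do NOT share a sublist.
mask = [not in_same_sublist(a, c, L) for a, c in zip(A, C)]

# Row positions that would be kept.
kept_rows = [i for i, keep in enumerate(mask) if keep]
print(kept_rows)  # [0, 3, 4, 6, 8]
```

With the data frame from the question, the same mask could be built from zip(df['A'], df['C']) and applied with df.loc[mask]. Values that appear in none of the sublists are kept under this assumption.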

Subsetting in a function to calculate a row total

I have a data frame with results for certain instruments, and I want to create a new column which contains the totals of each row. Because I have different numbers of instruments each time I run an analysis on new data, I need a function to dynamically calculate the new column with the Row Total.
To simplify my problem, here’s what my data frame looks like:
Type Value
1 A 10
2 A 15
3 A 20
4 A 25
5 B 30
6 B 40
7 B 50
8 B 60
9 B 70
10 B 80
11 B 90
My goal is to achieve the following:
A B Total
1 10 30 40
2 15 40 55
3 20 50 70
4 25 60 85
5 70 70
6 80 80
7 90 90
I’ve tried various methods, but this one holds the most promise:
myList <- list(a = c(10, 15, 20, 25), b = c(30, 40, 50, 60, 70, 80, 90))
tmpDF <- data.frame(sapply(myList, '[', 1:max(sapply(myList, length))))
> tmpDF
a b
1 10 30
2 15 40
3 20 50
4 25 60
5 NA 70
6 NA 80
7 NA 90
totalSum <- rowSums(tmpDF)
totalSum <- data.frame(totalSum)
tmpDF <- cbind(tmpDF, totalSum)
> tmpDF
a b totalSum
1 10 30 40
2 15 40 55
3 20 50 70
4 25 60 85
5 NA 70 NA
6 NA 80 NA
7 NA 90 NA
Even though this approach did succeed in combining two data frames of different lengths, the ‘rowSums’ function gives the wrong values in this example. Besides that, my original data isn't in a list format, so I can't apply such a 'solution'.
I think I’m overcomplicating this problem, so I was wondering how I can …
Subset data from a data frame on the basis of ‘Type’,
Insert these individual subsets of different lengths into a new data frame,
Add a ‘Total’ column to this data frame which is the correct sum of the individual subsets.
An added complication is that this needs to be done in a function or in an otherwise dynamic way, so that I don’t need to manually subset the dozens of ‘Types’ (A, B, C, and so on) in my data frame.
Here’s what I have so far, which doesn’t work, but illustrates the lines I’m thinking along:
TotalDf <- function(x){
tmpNumberOfTypes <- c(levels(x$Type))
for( i in tmpNumberOfTypes){
subSetofData <- subset(x, Type = i, select = Value)
if( i == 1) {
totalDf <- subSetOfData }
else{
totalDf <- cbind(totalDf, subSetofData)}
}
return(totalDf)
}
Thanks in advance for any thoughts or ideas on this,
Regards,
EDIT:
Thanks to the comment of Joris (see below) I was pointed in the right direction. However, when trying to translate his solution to my data frame, I run into additional problems. His proposed answer works, and gives me the following (correct) sum of the values of A and B:
> tmp78 <- tapply(DF$value,DF$id,sum)
> tmp78
1 2 3 4 5 6
6 8 10 12 9 10
> data.frame(tmp78)
tmp78
1 6
2 8
3 10
4 12
5 9
6 10
However, when I try this solution on my data frame, it doesn’t work:
> subSetOfData <- copyOfTradesList[c(1:3,11:13),c(1,10)]
> subSetOfData
Instrument AccountValue
1 JPM 6997
2 JPM 7261
3 JPM 7545
11 KFT 6992
12 KFT 6944
13 KFT 7069
> unlist(sapply(rle(subSetOfData$Instrument)$lengths,function(x) 1:x))
Error in rle(subSetOfData$Instrument) : 'x' must be an atomic vector
> subSetOfData$InstrumentNumeric <- as.numeric(subSetOfData$Instrument)
> unlist(sapply(rle(subSetOfData$InstrumentNumeric)$lengths,function(x) 1:x))
[,1] [,2]
[1,] 1 1
[2,] 2 2
[3,] 3 3
> subSetOfData$id <- unlist(sapply(rle(subSetOfData$InstrumentNumeric)$lengths,function(x) 1:x))
Error in `$<-.data.frame`(`*tmp*`, "id", value = c(1L, 2L, 3L, 1L, 2L, :
replacement has 3 rows, data has 6
I have the disturbing idea that I’m going around in circles…
Two thoughts :
1) you could use na.rm=T in rowSums
2) How do you know which one has to go with which? You might add some indexing.
e.g.:
DF <- data.frame(
type=c(rep("A",4),rep("B",6)),
value = 1:10,
stringsAsFactors=F
)
DF$id <- unlist(lapply(rle(DF$type)$lengths,function(x) 1:x))
Now this allows you to easily tapply the sum on the original dataframe
tapply(DF$value,DF$id,sum)
And, more importantly, get your dataframe in the correct form :
> DF
type value id
1 A 1 1
2 A 2 2
3 A 3 3
4 A 4 4
5 B 5 1
6 B 6 2
7 B 7 3
8 B 8 4
9 B 9 5
10 B 10 6
> library(reshape)
> cast(DF,id~type)
id A B
1 1 1 5
2 2 2 6
3 3 3 7
4 4 4 8
5 5 NA 9
6 6 NA 10
TV <- data.frame(Type = c("A","A","A","A","B","B","B","B","B","B","B")
, Value = c(10,15,20,25,30,40,50,60,70,80,90)
, stringsAsFactors = FALSE)
# Added Type C for testing
# TV <- data.frame(Type = c("A","A","A","A","B","B","B","B","B","B","B", "C", "C", "C")
# , Value = c(10,15,20,25,30,40,50,60,70,80,90, 100, 150, 130)
# , stringsAsFactors = FALSE)
lnType <- with(TV, tapply(Value, Type, length))
lnType <- as.integer(lnType)
lnType
id <- unlist(mapply(FUN = rep_len, length.out = lnType, x = list(1:max(lnType))))
(TV <- cbind(id, TV))
require(reshape2)
tvWide <- dcast(TV, id ~ Type)
# Alternatively
# tvWide <- reshape(data = TV, direction = "wide", timevar = "Type", ids = c(id, Type))
tvWide <- subset(tvWide, select = -id)
# If you want something neat without the <NA>, replace them with 0:
# tvWide[is.na(tvWide)] <- 0
tvWide
transform(tvWide, rowSum=rowSums(tvWide, na.rm = TRUE))