Problems with node names in igraph with R

I have a list of data.frames (dfList) from which I'd like to generate directed, weighted networks with igraph in R.
The edge-weight variable is "ValueUSD", while vertices are identified by numeric codes in the columns "Reporter" and "Partner".
Below is a function I wrote, called "write.graphs", to be used with lapply on my "dfList".
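write.graphs <- function(filename) {
  # Build a directed graph from the Reporter -> Partner edge list
  d <- graph.data.frame(filename[c("Reporter", "Partner")], directed = TRUE)
  # Attach ValueUSD as the edge weight
  d <- set.edge.attribute(d, "weight", value = filename$ValueUSD)
  d  # return the graph explicitly
}
graphs <- lapply(dfList, write.graphs)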
Everything works perfectly. If I check for vertex names I get:
graphs[1]$names
NULL
But trouble arises when I want to use names instead of numbers to identify my vertices, using the corresponding columns "ReporterN" and "PartnerN" in each of the data.frames in dfList.
Here is what my data.frames look like:
dfList[1]
$`Aug 2014`
Reporter YearPeriod Year Period Commodity Partner NetWeightKg ValueUSD Price PartnerN ReporterN
1 76 201408 2014 8 150910 0 4472917 22028271 4.924811 World Brazil
2 76 201408 2014 8 150910 32 380891 1533948 4.027262 Argentina Brazil
3 76 201408 2014 8 150910 152 239776 1336057 5.572105 Chile Brazil
4 76 201408 2014 8 150910 251 289 2164 7.487889 France Brazil
5 76 201408 2014 8 150910 300 27592 170658 6.185054 Greece Brazil
6
This is the message I get:
> grafi<-lapply(dfList, scrivi.grafi)
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In graph.data.frame(filename[c("ReporterN", "PartnerN")], ... :
In `d' `NA' elements were replaced with string "NA"
2: In graph.data.frame(filename[c("ReporterN", "PartnerN")], ... :
In `d' `NA' elements were replaced with string "NA"
3: In graph.data.frame(filename[c("ReporterN", "PartnerN")], ... :
If it helps, I checked:
class(dfList[1]$PartnerN)
[1] "NULL"
Any suggestions? Can anyone explain to me what is happening?
Thanks a lot, Umberto.
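A note on the two checks in the question, offered as a sketch rather than a confirmed diagnosis: on a list, single brackets (dfList[1]) return a one-element list, not the data.frame itself, so $PartnerN finds nothing and comes back NULL. Double brackets (dfList[[1]]) extract the element. The warning text also suggests that the name columns contain NA values. The tiny dfList below is hypothetical stand-in data, not the asker's real data.

```r
# Hypothetical stand-in for one element of dfList (made-up rows, one NA on purpose)
dfList <- list(`Aug 2014` = data.frame(
  ReporterN = c("Brazil", "Brazil", NA),
  PartnerN  = c("World", "Argentina", "Chile"),
  stringsAsFactors = FALSE
))

# Single brackets keep the list wrapper, so the column lookup fails
single <- class(dfList[1]$PartnerN)    # "NULL"

# Double brackets extract the data.frame, so the column is found
double <- class(dfList[[1]]$PartnerN)  # "character"

# Count missing values in the two name columns
na_counts <- colSums(is.na(dfList[[1]][c("ReporterN", "PartnerN")]))
print(na_counts)
```

The same distinction applies to the list of graphs: graphs[[1]] is a graph, while graphs[1] is a list holding one. With igraph loaded, vertex names are read with V(graphs[[1]])$name rather than $names.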

Related

MySQL - Connect three different tables in transposed fashion

I have three tables:
business:
id  name
1   Charlie's Bakery
2   Mark's Pizza
3   Rob's Market
balanco_manual:
id   business_id  year  unit
012  1            2015  ones
123  1            2014  tens
364  2            2014  cents
conta_balanco:
id   conta  balanco_id  valor
412  12.3   012         12324
344  12.5   012         54632
414  14.1   364         344122
789  12     364         2312415
646  12     123         342
I need to combine them all in the business table and make them look like this:
business:
id  name              12.3-2015  12.5-2015  11.56-2015  12-2014  2015-unit  2014-unit
1   Charlie's Bakery  12324      54632      NaN         342      ones       tens
2   Mark's Pizza      NaN        NaN        NaN         2312415  NaN        cents
3   Rob's Market      NaN        NaN        NaN         NaN      NaN        NaN
To explain a little further: the business table holds basic records about the businesses, balanco_manual holds yearly information for each of those businesses, and conta_balanco holds the details of the yearly information in balanco_manual.
Trying to put that last table into words:
- First I need to join business with balanco_manual, matching the "id" column in business with the "business_id" column in balanco_manual. Note that I combine unit and year into a single column named "[year]-unit". Let's call this table "new_business" to make it easier to understand.
- Then I need to combine "new_business" with conta_balanco in a similar way to what we did with the "unit" column. Each "conta" should be combined with the year and become a column "conta-[year]".
I'm quite a beginner with SQL and I'm having real difficulties. Could someone help me work this out?

Formatting JSON data in R

I'm really new to working with JSON data, so I had a question about formatting.
Here's the link to the data I was trying to work with
I was using jsonlite and did this:
shot <- "http://stats.nba.com/stats/playerdashptshotlog?DateFrom=&DateTo=&GameSegment=&LastNGames=0&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&Period=0&PlayerID=202322&Season=2014-15&SeasonSegment=&SeasonType=Regular+Season&TeamID=0&VsConference=&VsDivision="
I then did fromJSON:
json_data <- fromJSON(paste(readLines(shot), collapse=""))
This gives me the data in a list. My issue (although for all I know I messed up working towards this) is trying to create a data frame out of this info. I was able to make a data frame with code I read under similar questions on the site, but it is all of the data in just one column. Any recommendations would be appreciated!
Thanks
Normally, the first thing to do when you get JSON is to look at its structure.
str(json_data)
Doing so will reveal that your data has a very simple structure: it is a data frame with rows and a line of headers, wrapped in some metadata about what each column means. Using $ lets you address those specific components. In other words, your specific JSON is already structured like a data frame; all you have to do is take it out of the JSON wrapper.
library(jsonlite)
json_data <- fromJSON(paste(readLines(shot), collapse=""))
str(json_data)
mydf <- data.frame(json_data$resultSets$rowSet)
colnames(mydf) <- unlist(json_data$resultSets$headers)
You ought to get something like this:
head(mydf)
GAME_ID MATCHUP LOCATION W FINAL_MARGIN SHOT_NUMBER PERIOD
1 0021401215 APR 14, 2015 - WAS # IND A L -4 1 1
2 0021401215 APR 14, 2015 - WAS # IND A L -4 2 1
3 0021401215 APR 14, 2015 - WAS # IND A L -4 3 1
4 0021401215 APR 14, 2015 - WAS # IND A L -4 4 1
5 0021401215 APR 14, 2015 - WAS # IND A L -4 5 1
6 0021401215 APR 14, 2015 - WAS # IND A L -4 6 1
GAME_CLOCK SHOT_CLOCK DRIBBLES TOUCH_TIME SHOT_DIST PTS_TYPE SHOT_RESULT
1 10:33 7.7 0 1 25 3 missed
2 8:41 14 10 9.6 10.7 2 made
3 6:42 14.9 11 9.7 18.2 2 missed
4 5:16 19 3 3.5 4.2 2 made
5 4:45 19.8 3 3.7 3.3 2 missed
6 3:08 13.5 10 9.7 18 2 missed
CLOSEST_DEFENDER CLOSEST_DEFENDER_PLAYER_ID CLOSE_DEF_DIST FGM PTS
1 Hill, George 201588 4.3 0 0
2 Hill, George 201588 5.7 1 2
3 Hill, George 201588 3 0 0
4 Miles, CJ 101139 4 1 2
5 Hill, Solomon 203524 3 0 0
6 Hill, George 201588 4.5 0 0
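One follow-up worth sketching: in the output above, values such as SHOT_DIST print as text-like entries ("25", "10.7"), which suggests every column arrives as character (or factor) rather than numeric. The example below uses small hand-built stand-ins for json_data$resultSets$headers and json_data$resultSets$rowSet (hypothetical values, not fetched from the API) and converts only the columns that should be numeric.

```r
# Hypothetical stand-ins for the headers and rowSet components
headers <- c("GAME_ID", "SHOT_NUMBER", "SHOT_DIST", "SHOT_RESULT")
rowSet <- matrix(
  c("0021401215", "1", "25",   "missed",
    "0021401215", "2", "10.7", "made"),
  ncol = 4, byrow = TRUE
)

# Same two steps as in the answer: build the data frame, then name the columns
mydf <- data.frame(rowSet, stringsAsFactors = FALSE)
colnames(mydf) <- headers

# Convert selected columns to numeric; GAME_ID stays character
# so its leading zeros are preserved
num_cols <- c("SHOT_NUMBER", "SHOT_DIST")
mydf[num_cols] <- lapply(mydf[num_cols], as.numeric)

str(mydf)
```

Choosing the columns explicitly, instead of converting everything automatically, avoids turning identifiers like GAME_ID into numbers.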

strange output after appending a column

I cbind a column "class" to a data frame to get a new tdm1 (tdm1 <- cbind(tdm1, class)), and that works fine.
The content of class looks like this:
1 715
2 715
3 707
4 705
5 704
6 701
7 701
...
Then, after the cbind, I wanted to look at the class column using tdm1[,ncol(tdm1)], and somehow I got "35 Levels: 156 174 205 250 295 324 335 340 343 345 348 349 361 370 375 381 382 428 439 451 455 701 704 705 706 ... 72" after the correct values for the entire column. It looks like a summary of the column values. I don't know where it came from. This additional information makes my later knn classification behave strangely. How do I get rid of it?
Your object is a factor. Calling ?factor reveals:
factor returns an object of class "factor" which has a set of integer
codes the length of x with a "levels" attribute of mode character and
unique (!anyDuplicated(.))
The levels attribute being printed to your dismay reflects all the unique values contained within the object you are printing. To get rid of it, try:
as.numeric(as.character(tdm1[,ncol(tdm1)]))
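To see why the detour through character matters, here is a minimal sketch using the first few values from the question as a stand-in for the real column (class_col is a hypothetical name). Calling as.numeric directly on a factor returns its internal integer codes, not the values that are printed.

```r
# Stand-in for the "class" column, stored as a factor
class_col <- factor(c(715, 715, 707, 705, 704, 701, 701))

# Direct conversion returns the level codes (positions in levels(class_col))
codes <- as.numeric(class_col)

# Converting to character first recovers the printed values
values <- as.numeric(as.character(class_col))

# An equivalent form that converts the levels once and indexes by code
values2 <- as.numeric(levels(class_col))[class_col]

print(codes)
print(values)
```

Here codes is 5 5 4 3 2 1 1, while values and values2 both hold the original numbers 715 715 707 705 704 701 701.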

Grab HTML table using XML

I am trying to read an html table using the package XML, but even though it looks easy, I haven’t managed to do it. I tried everything, but the names of the columns are always fixed by R as V1, V2, V3,…
This is the code:
require(XML)
tbl <- readHTMLTable("http://facedata.ornl.gov/ornl/npp_98-08.html",
header = c("year","ring","CO2", "stem","root","leaf","fine root", "NPP"),
skip.rows=c(1,2),colClasses=c(rep("factor",3),rep("numeric",5)))
Many thanks for your help
The first row of the table is causing trouble. It may be easiest to remove it:
library(XML)
appURL <- "http://facedata.ornl.gov/ornl/npp_98-08.html"
doc <- htmlParse(appURL)
removeNodes(doc["//table/tr[1]"]) # remove the first row with the troublesome header
myTable <- readHTMLTable(doc, which = 1)
> head(myTable)
Year Plot CO2 Stem Coarse Root Leaf Fine Root Total NPP
1 1998 1 elev 1540 127 362 168 2197
2 1998 2 elev 1487 139 418 175 2219
3 1998 3 amb 1085 112 333 231 1762
4 1998 4 amb 1204 113 368 185 1870
5 1998 5 amb 1136 109 382 56 1683
6 1999 1 elev 1218 98 475 295 2086

Conditional Sorting according to one value and then alphabetically

I have this group in a table, where I need to display one value at the top and the rest in alphabetical order.
Table
Column1 Value#1 Value#2
Alpha 12 26
Beta 65 745
Gamma 987 87
Pie 7 2
Non-Beta 132 426
Zeta 112 266
I want to sort it like this (can anyone also tell me the real use of this, other than for viewing purposes?):
Table
Column1 Value#1 Value#2
Non-Beta 132 426
Alpha 12 26
Beta 65 745
Gamma 987 87
Pie 7 2
Zeta 112 266
So Non-Beta has to be displayed at the top and the rest in alphabetical order.
Edit
Thank you very much for the reply below, Chris. I really appreciate it, and yes, it works.
I have one more question about the table format above: how can I display it in the format below?
Table
Column1 Value#1 Value#2
Non-Beta 132 426
Alpha 12 26
Pie 7 2
Zeta 112 266
Total 263 720
Beta 65 745
Gamma 987 87
Total 1057 832
Thank you
Select the table, right-click the table handle, select Tablix Properties and select the Sorting tab. Press the Add button then click the fx button to open the expression editor. Enter the following expression:
=IIF(Fields!Column1.Value = "Non-Beta", "A" + Fields!Column1.Value, "B" + Fields!Column1.Value)
All we are doing is prefixing the special value field with something so it comes before the other fields.
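The prefixing idea is not specific to report expressions. As an illustration only, the sketch below reproduces it in R (not in the Tablix expression language) on the table from the question; the names tbl, sort_key and sorted are hypothetical.

```r
# The table from the question
tbl <- data.frame(
  Column1 = c("Alpha", "Beta", "Gamma", "Pie", "Non-Beta", "Zeta"),
  Value1  = c(12, 65, 987, 7, 132, 112),
  Value2  = c(26, 745, 87, 2, 426, 266),
  stringsAsFactors = FALSE
)

# Prefix the special value with "A" and everything else with "B",
# so the special value sorts first and the rest stay alphabetical
sort_key <- ifelse(tbl$Column1 == "Non-Beta",
                   paste0("A", tbl$Column1),
                   paste0("B", tbl$Column1))

# method = "radix" sorts in the C locale, independent of the session locale
sorted <- tbl[order(sort_key, method = "radix"), ]
print(sorted)
```

The sort key is only used for ordering and never displayed, which is also how the expression in the Sorting tab behaves.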