Data frame error when converting iGraph to gexf object

I am trying to convert an iGraph object to a gexf object using the rgexf package so that I can write a file usable with Gephi, which I prefer for network visualization.
My iGraph object is created by reading in two CSVs: h.edges and h.nodes. There are both edge and node attributes. Once the files are read in, I create the iGraph object, calculate centrality measures and then attach the centrality measures as node attributes. The code looks like so:
iNet = graph_from_data_frame(d=h.edges, vertices = h.nodes, directed = F)
V(iNet)$degree = degree(iNet)
V(iNet)$eig = evcent(iNet)$vector
V(iNet)$betweenness = betweenness(iNet)
This appears to be working fine since I can do all the normal iGraph functions -- plot, calculate centralities, identify communities, etc. My problem comes when I try to convert this to a gexf object. I run the following code:
library(rgexf)
iNet.gexf <- igraph.to.gexf(iNet)
But I get the following error message:
Error in `[.data.frame`(x, r, vars, drop = drop) :
undefined columns selected
Does anyone know what's happening? Although I know this particular example could be done by uploading the two CSVs straight to Gephi and running the calculations there, the end goal is to attach igraph's more robust calculations as node attributes, which Gephi can't do on its own.

How to find centrality of nodes within clusters using igraph and Python

I'm working on network analysis and I'm new to Python. I want to find the centrality of every node within a cluster using igraph and pandas.
I have tried the following:
Creating a graph:
import igraph
# data is assumed to be a pandas DataFrame of edges
tuples = [tuple(x) for x in data.values]
g = igraph.Graph.TupleList(tuples, directed=False, weights=True)
Community detection using the fast-greedy algorithm:
fm = g.community_fastgreedy()
fm1 = fm.as_clustering()
Clusters like this are formed:
[1549] 96650006, 966543799, 966500080
[1401] 96650006, 966567865, 966500069, 966500071
Now, I would like to get the eigenvector centrality for each number within a cluster, so that I know which is the most important number in that cluster.
I am not very familiar with eigenvector centrality in igraph, but here is the solution I came up with:
# initial code is the same as yours
import numpy as np

# iterate over all subgraphs created by the clustering:
for subgraph in fm1.subgraphs():
    # this is basically already what you want
    cents = subgraph.eigenvector_centrality()
    # additionally, get the index of the maximum value
    max_idx = np.argmax(cents)
    print(subgraph.vs[max_idx])  # gets the correct vertex element
Essentially, you want to make use of the option to access the created clusters as separate graphs (.subgraphs() allows you to do exactly that). The rest is then "just" simple manipulation of the graph objects to get the element with the maximum eigenvector centrality.
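Since the question also mentions pandas, here is a hedged sketch that extends the same loop to collect the top node of every cluster into a DataFrame (the column names and the top_nodes variable are my own choices; the "name" attribute comes from the first column you passed to Graph.TupleList):
import numpy as np
import pandas as pd

rows = []
for cluster_id, subgraph in enumerate(fm1.subgraphs()):
    cents = subgraph.eigenvector_centrality()
    max_idx = int(np.argmax(cents))
    rows.append({
        "cluster": cluster_id,                     # position of the cluster in fm1
        "number": subgraph.vs[max_idx]["name"],    # vertex name, e.g. 96650006
        "eigenvector_centrality": cents[max_idx],  # its centrality within that cluster
    })

top_nodes = pd.DataFrame(rows)
print(top_nodes.head())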

Splitting a feature collection by system index in Google Earth Engine?

I am trying to export a large feature collection from GEE. I realize that the Python API allows for this more easily than the JavaScript API does, but given a time constraint on my research, I'd like to see if I can extract the feature collection in pieces and then append the separate CSV files once exported.
I tried to use a filtering function to perform the task, one that I've seen used before with image collections. Here is a mini example of what I am trying to do.
Given a feature collection of 10 spatial points called "points", I tried to create a new feature collection that includes only the first five points:
var points_chunk1 = points.filter(ee.Filter.rangeContains('system:index', 0, 5));
When I execute this function, I receive the following error: "An internal server error has occurred"
I am not sure why this code is not executing as expected. If you know more than I do about this issue, please advise on alternative approaches to splitting my sample, or on where the error in my code lurks.
Many thanks!
system:index is actually an ID that GEE assigns to each feature; it's not meant to be used like an index into an array. The JavaScript API should be enough to export a large FeatureCollection, but there is a way to do what you want without relying on system:index, since it might not be consistent.
First, it would be a good idea to know the number of features you are dealing with; be aware that calling size().getInfo() on a large feature collection can freeze the UI and sometimes make the tab unresponsive. Here I have defined chunk and collectionSize on the client side, since we want to call Export inside the loop, which is not possible in server-side loops. Within the loop, you simply create a subset of features starting at different offsets by converting the collection to a list and wrapping that slice back into a FeatureCollection.
var chunk = 1000;
var collectionSize = 10000;
for (var i = 0; i < collectionSize; i = i + chunk) {
  var subset = ee.FeatureCollection(fc.toList(chunk, i));
  Export.table.toAsset(subset, "description", "/asset/id");
}
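As written, every pass through the loop would reuse the same task description and asset ID. A hedged variant that gives each chunk its own export target might look like this (the asset path is purely illustrative; you could also derive collectionSize from fc.size().getInfo() if the collection is small enough for that call):
var chunk = 1000;
var collectionSize = 10000;

for (var i = 0; i < collectionSize; i = i + chunk) {
  var subset = ee.FeatureCollection(fc.toList(chunk, i));
  // Unique description and asset ID per chunk so the exports end up in separate assets.
  Export.table.toAsset(subset,
      'export_chunk_' + i,
      'users/your_username/export_chunk_' + i);
}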

R: jsonlite's stream_out function producing incomplete/truncated JSON file

I'm trying to load a really big JSON file into R. Since the file is too big to fit into memory on my machine, I found that using the jsonlite package's stream_in/stream_out functions is really helpful. With these functions, I can subset the data in chunks without loading it all, write the subset data to a new, smaller JSON file, and then load that file as a data.frame. However, this intermediary JSON file is getting truncated (if that's the right term) while being written with stream_out. I will now attempt to explain in further detail.
What I'm attempting:
I have written my code like this (following an example from documentation):
con_out <- file(tmp <- tempfile(), open = "wb")
stream_in(file("C:/User/myFile.json"), handler = function(df){
  df <- df[which(df$Var > 0), ]
  stream_out(df, con_out, pagesize = 1000)
}, pagesize = 5000)
myData <- stream_in(file(tmp))
As you can see, I open a connection to a temporary file, read my original JSON file with stream_in and have the handler function subset each chunk of data and write it to the connection.
The problem
This procedure runs without any problems, until I try to read it back in with myData <- stream_in(file(tmp)), upon which I receive an error. Manually opening the new, temporary JSON file reveals that the bottom-most line is always incomplete. Something like the following:
{"Var1":"some data","Var2":3,"Var3":"some othe
I then have to manually remove that last line after which the file loads without issue.
Solutions I've tried
I've tried reading the documentation thoroughly and looking at the stream_out function, and I can't figure out what may be causing this issue. The only slight clue I have is that the stream_out function automatically closes the connection upon completion, so maybe it's closing the connection while some other component is still writing?
I inserted a print function to print the tail() end of the data.frame at every chunk inside the handler function to rule out problems with the intermediary data.frame. The data.frame is produced flawlessly at every interval, and I can see that the final two or three rows of the data.frame are getting truncated while being written to file (i.e., they're not being written). Notice that it's the very end of the entire data.frame (after stream_out has rbinded everything) that is getting chopped.
I've tried playing around with the pagesize arguments, including trying very large numbers, no number, and Inf. Nothing has worked.
I can't use jsonlite's other functions like fromJSON because the original JSON file is too large to read without streaming and it is actually in minified(?)/ndjson format.
System info
I'm running R 3.3.3 x64 on Windows 7 x64. 6 GB of RAM, AMD Athlon II 4-Core 2.6 Ghz.
Treatment
I can still deal with this issue by manually opening the JSON files and correcting them, but it's leading to some data loss and it's not allowing my script to be automated, which is an inconvenience as I have to run it repeatedly throughout my project.
I really appreciate any help with this; thank you.
I believe this does what you want; it is not necessary to do the extra stream_out/stream_in.
library(jsonlite)
library(dplyr)   # for %>% and bind_rows()

myData <- new.env()
stream_in(file("MOCK_DATA.json"), handler = function(df){
  idx <- as.character(length(myData) + 1)
  myData[[idx]] <- df[which(df$id %% 2 == 0), ] ## change back to your filter
}, pagesize = 200) ## change back to 1000

myData <- myData %>% as.list() %>% bind_rows()
(I created some mock data in Mockaroo: I generated 1000 lines, hence the small pagesize, to check that everything worked with more than one chunk. The filter I used was even IDs because I was too lazy to create a Var column.)
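For reference, a hedged sketch of what "change back to your filter / pagesize" would look like with the file path, filter, and pagesize taken from the question (dplyr supplies %>% and bind_rows):
library(jsonlite)
library(dplyr)

myData <- new.env()
stream_in(file("C:/User/myFile.json"), handler = function(df){
  idx <- as.character(length(myData) + 1)
  myData[[idx]] <- df[which(df$Var > 0), ]   # the question's original filter
}, pagesize = 5000)                           # the question's original pagesize

myData <- myData %>% as.list() %>% bind_rows()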

Tf-slim: ValueError: Variable vgg_19/conv1/conv1_1/weights already exists, disallowed. Did you mean to set reuse=True in VarScope?

I am using tf-slim to extract features from several batches of images. The problem is that my code works for the first batch; after that I get the error in the title. My code is something like this:
for i in range(0, num_batches):
    # Obtain the starting and ending image numbers for each batch
    batch_start = i * training_batch_size
    batch_end = min((i + 1) * training_batch_size, read_images_number)
    # Obtain the images for this batch
    images = preprocessed_images[batch_start:batch_end]
    with slim.arg_scope(vgg.vgg_arg_scope()) as sc:
        _, end_points = vgg.vgg_19(tf.to_float(images), num_classes=1000, is_training=False)
    init_fn = slim.assign_from_checkpoint_fn(os.path.join(checkpoints_dir, 'vgg_19.ckpt'), slim.get_model_variables('vgg_19'))
    feature_conv_2_2 = end_points['vgg_19/pool5']
So as you can see, in each iteration I select a batch of images and use the VGG-19 model to extract features from the pool5 layer. But after the first iteration I get an error on the line where I try to obtain the end points. One solution I found on the internet is to reset the graph each time, but I don't want to do that because later in the code my graph has some weights that I train using these extracted features, and I don't want to reset them. Any leads are highly appreciated. Thanks!
You should create your graph once, not in a loop. The error message tells you exactly that - you try to build the same graph twice.
So it should be (in pseudocode):
create_graph()
load_checkpoint()
for each batch:
    process_data()
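Concretely, a hedged sketch of that structure for this question's setup (TF 1.x with tf-slim; checkpoints_dir, preprocessed_images, training_batch_size, read_images_number and num_batches are taken from the question, while the placeholder and the 224x224x3 input shape are my assumptions):
import os
import tensorflow as tf
import tensorflow.contrib.slim as slim
from tensorflow.contrib.slim.nets import vgg

# Build the graph exactly once, with a placeholder for the image batch.
images_ph = tf.placeholder(tf.float32, shape=[None, 224, 224, 3])
with slim.arg_scope(vgg.vgg_arg_scope()):
    _, end_points = vgg.vgg_19(images_ph, num_classes=1000, is_training=False)
features = end_points['vgg_19/pool5']

init_fn = slim.assign_from_checkpoint_fn(
    os.path.join(checkpoints_dir, 'vgg_19.ckpt'),
    slim.get_model_variables('vgg_19'))

with tf.Session() as sess:
    init_fn(sess)                    # restore the checkpoint once
    for i in range(num_batches):     # only the data changes per iteration
        batch_start = i * training_batch_size
        batch_end = min((i + 1) * training_batch_size, read_images_number)
        batch = preprocessed_images[batch_start:batch_end]
        # The graph is reused; we just feed a different batch each time.
        feature_conv = sess.run(features, feed_dict={images_ph: batch})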

Saving and restoring geometries in OpenLayers

Context: I'm a just-hours-old newbie at OpenLayers, please be gentle.
Fundamentally, I have a map with some drawn objects on it. If I understand things correctly, I have a number of OpenLayers.Feature.Vector objects (layers?) with a number of OpenLayers.Geometry "things" (like LinearRing) on them.
At the moment, I seem to be able to get a nice representation of the geometry, using .toString(). Yes, I suspect I'm doing it wrong -- feel free to point me in the right direction.
This yields a very human readable, and database storable, strings such as:
POINT(-104.74560546875 44.2841796875)
POLYGON((-96.52783203125 44.6796875,-96.52783203125 45.734375,-92.22119140625 45.734375,-92.22119140625 44.6796875,-96.52783203125 44.6796875))
LINESTRING(-105.71240234375 44.6796875,-106.06396484375 42.658203125,-103.55908203125 42.7021484375,-103.47119140625 45.55859375,-104.65771484375 45.20703125)
Is there an inverse way of getting these back into the object format from whence they came?
I'd love to be using JSON, but can't seem to get GeoJSON to accept my OpenLayers.Feature.Vector object (which is what the CLASS_NAME property says it is when I peer inside).
Many thanks.
The OpenLayers.Geometry objects' toString method converts them nicely to WKT (Well-Known Text). If you use a GIS layer on top of your database (like PostGIS for PostgreSQL, SQL Server's spatial types, SpatiaLite for SQLite, etc.), it should offer functions that let you process WKT.
But if you want to convert that WKT back to a new OpenLayers.Geometry object (in the browser), you can use the fromWKT function:
var point = OpenLayers.Geometry.fromWKT('POINT(-104.74560546875 44.2841796875)');
alert(point.toString()); // POINT(-104.74560546875 44.2841796875)
Here, the variable point will now contain a new OpenLayers.Geometry object, which has the same properties as the original one you used toString() on.
If you pass an array to the fromWKT function, it will return a GeometryCollection containing all the generated geometries.
var geometryTexts = [
'POINT(-104.74560546875 44.2841796875)'
, 'POLYGON((-96.52783203125 44.6796875,-96.52783203125 45.734375,-92.22119140625 45.734375,-92.22119140625 44.6796875,-96.52783203125 44.6796875))'
, 'LINESTRING(-105.71240234375 44.6796875,-106.06396484375 42.658203125,-103.55908203125 42.7021484375,-103.47119140625 45.55859375,-104.65771484375 45.20703125)'
],
collection = OpenLayers.Geometry.fromWKT(geometryTexts);
After this, collection.toString() should yield the following:
GEOMETRYCOLLECTION(POINT(-104.74560546875 44.2841796875),POLYGON((-96.52783203125 44.6796875,-96.52783203125 45.734375,-92.22119140625 45.734375,-92.22119140625 44.6796875,-96.52783203125 44.6796875)),LINESTRING(-105.71240234375 44.6796875,-106.06396484375 42.658203125,-103.55908203125 42.7021484375,-103.47119140625 45.55859375,-104.65771484375 45.20703125))
In my other answer, I went with WKT because you mentioned it. I now see that you seem to prefer GeoJSON.
To convert a vector layer or an OpenLayers.Geometry object to a GeoJSON string, you should use the OpenLayers.Format.GeoJSON.write function:
var geoJSON = new OpenLayers.Format.GeoJSON(),
    geoJSONText = geoJSON.write(geometryObject);
Note that you should be able to pass your object to this function, since (according to the documentation) it accepts an OpenLayers.Feature.Vector as well as an OpenLayers.Geometry or an array of features.
Conversely, when you’ve got a GeoJSON string, you can convert that back to an object using the OpenLayers.Format.GeoJSON.read function:
var geometry = geoJSON.read(geoJSONText, 'Geometry');
The second parameter lets you indicate which type of object you’d like returned. Read the docs linked to for more information.
Also, take a look at this demo for a more extensive example. (View the source of the page to see how they’re doing it).
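Putting write and read together, a minimal round-trip sketch (OpenLayers 2.x; myVectorFeature is a placeholder name for one of your OpenLayers.Feature.Vector objects):
var format = new OpenLayers.Format.GeoJSON();

// Serialize a feature (or an array of features, or a bare geometry).
var geoJSONText = format.write(myVectorFeature);

// Later: rebuild features from the stored string. By default read()
// returns an array of OpenLayers.Feature.Vector objects.
var features = format.read(geoJSONText);

// If what you stored was a bare geometry, you can ask for a geometry back:
var geometryText = format.write(myVectorFeature.geometry);
var geometry = format.read(geometryText, 'Geometry');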