Importing Nodes with Coordinates to Gephi from CSV

This question seems pretty stupid but I actually fail to find a simple solution to this. I have a csv file that is structured like this:
0 21 34.00 34.00
1 23 35.00 25.00
2 25 45.00 65.00
The first column is the node's id, the second is an unimportant attribute. The 3rd and 4th columns are supposed to be the x and y positions of the nodes.
I can import the file into the Data Laboratory without problems, but I can't get Gephi to use the x and y attributes as the corresponding properties. All I want to achieve is that Gephi sets the x property to the value of the x attribute (and y respectively). Also see the picture.
Thanks for your help!

In the Layout window, you can select "Geo Layout" and define which columns are used as latitude and longitude.
The projection might come out weird if you do not actually have geodata, but for me this is fine.

In Gephi 0.8 there was a plugin called Recast column. Unfortunately this plugin has not been ported to Gephi 0.9 yet, but it allowed you to set the standard (hidden) columns of the node table from visible values in the node table. So if you have two columns of type Float or Decimal that represent your coordinates, you could use it to set the coordinate values of your nodes.
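If the plugin route is closed, one possible workaround (not part of the answers above) is to convert the CSV to GEXF before importing, since Gephi reads viz position attributes from GEXF files. A minimal sketch, assuming a space-separated file called nodes.csv laid out like the sample in the question and using networkx's GEXF writer:

import csv
import networkx as nx

G = nx.Graph()
with open("nodes.csv") as f:                      # assumed file name, space-separated
    for node_id, attr, x, y in csv.reader(f, delimiter=" "):
        # Store the coordinates as GEXF viz:position, which Gephi uses
        # as the node's x/y on import.
        G.add_node(node_id, attribute=attr,
                   viz={"position": {"x": float(x), "y": float(y), "z": 0.0}})

nx.write_gexf(G, "nodes.gexf")                    # open this file in Gephi instead of the raw CSV

Gephi's GEXF importer honors viz:position, so the nodes should land at the given coordinates (you may still need to avoid running a layout that overwrites them).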

Related

TomTom OpenLR Binary Version 3, Differences and mistakes of the OpenLR Whitepaper and the TomTom Demo Tool

I am currently trying to decode OpenLR binary strings (version 3) as specified in the OpenLR Whitepaper (version 1.5, rev 2), which can be found on the OpenLR Association website.
Unfortunately, while comparing my results with the TomTom Demo Tool, I found some key differences, mistakes, and missing information.
For example, decoding this binary string with the Demo Tool (with "show decoding steps" enabled)
Cwl3syVRnAELHgor/6YBBw0DgP61AQwnDGz94AEIGQe1/j4BCj0TSv9NAXYZJw==
shows that:
Negative byte values for relative coordinates have to be flipped and 1 added to obtain the actual value (i.e. interpreted as two's complement). The OpenLR Whitepaper is missing this step.
Is this step also necessary for absolute coordinates, or only for relative coordinates?
The calculation of relative coordinates is described as being the same as the calculation for absolute coordinates:
(int - sgn(int) * 0.5) * 360 / 2^resolution
(resolution = 24 for absolute, 16 for relative), but it becomes obvious that this equation does not lead to the correct value.
Using the values and formula as shown, the calculation would give a value of -0.49, not -0.0009. Instead, for relative coordinates, the concatenated byte value (flipped and 1 added if negative) has to be divided by 10^5 to obtain -0.0009.
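To make the discrepancy concrete, here is a quick check in Python. The raw value of -90 is my own assumption, chosen only because it reproduces the two figures quoted above (-0.49 and -0.0009):

def whitepaper_formula(raw, resolution):
    # (int - sgn(int) * 0.5) * 360 / 2^resolution, as quoted from the whitepaper
    sign = (raw > 0) - (raw < 0)
    return (raw - sign * 0.5) * 360 / 2 ** resolution

raw = -90                            # assumed two-byte relative value, for illustration only
print(whitepaper_formula(raw, 16))   # ~ -0.49, the value the whitepaper formula produces
print(raw / 10 ** 5)                 # -0.0009, the value the Demo Tool arrives at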
For the Location Reference Point with n > 0, n < 2 (the second LRP, and thus the first relative coordinate) the final addition of the values is correct, but for n >= 2 the differences get bigger and the result is wrong. This can easily be seen in the Demo Tool's calculation:
the addition of the correctly decoded byte values is simply wrong.
This leads to big differences in the final location. The resulting value of the Demo Tool is correct, as the locations described follow the streets, while using the correct sum would shift them off the street. But the equation is missing some key aspects.
Also, the OpenLR Whitepaper describes adding the relative coordinate value to the previous LRP
(but comparing the values used in the Demo Tool shows that the first LRP is being used instead of the previous LRP).
Which formula is the correct one? The Demo Tool generates correct values but uses wrong calculations.
Edit: for the third LRP I found that using the previous LRP leads to the value calculated by the online tool (which shows the first LRP value being used).
For reference and comparison some examples:
Binary String of above example:
Cwl3syVRnAELHgor/6YBBw0DgP61AQwnDGz94AEIGQe1/j4BCj0TSv9NAXYZJw==
Differences:
Using the correct sum of the relative coordinate value and LRP 0 (the first two LRPs are correct, then it gets worse; this can also be checked by verifying the sums in the Demo Tool for LRPs 3-6):
The Demo Tool uses a wrong calculation, but the final values shown are correct, as they follow the street. The result seems to be mirrored along a horizontal line going through the second LRP (the first relative coordinate):
I'd be very thankful for any hints on how to solve this correctly.
Steps done:
I wrote a decoder according to the whitepaper and contacted TomTom Support who asked me to discuss this issue here. I double-checked the calculations and found mistakes in the demo tool as well as in the OpenLR white paper specification.
I solved it.
The calculation for relative coordinates:
If the first bit of the concatenated bytes (of lat or lon) is 1 (negative value), subtract 2^16:
byteValue = byteValue - 2^16
In either case, divide by 10^5:
byteValueRel = byteValue / 10^5
The resulting relative coordinate is the sum of the previous LRP value and the calculated relative value:
previousLrpValue + byteValueRel
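A minimal sketch of that calculation (function and variable names are mine; the two bytes are assumed to already be extracted from the binary string):

def decode_relative(byte_hi, byte_lo, previous_lrp_value):
    raw = (byte_hi << 8) | byte_lo              # concatenate the two bytes into a 16-bit value
    if raw & 0x8000:                            # first bit set -> negative value
        raw -= 1 << 16                          # byteValue = byteValue - 2^16
    return previous_lrp_value + raw / 10 ** 5   # divide by 10^5 and add to the previous LRP

# Illustration with assumed bytes: 0xFF, 0xA6 -> raw = -90 -> offset of -0.0009 degrees.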

Joining a .csv and a vector layer in QGIS

I have a couple of layers that I need to join in QGIS. One of them is a vector layer and contains the information about the geometry (a series of polygons, each one characterized by a certain id). On the other hand, I have a .csv file with information about these polygons, but it is not a single value per polygon, hence my problem with the joins. It is a temporal dataset in which a field appears with a value assigned for each date and polygon (not continuously, but almost).
An example of the .csv file would be:
id    polygon  date   cost
1     A1       01-01  100
2     A2       01-01  500
...   ...      ...    ...
100   A1       02-01  250
101   A2       02-01  360
102   A3       02-01  150
The idea of joining both files is to be able to have each polygon painted (with the help of the "temporal" tool) depending on whether it exceeds a certain value of the cost field.
I have tried to create a relation from "Project", but I could only access the form.
Thank you very much!
The following solution works for me:
Add the CSV as a delimited text layer with the right data types.
Run Vector general -> Join attributes by field value, and choose the join type "Create separate features for each matching feature".
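If you prefer to do the same join outside the QGIS GUI, here is a minimal sketch using geopandas (not part of the original answer); the file names and join field are assumptions based on the example above:

import geopandas as gpd
import pandas as pd

polygons = gpd.read_file("polygons.gpkg")   # vector layer with a "polygon" id field (assumed)
costs = pd.read_csv("costs.csv")            # assumed columns: id, polygon, date, cost

# One-to-many join: each polygon geometry is repeated once per matching
# date/cost row, which mirrors "Create separate features for each matching feature".
joined = polygons.merge(costs, on="polygon", how="left")
joined.to_file("polygons_with_costs.gpkg", driver="GPKG")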

Find the Relationship Between Two Logarithmic Equations

No idea if I am asking this question in the right place, but here goes...
I have a set of equations that were calculated based on numbers ranging from 4 to 8. So an equation for when this number is 5, one for when it is 6, one for when it is 7, etc. These equations were determined from graphing a best fit line to data points in a Google Sheet graph. Here is an example of a graph...
Example...
When the number is between 6 and 6.9, this equation is used: windGust6to7 = -29.2 + (17.7 * log(windSpeed))
When the number is between 7 and 7.9, this equation is used: windGust7to8 = -70.0 + (30.8 * log(windSpeed))
I am using these equations to create an image in python, but the image is too choppy since each equation covers a range from x to x.9. In order to smooth this image out and make it more accurate, I really would need an equation for every 0.1 change in number. So an equation for 6, a different equation for 6.1, one for 6.2, etc.
Here is an example output image that is created using the current equations:
So my question is: Is there a way to find the relationship between the two example equations I gave above in order to use that to create a smoother looking image?
This is not about logarithms; for the purposes of this derivation, log(windspeed) is a constant term. Rather, you're trying to find a fit for your mapping:
6 (-29.2, 17.7)
7 (-70.0, 30.8)
...
... and all of the other numbers you have already. You need to determine two basic search parameters:
(1) Where in each range is your function an exact fit? For instance, for the first one, is it exactly correct at 6.0, 6.5, 7.0, or elsewhere? Change the left-hand column to reflect that point.
(2) What sort of fit do you want? You are basically fitting a pair of parameterized equations, one for each coefficient:
number  intercept      number  slope
6       -29.2          6       17.7
7       -70.0          7       30.8
For each of these, you want to find the coefficients of a good matching function. This is a large field of statistical and algebraic study. Since you have four ranges, you will have four points for each function. It is straightforward to fit a cubic equation to each set of points in Cartesian space. However, the resulting function may not be as smooth as you like; in such a case, you may well find that a 4th- or 5th- degree function fits better, or perhaps something exponential, depending on the actual distribution of your points.
You need to work with your own problem objectives and do a little more research into function fitting. Once you determine the desired characteristics, look into scikit for fitting functions to do the heavy computational work for you.
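As a rough sketch of that idea in Python (only the two example equations from the question are used as anchor points, so the fit below is linear; with all of your ranges a higher-degree polyfit as discussed above would be appropriate, and log is assumed to be base 10 as in Google Sheets):

import numpy as np

# Anchor points: number -> (intercept, slope), taken from the two example equations.
n          = np.array([6.0, 7.0])
intercepts = np.array([-29.2, -70.0])
slopes     = np.array([17.7, 30.8])

# Fit each coefficient as a function of the number (degree 1 here because only
# two points are shown; raise the degree once all ranges are included).
b_coeffs = np.polyfit(n, intercepts, 1)
m_coeffs = np.polyfit(n, slopes, 1)

def wind_gust(number, wind_speed):
    """Interpolated gust equation usable at any step, e.g. number = 6.3."""
    b = np.polyval(b_coeffs, number)
    m = np.polyval(m_coeffs, number)
    return b + m * np.log10(wind_speed)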

Non Scaled SSRS Line Chart with multiple series

I am trying to present time series from multiple sensors on a single SSRS (v14) line chart.
I need to plot N series, with each one independently plotting its data in the space provided by the chart (i.e. each with an independent vertical axis).
More about the data
There can be anywhere from ~1-10 series
The challenge is that they are different orders of magnitude.
One might be degrees F (~0-212)
One might be Carbon ppm (~1-16)
One might be Ftlbs Thrust (~10k-100k)
The point is, they have no relation and can be very different.
The exact value is not important; I can hide the vertical axis.
More about what I am trying to do
The idea is to show the multiple time series plotted together against time for the 4 hours before and after 'an event'. It's not necessarily the exact value that is important; the subject matter expert would be looking for something odd (temperature falls, thrust spikes, etc.).
Things I have tried
If there were just 2 series, I could easily use the 2nd axis available in the SSRS chart. That's exactly the idea I am chasing, but in this case I want each of the N series to plot using its own axis.
I have tried stacking N transparent charts on top of each other. This would be a really ugly solution, but SSRS won't even let you do it; it unstacks them for you.
I have experimented with the Allow Scale Breaks property on the vertical axis. This would solve the problem, but we don't like the 'double jagged line'.
Turning on a logarithmic scale is a possibility. It does a better job of displaying all the data, but it's not really what we want: it changes the shape of data that ranges over a couple of orders of magnitude.
I tried the sparkline component and am having the same problem.
This approach is essentially the same as Greg's answer above. I've had to do this same process in the past, comparing trends of data even though the units were dissimilar.
I took a very simple approach of adding an additional column to the query that showed each value as a percentage of the maximum value in each series.
As an example (just 2 series here for clarity) I started with data like this in myTable
Series Month myValue
A Jan 4
A Feb 8
A Mar 16
B Jan 200
B Feb 300
B Mar 400
My Dataset query would be something like:
SELECT *, CAST(myValue AS float) / MAX(myValue) OVER (PARTITION BY Series) AS myPlotValue FROM myTable
This gives us a final dataset which looks like this:
Series Month myValue myPlotValue
A Jan 4 0.25
A Feb 8 0.5
A Mar 16 1
B Jan 200 0.5
B Feb 300 0.75
B Mar 400 1
As you can see, all plot values are now between 0 and 1.
I created the charts using the myPlotValue field and had the option of using the original values from the myValue field as data point labels.
After talking to some math people: this is a standard problem, and it is solved by a process called normalization of the data.
Essentially, you are rescaling all the series to fit in a given range (usually 0-1).
You can also scale and add an offset if that makes sense for your problem domain.
https://www.statisticshowto.datasciencecentral.com/normalized/
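A minimal sketch of that normalization in plain Python (names are mine; it rescales any series into a target range, 0-1 by default):

def normalize(values, new_min=0.0, new_max=1.0):
    # Min-max normalization: map the series onto [new_min, new_max].
    lo, hi = min(values), max(values)
    if hi == lo:                               # flat series: avoid division by zero
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

# e.g. normalize([200, 300, 400]) -> [0.0, 0.5, 1.0]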

Weka Decision Tree

I am trying to use Weka to analyze some data. I've got a dataset with 3 variables and 1000+ instances.
The dataset references movie remakes and:
how similar they are (0.0-1.0)
the difference in years between the movie and the remake
and lastly, whether they were made by the same studio (yes or no)
I am trying to make a decision tree to analyze the data. Using J48 (because that's all I have ever used), I only get one leaf. I'm assuming I'm doing something wrong. Any help is appreciated.
Here is a snippet from the data set:
Similarity YearDifference STUDIO TYPE
0.5 36 No
0.5 9 No
0.85 18 No
0.4 10 No
0.5 15 No
0.7 6 No
0.8 11 No
0.8 0 Yes
...
If interested the data can be downloaded as a csv here http://s000.tinyupload.com/?file_id=77863432352576044943
Your data set is not balanced, because there are almost 5 times more "No" than "Yes" values for the class attribute. That's why the J48 tree is actually just one leaf that classifies everything as "No". You can do one of these things:
Sample your data set so that you have an equal number of No and Yes instances.
Try a better classification algorithm, e.g. Random Forest (it's located a few spaces below J48 in the Weka Explorer GUI).
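The same two suggestions, sketched with pandas/scikit-learn rather than the Weka GUI (the file and column names are my assumptions based on the snippet in the question):

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("remakes.csv")     # assumed columns: Similarity, YearDifference, StudioType
X, y = df[["Similarity", "YearDifference"]], df["StudioType"]

# Option 1: rebalance by downsampling the majority class ("No") to the size of "Yes".
yes = df[df["StudioType"] == "Yes"]
no  = df[df["StudioType"] == "No"].sample(n=len(yes), random_state=0)
balanced = pd.concat([yes, no])

# Option 2: a Random Forest with class weighting as an alternative to resampling.
clf = RandomForestClassifier(class_weight="balanced", random_state=0)
print(cross_val_score(clf, X, y, cv=10).mean())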