How to merge a column of cluster IDs with X and Y coordinate columns? - r

I have a column with the names of the points, a column with the X coordinates and a column with the Y coordinates.
This is the table I'm working on:
I want to create a table with three columns: one with the clusters' IDs, one with the X coordinates and another with the Y coordinates. And for each cluster I want its X-Y coordinates.
I've tried the following code:
Xcoord <- sort(unique(tabprof$X_coord))
clusters <- sort(unique(tabprof$Cluster_ID))
I tried this in order to merge the two vectors, but it wasn't possible because they have different lengths. This is probably due to the presence of clusters that share the same X coordinate value.

Following our discussion in the comments, I'll provide a new solution, using fake data.
A <- c(1,1,1,1,2,2,2,2)  # stands in for Cluster_ID
B <- c(3,3,4,4,3,3,3,5)  # stands in for X_coord
df <- data.frame(A, B)
res <- unique(df)        # keep only unique (A, B) rows
> df
A B
1 1 3
2 1 3
3 1 4
4 1 4
5 2 3
6 2 3
7 2 3
8 2 5
> res
A B
1 1 3
3 1 4
5 2 3
8 2 5
So as you can see, if column A is the cluster ID and column B the X coordinates, the cluster IDs are still duplicated, but each one keeps only its unique coordinates. What's more, it's no problem if two different IDs share the same coordinates.
I hope this helps.
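Applied to the original question, a minimal sketch of the same idea (assuming the table is called tabprof and the coordinate columns are named X_coord and Y_coord; the Y name is a guess, since only X_coord appears above):
# keep one row per unique (cluster, X, Y) combination
res <- unique(tabprof[, c("Cluster_ID", "X_coord", "Y_coord")])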

Related

How to calculate Moran's I and GWR when given duplicated factor data

I'm trying to do a statistical analysis using Moran's I, but I've run into a serious problem.
Suppose the data look like:
y   indep1    indep2    coord_x    coord_y    District
y1  indep1_1  indep2_1  coord_x_1  coord_y_1  A
y2  indep1_2  indep2_2  coord_x_1  coord_y_1  A
y3  indep1_3  indep2_3  coord_x_1  coord_y_1  A
y4  indep1_4  indep2_4  coord_x_2  coord_y_2  B
y5  indep1_5  indep2_5  coord_x_2  coord_y_2  B
Note that the data is given in the form of a *.shp file.
This dataframe has 2 unique districts but 5 rows in total. If we want to calculate Moran's I, the weight matrix W is 2 by 2.
But if I fit the model
ols <- lm(y ~ indep1 + indep2, data = dataset)
then the test cannot be computed:
lm.morantest(ols, w)  # w: weight matrix
# fails with a "different length" error
How can we solve that problem? If the total number of observations and the number of unique districts differ, how can we check the spatial autocorrelation between the districts, and how can we apply GWR (geographically weighted regression)? Any reference papers or advice would be helpful.
Thank you for your help in advance.
I tried to calculate Moran's I by making the number of dependent-variable values the same as the number of unique districts. This is possible by aggregating the sum of the dependent variable per district.
library(reshape2)
# sum the dependent variable per district
y_agg <- dcast(shp@data, district ~ ., value.var = "dependent variable", sum)
y_agg <- y_agg$.  # extract the aggregated column
moran.test(y_agg, W)
But I don't think this is the right way to analyze the spatial regression, since all the other independent variables are ignored. How can I solve that problem? Is there a way to solve it without aggregating the independent variables of my data?
Thank you.
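For reference, a minimal sketch of the district-level route described above (illustrative only: it assumes the spdep package, placeholder column names y/indep1/indep2/District, and a district-level listw object W_district, e.g. built with mat2listw(W)). It aggregates the independent variables too, which is exactly what the question hopes to avoid, so treat it as a baseline:
library(spdep)
# aggregate dependent and independent variables per district
agg <- aggregate(cbind(y, indep1, indep2) ~ District, data = shp@data, FUN = sum)
# refit the regression at the district level
ols_agg <- lm(y ~ indep1 + indep2, data = agg)
# residuals now have one value per district, matching the district-level weights
lm.morantest(ols_agg, W_district)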

How to perform a many-to-many or (at least) an outer join in SPSS

Usually I use [R] for my data analysis, but these days I have to use SPSS. I was expecting that data manipulation might get a little bit more difficult this way, but after my first day I kind of surrender :D and I would really appreciate some help...
My problem is the following:
I have two data sets, each with an ID number. Neither data set has unique IDs (one data set, which should have unique IDs, contains a kind of duplicated row).
In a perfect world I would like to keep this duplicated row and simply perform a many-to-many join. But I've accepted that I might have to delete this "bad" row (in dataset A) and perform a 1:many join (joining dataset B to dataset A, which then contains the unique IDs).
If I run the join (and accept that it seems to be possible to run only a many:1 join, not a 1:many join), the problem is that I lose IDs. If I join dataset A to dataset B, I lose all cases that are not part of dataset B. But I would really like to keep both sets of IDs, like in a full join or something.
Do you know if there is (kind of) a simple solution to my problem?
Example:
dataset A:
ID  VAL1
1   A
1   B
2   D
3   K
4   A
dataset B:
ID  VAL2
1   g
2   k
4   a
5   c
5   d
5   a
2   x
expected result (best solution):
ID  VAL1  VAL2
1   A     g
1   B     g
2   D     k
3   K     NA
4   A     a
2   D     x
expected result (second best solution):
ID  VAL1  VAL2
1   A     g
2   D     k
3   K     NA
4   A     a
5   NA    c
5   NA    d
5   NA    a
2   D     x
what I get (worst solution):
ID  VAL1  VAL2
1   A     g
2   D     k
4   A     a
5   NA    c
5   NA    d
5   NA    a
2   D     x
From your example it looks like what you need is a full many-to-many join, based on the IDs existing in dataset A. You can get this by creating a full Cartesian product of the two datasets, using dataset A as the first/left dataset.
The following syntax assumes you have the STATS CARTPROD extension command installed. If you don't, you can see here about installing it.
First I'll recreate your example to demonstrate on:
dataset close all.
data list list/id1 vl1 (2F3) .
begin data
1 232
1 433
2 456
3 246
4 468
end data.
dataset name aaa.
data list list/id2 vl2 (2F3) .
begin data
1 111
2 222
4 333
5 444
5 555
5 666
2 777
3 888
end data.
dataset name bbb.
Now the actual work is fairly simple:
DATASET ACTIVATE aaa.
STATS CARTPROD VAR1=id1 vl1 INPUT2=bbb VAR2=id2 vl2
/SAVE OUTFILE="C:\somepath\yourcartesianproduct.sav".
* The new dataset now contains all possible combinations of rows in the two datasets.
* we will select only the relevant combinations, where the two ID's match.
select if id1=id2.
exe.
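As an aside, since the question mentions coming from R (a comparison only, not part of the SPSS solution, and assuming the two tables are data frames named datasetA and datasetB): base R's merge() performs these joins directly.
# left many-to-many join: reproduces the "best solution" table above
merge(datasetA, datasetB, by = "ID", all.x = TRUE)
# full outer join: additionally keeps the unmatched ID-5 rows from dataset B
merge(datasetA, datasetB, by = "ID", all = TRUE)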

KDB: apply dyadic function across two lists

Consider a function F[x;y] that generates a table. I also have two lists; xList:[x1;x2;x3] and yList:[y1;y2;y3]. What is the best way to do a simple comma join of F[x1;y1],F[x1;y2],F[x1;y3],F[x2;y1],..., thereby producing one large table?
You have asked for the cross product of your argument lists, so the correct answer is
raze F ./: xList cross yList
Depending on what you are doing, you might want to look into having your function operate on the entire list of x and the entire list of y and return one table, rather than on each pair, returning a list of tables which then has to be razed. The performance impact can be considerable; for example, see below:
q)g:{x?y} //your core operation
q)//this takes each pair of x,y, performs an operation and returns a table for each
q)//which must then be flattened with raze
q)fm:{flip `x`y`res!(x;y; enlist g[x;y])}
q)//this takes all x, y at once and returns one table
q)f:{flip `x`y`res!(x;y;g'[x;y])}
q)//let's set a seed to compare answers
q)\S 1
q)//x and y below are assumed to be equal-length vectors defined beforehand
q)\ts do[10000;rm:raze fm'[x;y]]
76 2400j
q)\S 1
q)\ts do[10000;r:f[x;y]]
22 2176j
q)rm~r
1b
Set up our example:
q)f:{([] total:enlist x+y; x:enlist x; y:enlist y)}
q)x:1 2 3
q)y:4 5 6
Demonstrate F[x1;y1]
q)f[1;4]
total x y
---------
5 1 4
q)f[2;5]
total x y
---------
7 2 5
Use the multi-valent apply operator (.) together with each-both (') to apply f to each pair of arguments.
q)raze .'[f;flip (x;y)]
total x y
---------
5 1 4
7 2 5
9 3 6
Another way to achieve it, using each-both:
x: 1 2 3
y: 4 5 6
f:{x+y}
f2:{ a:flip x cross y ; f'[a 0;a 1] }
f2[x;y]
5 6 7 6 7 8 7 8 9

How to apply a formula for removing data noise in R?

I am working on NGSIM traffic data, with 18 columns and 1180598 rows in a text file. I want to smooth the position data in the column 'Local Y'. I know there are built-in functions for data smoothing in R, but none of them seems to match the formula I am required to apply. The data in the text file looks something like this:
Index VehicleID Total_Frames Local Y
1 2 5 35.381
2 2 5 39.381
3 2 5 43.381
4 2 5 47.38
5 2 5 51.381
6 4 8 504.828
7 4 8 508.325
8 4 8 512.841
9 4 8 516.338
10 4 8 520.854
11 4 8 524.592
12 4 8 528.682
13 4 8 532.901
14 5 7 39.154
15 5 7 43.153
16 5 7 47.154
17 5 7 51.154
18 5 7 55.153
19 5 7 59.154
20 5 7 63.154
The columns above are just an example taken from the original file. Here you can see 3 vehicles, with vehicle IDs 2, 4 and 5, but in fact there are 2169 vehicles with different IDs. The column Total_Frames tells us how many times each vehicle's ID is repeated in the first column; for example, in the table above, vehicle ID 2 is repeated 5 times, hence the '5' in the Total_Frames column. The following is the formula I am required to apply to remove data noise (smoothing) from the column 'Local Y':
$$\text{smoothed}_i = \frac{\sum_{k=i-D}^{i+D} y_k \, e^{-|i-k|/\Delta}}{\sum_{k=i-D}^{i+D} e^{-|i-k|/\Delta}}$$
where
i = row index,
y_k = 'Local Y' value at row k,
Δ (delta) = 5,
D = 15.
I have tried the built-in functions I know of, but they don't smooth the data as required. My question is: is there any built-in function in R which can do the smoothing according to the given formula, or which could take this formula as an argument? I need to apply the formula to every value in 'Local Y', using the 15 values before and the 15 values after it (i-D to i+D) for the same vehicle ID. Can anyone give me an idea how to approach the problem? Thanks in advance.
You can place your formula in a function and then use R's apply functions to apply it to the elements of the 'Local Y' column of the dataframe.
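A minimal sketch of that idea (assumptions: the file has been read into a data frame df, the columns came out as VehicleID and Local_Y after import, and the window is simply truncated at the edges, since the formula doesn't specify the boundary behaviour):
smooth_vehicle <- function(y, delta = 5, D = 15) {
  n <- length(y)
  sapply(seq_len(n), function(i) {
    k <- max(1, i - D):min(n, i + D)  # window i-D..i+D, truncated at the edges
    w <- exp(-abs(i - k) / delta)     # exponential kernel weights
    sum(w * y[k]) / sum(w)            # weighted average = smoothed value
  })
}
# apply per vehicle so that windows never cross vehicle boundaries
df$Local_Y_smooth <- ave(df$Local_Y, df$VehicleID, FUN = smooth_vehicle)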

Is this pattern reconstitution or what is the name for this problem?

I have the following problem and don't know the terminology to describe it, and hence can't search for possible solutions.
I have a pivot table (matrix), i.e. each row and column has a named header, and there is a defined set of rows and columns. Now let's assume that 10 rows are "combined", meaning each column is summed up to create a new "pattern".
What I would like is a way to determine alternative row combinations that lead to the same or a similar "combined" pattern.
1 1 1
5 5 5
"Combined"
6 6 6
alternate row combination:
2 2 2
4 4 4
Suggestions? What is this problem called?
This is a system of linear equations: http://en.wikipedia.org/wiki/System_of_linear_equations#Matrix_equation
I just have to transpose the above matrix to get matrix A:
1 5
1 5
1 5
combined matrix is a vector b:
6
6
6
and x would be just a vector of ones, so that Ax = b.
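To make the connection concrete, here is a minimal sketch in R (the language is an arbitrary choice; the candidate rows and the brute-force search are purely illustrative) that finds every 0/1 combination of candidate rows summing to the combined pattern:
A <- cbind(c(1,1,1), c(5,5,5), c(2,2,2), c(4,4,4))  # candidate rows as columns
b <- c(6, 6, 6)                                     # the "combined" pattern
n <- ncol(A)
# try every 0/1 selection vector x; any x with A %*% x == b
# is a row combination that reproduces the pattern
for (bits in 0:(2^n - 1)) {
  x <- as.integer(intToBits(bits))[1:n]
  if (all(A %*% x == b)) print(x)
}
For the example this prints 1 1 0 0 and 0 0 1 1, i.e. the two row pairs shown above. For larger sets this brute force becomes infeasible and the search turns into a subset-sum style problem, usually handled with integer programming.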