Hello, I would like to know how I can merge two data frames in R. There is a merge function, but I would like to do this:
data frame1
X Y Z
1 1 1 1
2 1 1 1
3 1 1 1
4 1 1 1
5 1 1 1
data frame 2
A B C
1 2 2 2
2 2 2 2
3 2 2 2
mergedataframe
X Y Z A B C
1 1 1 1
2 1 1 1
3 1 1 1 2 2 2
4 1 1 1 2 2 2
5 1 1 1 2 2 2
The thing is, I must synchronize 3 CSV files (data frames) and I have no idea how to do it with R.
If somebody has any idea about it, thank you.
I edited my post; I would like my merged data frame to look like this:
data frame1
X Y Z
1 1 1 1
2 1 1 1
3 1 1 1
4 1 1 1
5 1 1 1
6 1 1 1
data frame 2
A B C
1 2 2 2
2 2 2 2
mergedataframe
X Y Z A B C
1 1 1 1
2 1 1 1
3 1 1 1 2 2 2
4 1 1 1 2 2 2
5 1 1 1
6 1 1 1
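# Example data: df1 has 5 rows, df2 has 3 rows
df1 <- data.frame(X = rep(1, 5), Y = 1, Z = 1)
df2 <- data.frame(A = rep(2, 3), B = 2, C = 2)
# Give df2 the row names of the last rows of df1 (here 3:5)
rownames(df2) <- tail(rownames(df1), nrow(df2))
# by = 0 merges on row names; all = TRUE keeps rows without a match (filled with NA)
mergedataframe <- merge(df1, df2, by = 0, all = TRUE)
# Drop the Row.names column that merge() adds
mergedataframe <- mergedataframe[, -1]
mergedataframe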
X Y Z A B C
1 1 1 1 NA NA NA
2 1 1 1 NA NA NA
3 1 1 1 2 2 2
4 1 1 1 2 2 2
5 1 1 1 2 2 2
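The alignment idea behind this answer (place the shorter table at a chosen row of the longer one and fill the rest with missing values) can also be sketched outside R. The following is only an illustration in plain Python; the function name `merge_at_offset` and its `offset` parameter are hypothetical, and `None` stands in for R's `NA`. An offset of 2 reproduces both layouts from the question (rows 3 to 5 of 5, and rows 3 to 4 of 6).

```python
def merge_at_offset(left, right, right_cols, offset):
    """Place the rows of `right` next to `left`, starting at row index `offset`.

    Rows of `left` without a partner get None for every column in `right_cols`.
    """
    if offset < 0 or offset + len(right) > len(left):
        raise ValueError("right does not fit inside left at this offset")
    empty = {col: None for col in right_cols}
    merged = []
    for i, left_row in enumerate(left):
        j = i - offset  # position of the matching row in `right`, if any
        right_row = right[j] if 0 <= j < len(right) else empty
        merged.append({**left_row, **right_row})
    return merged


# First layout from the question: 5 rows and 3 rows, aligned at the end.
df1 = [{"X": 1, "Y": 1, "Z": 1} for _ in range(5)]
df2 = [{"A": 2, "B": 2, "C": 2} for _ in range(3)]
merged_tail = merge_at_offset(df1, df2, ["A", "B", "C"], offset=2)

# Edited layout: 6 rows and 2 rows, df2 sitting on rows 3 and 4.
df1_long = [{"X": 1, "Y": 1, "Z": 1} for _ in range(6)]
df2_short = [{"A": 2, "B": 2, "C": 2} for _ in range(2)]
merged_middle = merge_at_offset(df1_long, df2_short, ["A", "B", "C"], offset=2)

for row in merged_middle:
    print(row)
```

In R the same effect comes from choosing which row names df2 receives before calling merge.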
I have a MySQL table full of outcomes from a chess tournament:
P1_id P2_id Outcome_for_P1 Day
1 2 W 2015-12-07
1 3 W 2015-12-06
1 4 D 2015-12-05
1 5 L 2015-12-04
1 6 D 2015-12-03
1 7 D 2015-12-02
1 8 L 2015-12-01
2 1 L 2015-12-07
2 3 W 2015-12-06
2 4 W 2015-12-05
2 5 W 2015-12-04
2 6 L 2015-12-03
2 7 D 2015-12-02
2 8 W 2015-12-01
This is great, but I've realized I need to derive 3 new columns. I'd like to track P1_id's record throughout the tournament, as such:
P1_id P2_id Outcome_for_P1 P1_W P1_L P1_D Day
1 2 W 2 2 3 2015-12-07
1 3 W 1 2 3 2015-12-06
1 4 D 0 2 3 2015-12-05
1 5 L 0 2 2 2015-12-04
1 6 D 0 1 2 2015-12-03
1 7 D 0 1 1 2015-12-02
1 8 L 0 1 0 2015-12-01
2 1 L 4 2 1 2015-12-07
2 3 W 4 1 1 2015-12-06
2 4 W 3 1 1 2015-12-05
2 5 W 2 1 1 2015-12-04
2 6 L 1 1 1 2015-12-03
2 7 D 1 0 1 2015-12-02
2 8 W 1 0 0 2015-12-01
Something like this is simple, but I don't know how to modify it so that it carries the count through
SELECT *,
CASE Outcome_for_P1 when 'W' then 1
ELSE P1_W
END AS P1_W
FROM
chess;
Thanks for your help in advance.
select
    P1_id,
    P2_id,
    Outcome_for_P1,
    P1_W,
    P1_L,
    P1_D,
    Day
from (
    select c.*,
        @w:= if(@prev_p1 = P1_id, if(Outcome_for_P1 = 'W',@w+1,@w),if(Outcome_for_P1 = 'W',1,0)) as P1_W,
        @l:= if(@prev_p1 = P1_id, if(Outcome_for_P1 = 'L',@l+1,@l),if(Outcome_for_P1 = 'L',1,0)) as P1_L,
        @d:= if(@prev_p1 = P1_id, if(Outcome_for_P1 = 'D',@d+1,@d),if(Outcome_for_P1 = 'D',1,0)) as P1_D,
        @prev_p1:= P1_id
    from chess c,(select @w:=0,@l:=0,@d:=0,@prev_p1:=0)x
    order by P1_id asc, Day asc
)x
order by P1_id asc, Day asc;
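The logic of this query is a running tally that restarts whenever P1_id changes. As a rough illustration only (not a replacement for the SQL), the same bookkeeping can be written in plain Python; the function name `running_record` is made up for this sketch, and the rows are the ones from the question. Dates are kept as ISO strings, which sort chronologically.

```python
from collections import defaultdict

# (P1_id, P2_id, Outcome_for_P1, Day), copied from the question
games = [
    (1, 2, "W", "2015-12-07"), (1, 3, "W", "2015-12-06"),
    (1, 4, "D", "2015-12-05"), (1, 5, "L", "2015-12-04"),
    (1, 6, "D", "2015-12-03"), (1, 7, "D", "2015-12-02"),
    (1, 8, "L", "2015-12-01"),
    (2, 1, "L", "2015-12-07"), (2, 3, "W", "2015-12-06"),
    (2, 4, "W", "2015-12-05"), (2, 5, "W", "2015-12-04"),
    (2, 6, "L", "2015-12-03"), (2, 7, "D", "2015-12-02"),
    (2, 8, "W", "2015-12-01"),
]


def running_record(rows):
    """Return rows with cumulative W/L/D counts per player, current game included."""
    counts = defaultdict(lambda: {"W": 0, "L": 0, "D": 0})
    out = []
    # Walk each player's games in date order, like ORDER BY P1_id, Day.
    for p1, p2, outcome, day in sorted(rows, key=lambda g: (g[0], g[3])):
        counts[p1][outcome] += 1
        c = counts[p1]
        out.append((p1, p2, outcome, c["W"], c["L"], c["D"], day))
    return out


record = running_record(games)
for row in record:
    print(row)
```

Keeping one counter set per player plays the role of the reset that the query performs when @prev_p1 differs from P1_id.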
Please note: this question is an extension / modification of a question I asked here that was expertly and graciously answered by @Abhik Chakraborty.
I have a MySQL table full of outcomes from a chess tournament:
P1_id P2_id Outcome_for_P1 Day
1 2 W 2015-12-07
1 3 W 2015-12-06
1 4 D 2015-12-05
1 5 L 2015-12-04
1 6 D 2015-12-03
1 7 D 2015-12-02
1 8 L 2015-12-01
2 1 L 2015-12-07
2 3 W 2015-12-06
2 4 W 2015-12-05
2 5 W 2015-12-04
2 6 L 2015-12-03
2 7 D 2015-12-02
2 8 W 2015-12-01
I've realized I need to derive 3 new columns. In my previous question, I was trying to track P1_id's record throughout the tournament, by showing the outcome of the chess game that is represented by each row. This was as such:
P1_id P2_id Outcome_for_P1 P1_W P1_L P1_D Day
1 2 W 2 2 3 2015-12-07
1 3 W 1 2 3 2015-12-06
1 4 D 0 2 3 2015-12-05
1 5 L 0 2 2 2015-12-04
1 6 D 0 1 2 2015-12-03
1 7 D 0 1 1 2015-12-02
1 8 L 0 1 0 2015-12-01
2 1 L 4 2 1 2015-12-07
2 3 W 4 1 1 2015-12-06
2 4 W 3 1 1 2015-12-05
2 5 W 2 1 1 2015-12-04
2 6 L 1 1 1 2015-12-03
2 7 D 1 0 1 2015-12-02
2 8 W 1 0 0 2015-12-01
This can be done using the code on this SQL Fiddle. Problem solved.
BUT I am having trouble modifying this SQL to populate the columns in a slightly different way. In this new situation, I'd like to show the competitor's record before the conclusion of the game, that is, what the player's record was going into the chess match. In other words, how can I come up with this:
P1_id P2_id Outcome_for_P1 P1_W P1_L P1_D Day
1 2 W 1 2 3 2015-12-07
1 3 W 0 2 3 2015-12-06
1 4 D 0 2 2 2015-12-05
1 5 L 0 1 2 2015-12-04
1 6 D 0 1 1 2015-12-03
1 7 D 0 1 0 2015-12-02
1 8 L 0 0 0 2015-12-01
2 1 L 4 1 1 2015-12-07
2 3 W 3 1 1 2015-12-06
2 4 W 2 1 1 2015-12-05
2 5 W 1 1 1 2015-12-04
2 6 L 1 0 1 2015-12-03
2 7 D 1 0 0 2015-12-02
2 8 W 0 0 0 2015-12-01
I'm struggling a bit because I don't think it can be done by the code on the Fiddle. I feel like a whole new approach might be necessary, but I'm not sure what the best approach is.
To resolve this issue, you can add a unique ID that increments by 1 with each match, then add up stats only for matches whose ID is less than the current one.
The sample from your fiddle link doesn't work for me.
But here is another solution: join your matches with all previous matches of the same player and count the outcomes using SUM:
SELECT c.P1_id, c.P2_id, c.Outcome_for_P1,
SUM(IF(c1.Outcome_for_P1='W',1,0)) P1_W,
SUM(IF(c1.Outcome_for_P1='L',1,0)) P1_L,
SUM(IF(c1.Outcome_for_P1='D',1,0)) P1_D,
c.Day
FROM chess c
LEFT JOIN chess c1
ON c1.P1_id = c.P1_id
AND c1.Day < c.Day
GROUP BY c.P1_id, c.P2_id, c.Outcome_for_P1, c.Day
ORDER BY c.P1_id ASC, c.Day DESC;
http://sqlfiddle.com/#!9/641efb/44
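For readers without a MySQL server at hand, the self-join can be tried with Python's built-in sqlite3 module. This is a sketch under assumptions: it runs on SQLite rather than MySQL, uses CASE in place of MySQL's IF() for portability, and stores Day as ISO text so that the `<` comparison orders dates correctly. Like the query above, it assumes each player has at most one game per day.

```python
import sqlite3

# (P1_id, P2_id, Outcome_for_P1, Day), copied from the question
rows = [
    (1, 2, "W", "2015-12-07"), (1, 3, "W", "2015-12-06"),
    (1, 4, "D", "2015-12-05"), (1, 5, "L", "2015-12-04"),
    (1, 6, "D", "2015-12-03"), (1, 7, "D", "2015-12-02"),
    (1, 8, "L", "2015-12-01"),
    (2, 1, "L", "2015-12-07"), (2, 3, "W", "2015-12-06"),
    (2, 4, "W", "2015-12-05"), (2, 5, "W", "2015-12-04"),
    (2, 6, "L", "2015-12-03"), (2, 7, "D", "2015-12-02"),
    (2, 8, "W", "2015-12-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chess (P1_id INTEGER, P2_id INTEGER, Outcome_for_P1 TEXT, Day TEXT)"
)
conn.executemany("INSERT INTO chess VALUES (?, ?, ?, ?)", rows)

# Each game (c) is joined to the same player's strictly earlier games (c1).
# A game with no earlier games still appears thanks to LEFT JOIN, with zero counts.
query = """
SELECT c.P1_id, c.P2_id, c.Outcome_for_P1,
       SUM(CASE WHEN c1.Outcome_for_P1 = 'W' THEN 1 ELSE 0 END) AS P1_W,
       SUM(CASE WHEN c1.Outcome_for_P1 = 'L' THEN 1 ELSE 0 END) AS P1_L,
       SUM(CASE WHEN c1.Outcome_for_P1 = 'D' THEN 1 ELSE 0 END) AS P1_D,
       c.Day
FROM chess c
LEFT JOIN chess c1
       ON c1.P1_id = c.P1_id
      AND c1.Day < c.Day
GROUP BY c.P1_id, c.P2_id, c.Outcome_for_P1, c.Day
ORDER BY c.P1_id ASC, c.Day DESC
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The first row printed is player 1's game on 2015-12-07 with a record of 1 win, 2 losses and 3 draws going into the match, which agrees with the desired table in the question.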
I'm using the cluster command and am having difficulties due to insufficient memory. To get around this problem, I would like to delete all duplicate observations.
I would like to cluster on the variables A, B and C, and I identify duplicate values as follows:
/* Create dummy data */
input id A B C
1 1 1 1
2 1 1 1
3 1 1 1
4 2 2 2
5 2 2 2
6 2 2 2
7 2 2 2
8 3 3 3
9 3 3 3
10 4 4 4
end
sort A B C id
duplicates tag A B C, gen(dup_tag)
I would like to add a variable dup_id which tells me that ids 2 and 3 are duplicates of id 1, ids 5, 6 and 7 of id 4, and so on. How could I do this?
/* Desired result */
id A B C dup_id
1 1 1 1 1
2 1 1 1 1
3 1 1 1 1
4 2 2 2 4
5 2 2 2 4
6 2 2 2 4
7 2 2 2 4
8 3 3 3 8
9 3 3 3 8
10 4 4 4 10
duplicates is a wonderful command (see its manual entry for why I say that), but you can do this directly:
bysort A B C : gen tag = _n == 1
tags the first occurrence of duplicates of A B C as 1 and all others as 0. For the other way round use _n > 1, _n != 1, or whatever.
EDIT:
So then the id of tagged observations is just
by A B C: gen dup_id = id[1]
For basic technique with by: see (e.g.) this discussion
You can refer to the first observation in each group of A B C using the subscript [1] on id. Note the (id) argument in bysort, which sorts by id within groups but identifies the groups by A, B, and C only.
clear
input id A B C
1 1 1 1
2 1 1 1
3 1 1 1
4 2 2 2
5 2 2 2
6 2 2 2
7 2 2 2
8 3 3 3
9 3 3 3
10 4 4 4
end
bysort A B C (id): gen dup_id = id[1]
li, noobs sepby(dup_id)
yielding
+-------------------------+
| id A B C dup_id |
|-------------------------|
| 1 1 1 1 1 |
| 2 1 1 1 1 |
| 3 1 1 1 1 |
|-------------------------|
| 4 2 2 2 4 |
| 5 2 2 2 4 |
| 6 2 2 2 4 |
| 7 2 2 2 4 |
|-------------------------|
| 8 3 3 3 8 |
| 9 3 3 3 8 |
|-------------------------|
| 10 4 4 4 10 |
+-------------------------+
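The grouping rule used here (every observation receives the smallest id found in its A B C group) can be illustrated in plain Python as well. This is a sketch for comparison only; the function name `add_dup_id` is made up, and the data are the dummy rows from the question.

```python
# (id, A, B, C), copied from the question
observations = [
    (1, 1, 1, 1), (2, 1, 1, 1), (3, 1, 1, 1),
    (4, 2, 2, 2), (5, 2, 2, 2), (6, 2, 2, 2), (7, 2, 2, 2),
    (8, 3, 3, 3), (9, 3, 3, 3),
    (10, 4, 4, 4),
]


def add_dup_id(rows):
    """Append to each row the lowest id sharing its (A, B, C) values."""
    first_id = {}
    # Visiting rows in id order mirrors the (id) sort inside bysort:
    # the first id seen for a group is its smallest one.
    for rec_id, a, b, c in sorted(rows):
        first_id.setdefault((a, b, c), rec_id)
    return [(rec_id, a, b, c, first_id[(a, b, c)]) for rec_id, a, b, c in rows]


tagged = add_dup_id(observations)
for row in tagged:
    print(row)
```

Keeping only the rows where id equals dup_id would then leave one observation per group.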
item qty
1201-10-005-A 1
1110-01-006-A 1
1112-01-006-A 1
1202-01-008-A 1
1202-01-023-A 1
G-1000-00-003-A 1
Q-2252-00-004-D 1
1150-01-002-A 1
1201-01-009-A 1
1201-01-010-A 1
1201-01-012-A 1
1201-01-013-A 1
1201-02-005-A 1
1201-02-006-A 1
1201-04-001-A 1
1201-05-001-A 1
1201-06-002-A 1
1201-06-003-A 1
1201-06-004-A 1
1201-07-001-A 1
1201-07-002-A 1
1201-07-005-A 1
1201-07-006-A 1
1201-07-009-A 1
1201-07-007-A 1
1201-06-004-A 2
1201-07-001-A 2
1201-07-002-A 2
1201-07-005-A 2
1201-07-006-A 2
1201-07-007-A 2
1201-07-009-A 2
1201-10-005-A 2
1202-01-008-A 2
1202-01-023-A 2
1110-01-006-A 2
1201-06-004-A 3
1201-07-001-a 3
1201-07-002-A 3
1201-07-005-A 3
1201-07-006-a 3
1201-07-007-A 3
1201-07-009-A 3
1201-10-005-A 3
1202-01-008-A 3
1202-01-023-A 3
1110-01-006-A 3
1130-03-009-A 3
1201-06-004-A 4
1201-07-001-A 4
1201-07-002-A 4
1201-07-005-A 4
1201-07-006-A 4
1201-07-007-A 4
1201-07-009-A 4
1201-10-005-A 4
1202-01-008-A 4
1202-01-023-A 4
1110-01-006-A 4
1130-03-009-A 4
1110-01-006-A 5
1130-03-009-A 5
1201-01-009-A 1
0004-08-107-A 1
0010-08-012-A 1
1000-00-003-B 1
When the same item is repeated, show only its maximum quantity value.
You need to use GROUP BY (your_table stands in for the actual table name, which the question does not give):
select item, max(qty) as max_qty
from your_table
group by item;
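A quick way to check the GROUP BY approach is to run it against a few of the rows from the question using Python's built-in sqlite3 module. This is a sketch under assumptions: the table name `items` is hypothetical, only a subset of the posted rows is loaded, and the engine is SQLite rather than whatever database the question uses.

```python
import sqlite3

# A subset of the (item, qty) rows from the question
sample = [
    ("1201-10-005-A", 1), ("1110-01-006-A", 1),
    ("1201-10-005-A", 2), ("1110-01-006-A", 2),
    ("1130-03-009-A", 3), ("1110-01-006-A", 5),
    ("1130-03-009-A", 5), ("0004-08-107-A", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item TEXT, qty INTEGER)")  # hypothetical table name
conn.executemany("INSERT INTO items VALUES (?, ?)", sample)

# One output row per distinct item, carrying the largest qty seen for it.
max_qty = dict(
    conn.execute(
        "SELECT item, MAX(qty) AS max_qty FROM items GROUP BY item ORDER BY item"
    ).fetchall()
)
for item, qty in max_qty.items():
    print(item, qty)
```

Note that the posted data contains codes differing only in letter case (for example 1201-07-001-a and 1201-07-001-A); whether those fall into the same group depends on how the database compares strings.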
First, I have a shortest path matrix generated with igraph (shortest path).
When I want to retrieve the node names with get.shortest.paths, it just gives me the number of each column and not its name:
[,a] [,b] [,c] [,d] [,e] [,f] [,g] [,h] [,i] [,j]
[a,] 0 1 2 3 4 5 4 3 2 1
[b,] 1 0 1 2 3 4 5 4 3 2
[c,] 2 1 0 1 2 3 4 5 4 3
[d,] 3 2 1 0 1 2 3 4 5 4
[e,] 4 3 2 1 0 1 2 3 4 5
[f,] 5 4 3 2 1 0 1 2 3 4
[g,] 4 5 4 3 2 1 0 1 2 3
[h,] 3 4 5 4 3 2 1 0 1 2
[i,] 2 3 4 5 4 3 2 1 0 1
[j,] 1 2 3 4 5 4 3 2 1 0
then:
get.shortest.paths(g, 5, 1)
the answer is:
[[1]]
[1] 5 4 3 2
I want the node names not their numbers. Is there any solution? I checked vpath, too.
This does the trick for me:
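# vpath is a list with one vector of vertex indices per target
paths <- get.shortest.paths(g, 5, 1)$vpath
# Vertex names, in vertex-index order
vertex_names <- V(g)$name
# Translate each path from indices to names
lapply(paths, function(x) { vertex_names[x] })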
There is a slightly simpler solution that does not use lapply:
paths <- get.shortest.paths(g, 5, 1)
V(g)$name[paths$vpath[[1]]]
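Both answers rely on the same step: a path is returned as vertex indices, and the names are recovered by indexing the name vector with it. The sketch below shows that step in plain Python without igraph. It assumes a 10-vertex ring named a to j, which is consistent with the distance matrix in the question, and uses a small breadth-first search (`shortest_path` is a made-up helper) in place of get.shortest.paths. Python indices start at 0, so R's vertices 5 and 1 become 4 and 0.

```python
from collections import deque

names = list("abcdefghij")
n = len(names)

# Assumed graph: a ring in which vertex i touches its two neighbours.
neighbours = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def shortest_path(start, goal):
    """Breadth-first search returning one shortest path as a list of indices."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if v == goal:
            break
        for w in neighbours[v]:
            if w not in parent:
                parent[w] = v
                queue.append(w)
    if goal not in parent:
        return []
    # Walk back from the goal to the start, then reverse.
    path = []
    v = goal
    while v is not None:
        path.append(v)
        v = parent[v]
    return path[::-1]


index_path = shortest_path(4, 0)             # from "e" to "a"
name_path = [names[i] for i in index_path]   # index -> name lookup
print(index_path)  # [4, 3, 2, 1, 0]
print(name_path)   # ['e', 'd', 'c', 'b', 'a']
```

The last two lines correspond to V(g)$name[paths$vpath[[1]]] in the R answer.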