Let's say I've got a matrix with n columns, and I've got n different functions.
Is it possible to apply the i-th function to each element of the i-th column efficiently, that is, without using a loop?
For example, for the following variables:
funs = @(x) [x, cos(x), x.^2]
A = [1 0 1
2 0 2
3 0 3
4 0 4] ;
I would like to obtain the following result:
B = [1 1 1
2 1 4
3 1 9
4 1 16] ;
without looping through columns...
What is the best method of detecting and dropping duplicate rows from an array in Julia?
x = Integer.(round.(10 .* rand(1000,4)))
# In R I would apply the duplicated function.
x = x[duplicated(x),:]
unique is what you are looking for (though this does not answer the detection part of the question):
julia> x = Integer.(round.(10 .* rand(1000,4)))
1000×4 Array{Int64,2}:
7 3 10 1
7 4 8 9
7 7 3 0
3 4 8 2
⋮
julia> unique(x, 1)
973×4 Array{Int64,2}:
7 3 10 1
7 4 8 9
7 7 3 0
3 4 8 2
⋮
As for the detection part, a dirty fix would be editing this line:
@nref $N A d->d == dim ? sort!(uniquerows) : (indices(A, d))
to:
(@nref $N A d->d == dim ? sort!(uniquerows) : (indices(A, d))), uniquerows
Alternatively, you could define your own unique2 with the above-mentioned changes:
using Base.Cartesian
import Base.Prehashed
@generated function unique2(A::AbstractArray{T,N}, dim::Int) where {T,N}
......
end
julia> y, idx = unique2(x, 1)
julia> y
960×4 Array{Int64,2}:
8 3 1 5
8 3 1 6
1 1 0 1
8 10 1 10
9 1 8 7
⋮
julia> setdiff(1:1000, idx)
40-element Array{Int64,1}:
99
120
132
140
216
227
⋮
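The unique2 idea above (return the kept rows plus the indices of their first occurrences, then recover the duplicates with setdiff) can be sketched in plain Python. This is a dictionary-based stand-in, not a port of Base's generated hash-table code; unique_rows_with_indices is a hypothetical helper, and indices are 0-based rather than Julia's 1-based:

```python
def unique_rows_with_indices(rows):
    """Return (unique_rows, first_indices), keeping rows in first-occurrence order."""
    first_seen = {}
    for i, row in enumerate(rows):
        # tuple(row) makes the row hashable; setdefault keeps only the first index
        first_seen.setdefault(tuple(row), i)
    idx = sorted(first_seen.values())
    return [list(rows[i]) for i in idx], idx


a = [[2, 2], [2, 2], [1, 2], [3, 1]]
y, idx = unique_rows_with_indices(a)
# Analogue of setdiff(1:size(a, 1), idx): every row that was not kept is a duplicate
dups = sorted(set(range(len(a))) - set(idx))
print(y)     # [[2, 2], [1, 2], [3, 1]]
print(dups)  # [1]
```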
The benchmark on my machine is:
x = rand(1:10,1000,4) # 48 dups
@btime unique2($x, 1);
124.342 μs (31 allocations: 145.97 KiB)
@btime duplicated($x);
407.809 μs (9325 allocations: 394.78 KiB)
x = rand(1:4,1000,4) # 751 dups
@btime unique2($x, 1);
66.062 μs (25 allocations: 50.30 KiB)
@btime duplicated($x);
222.337 μs (4851 allocations: 237.88 KiB)
The results show that the convoluted metaprogramming/hash-table approach in Base benefits a lot from its much lower memory allocation (31 vs. 9325 allocations in the first case).
In Julia v1.4 and above, you need to type unique(a, dims=1), where a is your N-by-2 array:
julia> a=[2 2 ; 2 2; 1 2; 3 1]
4×2 Array{Int64,2}:
2 2
2 2
1 2
3 1
julia> unique(a,dims=1)
3×2 Array{Int64,2}:
2 2
1 2
3 1
You can also go with:
duplicated(x) = foldl(
(d,y)->(x[y,:] in d[1] ? (d[1],push!(d[2],y)) : (push!(d[1],x[y,:]),d[2])),
(Set(), Vector{Int}()),
1:size(x,1))[2]
This collects a set of the rows seen so far and returns the indices of rows that have already been seen. This is essentially the minimum work needed to get the result, so it should be fast.
julia> x = rand(1:2,5,2)
5×2 Array{Int64,2}:
2 1
1 2
1 2
1 1
1 1
julia> duplicated(x)
2-element Array{Int64,1}:
3
5
julia> x[duplicated(x),:]
2×2 Array{Int64,2}:
1 2
1 1
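The same seen-set idea, sketched in Python for comparison. duplicated_rows is a hypothetical name; each row is converted to a tuple so it can be stored in a set, and indices are 0-based, so 2 and 4 here correspond to 3 and 5 in the Julia output:

```python
def duplicated_rows(rows):
    """Return the 0-based indices of rows that repeat an earlier row."""
    seen = set()
    dups = []
    for i, row in enumerate(rows):
        key = tuple(row)  # lists are unhashable; tuples can go into a set
        if key in seen:
            dups.append(i)
        else:
            seen.add(key)
    return dups


x = [[2, 1], [1, 2], [1, 2], [1, 1], [1, 1]]
print(duplicated_rows(x))                      # [2, 4]
print([x[i] for i in duplicated_rows(x)])      # [[1, 2], [1, 1]]
```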
This is a follow-up to this question: How to select/add a column to a pandas DataFrame based on a function of other columns?
I have a data frame and I want to select the rows that match some criteria. The criteria are a function of the values of other columns and some additional values.
Here is a toy example:
>>> import pandas as pd
>>> from random import randint
>>> df = pd.DataFrame({'A': [1,2,3,4,5,6,7,8,9],
...                    'B': [randint(1,9) for x in range(9)],
...                    'C': [4,10,3,5,4,5,3,7,1]})
>>> df
A B C
0 1 6 4
1 2 8 10
2 3 8 3
3 4 4 5
4 5 2 4
5 6 1 5
6 7 1 3
7 8 2 7
8 9 8 1
I want to select all rows for which some non-trivial function returns True, e.g. f(a, c, L), where L is a list of lists and f returns True iff a and c are not in the same sublist.
That is, if L = [[1,2,3],[4,2,10],[8,7,5,6,9]] I want to get:
A B C
0 1 6 4
3 4 4 5
4 5 2 4
6 7 1 3
8 9 8 1
Thanks!
Here is a VERY VERY hacky and non-elegant solution. As another disclaimer: since your question doesn't state what you want to do if a number in the column is in none of the sublists, this code doesn't handle that case in any real way beyond the default behavior of isin().
import pandas as pd
df = pd.DataFrame({'A': [1,2,3,4,5,6,7,8,9],
'B': [6,8,8,4,2,1,1,2,8],
'C': [4,10,3,5,4,5,3,7,1]})
L = [[1,2,3],[4,2,10],[8,7,5,6,9]]
df['passed1'] = df['A'].isin(L[0])
df['passed2'] = df['C'].isin(L[0])
df['1&2'] = (df['passed1'] ^ df['passed2'])
df['passed4'] = df['A'].isin(L[1])
df['passed5'] = df['C'].isin(L[1])
df['4&5'] = (df['passed4'] ^ df['passed5'])
df['passed7'] = df['A'].isin(L[2])
df['passed8'] = df['C'].isin(L[2])
df['7&8'] = (df['passed7'] ^ df['passed8'])
df['PASSED'] = df['1&2'] & df['4&5'] ^ df['7&8']
del df['passed1'], df['passed2'], df['1&2'], df['passed4'], df['passed5'], df['4&5'], df['passed7'], df['passed8'], df['7&8']
df = df[df['PASSED'] == True]
del df['PASSED']
With an output that looks like:
A B C
0 1 6 4
3 4 4 5
4 5 2 4
6 7 1 3
8 9 8 1
I implemented this rather quickly, hence the utter and complete ugliness of this code, but I believe you can refactor it any way you like (e.g. iterate over the original lists with for sub_list in L, improve the variable names, come up with a better solution, etc.).
Hope this helps. Oh, and did I mention this was hacky and not very good code? Because it is.
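A more direct alternative is to write f(a, c, L) as an ordinary predicate and build one boolean per row. The sketch below is plain Python so it runs without pandas; not_in_same_sublist is a hypothetical name, and A and C are the columns from the toy example:

```python
def not_in_same_sublist(a, c, groups):
    """True iff no sublist of groups contains both a and c."""
    return not any(a in sub and c in sub for sub in groups)


L = [[1, 2, 3], [4, 2, 10], [8, 7, 5, 6, 9]]
A = [1, 2, 3, 4, 5, 6, 7, 8, 9]
C = [4, 10, 3, 5, 4, 5, 3, 7, 1]

keep = [not_in_same_sublist(a, c, L) for a, c in zip(A, C)]
print([i for i, k in enumerate(keep) if k])  # [0, 3, 4, 6, 8]
```

With a DataFrame, the same list of booleans can serve as a row mask, e.g. df.loc[[not_in_same_sublist(a, c, L) for a, c in zip(df['A'], df['C'])]]. Note the design choice: a value that appears in none of the sublists never shares a sublist with anything, so such rows are kept.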
I need to write a Lisp function that adds x to the nth item of a list. For example, (add 5 2 '(3 1 4 6 7)) returns (3 6 4 6 7).
Choosing the nth item is:
(defun nthitem (n list)
(cond ((equal n 1) (car list))
(t (nthitem (-n 1) (cdr list)))))
and adding x to every element of a list is:
(defun addto (x list)
(cond ((null list) nil)
(t (cons (+ x (car list))
(addto x (cdr list))))))
But I cannot combine these two.
We don't get to use nreconc enough. Here's a solution based on do and nreconc. The idea is to walk down the list, accumulating its elements in reverse order until you reach the position of the element that you need to replace. Then you stick the bits together: you reverse the list you've been accumulating and attach it to a list built from the new element and the tail after it.
(defun add (number index list)
  (do ((head '() (list* (first tail) head))
       (tail list (rest tail))
       (index index (1- index)))
      ((zerop index)
       (nreconc head (list* (+ number (first tail))
                            (rest tail))))))
CL-USER> (add 5 2 '(3 1 4 6 7))
(3 1 9 6 7)
It's worth seeing how these values change over time. Let's consider an example with more numbers, and look at the value of head, tail, and index in each iteration:
CL-USER> (add 90 5 '(0 1 2 3 4 5 6 7 8 9))
(0 1 2 3 4 95 6 7 8 9)
head: ()
tail: (0 1 2 3 4 5 6 7 8 9)
index: 5
head: (0)
tail: (1 2 3 4 5 6 7 8 9)
index: 4
head: (1 0)
tail: (2 3 4 5 6 7 8 9)
index: 3
head: (2 1 0)
tail: (3 4 5 6 7 8 9)
index: 2
head: (3 2 1 0)
tail: (4 5 6 7 8 9)
index: 1
head: (4 3 2 1 0)
tail: (5 6 7 8 9)
index: 0
Once we get to 0, we can get the rest of the final result by adding number to (car tail) and putting it together with (cdr tail), i.e.,
(list* (+ (car tail) number) (cdr tail))
which produces
(95 6 7 8 9)
and then use nreconc to take (4 3 2 1 0) and (95 6 7 8 9) and get (0 1 2 3 4 95 6 7 8 9), i.e.,
(nreconc (list 4 3 2 1 0) '(95 6 7 8 9))
;=> (0 1 2 3 4 95 6 7 8 9)
Now, if for some reason you can't use do, e.g., this is a homework assignment, that trace should still give you enough information to write a direct recursive version of this with an accumulator. No matter what, though, you'll still need to be able to reverse (or nreverse) a list, and to append (or nconc) some lists together (or, combined, revappend or nreconc).
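For comparison, here is a Python sketch of the direct recursive version with an accumulator that the trace suggests. add_at is a hypothetical name; Python tuples stand in for Lisp lists, (items[0],) + head plays the role of (list* (first tail) head), and reversing the accumulator at the end replaces nreconc. Like add, it uses a 0-based index:

```python
def add_at(number, index, items, head=()):
    """Add number to the element at 0-based index of items.

    head accumulates the elements already passed, most recent first,
    just like head in the trace above.
    """
    if index == 0:
        # Reverse the accumulator and attach it to (new element . rest of tail).
        return list(reversed(head)) + [number + items[0]] + list(items[1:])
    return add_at(number, index - 1, items[1:], (items[0],) + head)


print(add_at(5, 2, (3, 1, 4, 6, 7)))      # [3, 1, 9, 6, 7]
print(add_at(90, 5, tuple(range(10))))    # [0, 1, 2, 3, 4, 95, 6, 7, 8, 9]
```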
You have major formatting issues, and in some places you are missing the space between an operator and its argument (e.g. (-n 1) should be (- n 1)). Be sure to use an editor that does parenthesis matching, like Emacs or Kate.
Just to show you how to combine those two without changing the addto functionality:
(defun addto (x n list)
(cond ((null list) nil)
(t (cons (+ x (car list))
(addto x (- n 1) (cdr list))))))
So your code should have two cases in addition to the base case: one where (= n 1) (since you start counting at 1 instead of 0), which is your default case today, and one that doesn't add to the car but just copies it while doing the same recursion on the cdr. Good luck!
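A Python sketch of the three cases described, using a 1-based n as in the question. It assumes that once the nth element has been updated, the rest of the list is copied unchanged; add is a hypothetical stand-in for the Lisp function, and Python lists stand in for Lisp lists:

```python
def add(x, n, items):
    if not items:
        # base case: empty list -> empty list
        return []
    if n == 1:
        # the nth element: add x to the car, keep the rest as-is
        return [items[0] + x] + items[1:]
    # any other position: copy the car unchanged and recurse on the cdr
    return [items[0]] + add(x, n - 1, items[1:])


print(add(5, 2, [3, 1, 4, 6, 7]))  # [3, 6, 4, 6, 7]
```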
Use setnth
(setq a (list 3 1 4 6 7))
(defun add-number-to-nth-element (arg liste element)
"Add ARG, a number, to nth ELEMENT of LISTE. "
(setnth element liste (+ arg (nth element liste)))
liste)
(add-number-to-nth-element 5 a 1)
;; ==> (3 6 4 6 7)
;; ==> (3 11 4 6 7)
;; ==> (3 16 4 6 7)
;; ==> (3 21 4 6 7)
;; ==> (3 26 4 6 7)
;; counting elements starts with 0
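The successive results in the comments above come from the fact that this approach modifies a in place, so each evaluation adds 5 again to the same list. A Python sketch of the same destructive update (add_number_to_nth_element only mirrors the Lisp name above; it is not a library function, and a Python list stands in for the Lisp list):

```python
def add_number_to_nth_element(arg, liste, element):
    """Destructively add arg to liste[element] (0-based) and return the same list."""
    liste[element] += arg
    return liste


a = [3, 1, 4, 6, 7]
for _ in range(3):
    print(add_number_to_nth_element(5, a, 1))
# [3, 6, 4, 6, 7]
# [3, 11, 4, 6, 7]
# [3, 16, 4, 6, 7]
```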
All you need is nth and setf:
Start emacs -Q, then evaluate the following:
(defun add-to-nth (x n ys)
(when ys (setf (nth n ys) (+ x (nth n ys)))))
(setq foobar (list 1 2 3 4 5)) ; a fresh list, since modifying a quoted constant is not safe
(add-to-nth 42 1 foobar)
C-h v foobar ; => (1 44 3 4 5)
Is there a way to pass a variable value in ddply/sapply directly to a function without the function (x) notation?
E.g., instead of:
ddply(bu,.(trial), function (x) print(x$tangle) )
Is there a way to do:
ddply(bu,.(trial), print(tangle) )
I am asking because with many variables this notation becomes very cumbersome.
Thanks!
You can use fn$ in the gsubfn package. Just preface the function in question with fn$, and then you can use formula notation, as shown here:
> library(gsubfn)
>
> # instead of specifying function(x) mean(x) / sd(x)
>
> fn$sapply(iris[-5], ~ mean(x) / sd(x))
Sepal.Length Sepal.Width Petal.Length Petal.Width
7.056602 7.014384 2.128819 1.573438
> library(plyr)
> # instead of specifying function(x) colMeans(x[-5]) / sd(x[-5])
>
> fn$ddply(iris, .(Species), ~ colMeans(x[-5]) / sd(x[-5]))
Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1 setosa 14.20183 9.043319 8.418556 2.334285
2 versicolor 11.50006 8.827326 9.065547 6.705345
3 virginica 10.36045 9.221802 10.059890 7.376660
Just add your function's extra parameters to the **ply call. For example:
ddply(my_data, c("var1","var2"), my_function, param1=something, param2=something)
where my_function usually looks like
my_function(x, param1, param2)
Here's a working example of this:
require(plyr)
n=1000
my_data = data.frame(
subject=1:n,
city=sample(1:4, n, T),
gender=sample(1:2, n, T),
income=sample(50:200, n, T)
)
my_function = function(data_in, dv, extra=F){
dv = data_in[,dv]
output = data.frame(mean=mean(dv), sd=sd(dv))
if(extra) output = cbind(output, data.frame(n=length(dv), se=sd(dv)/sqrt(length(dv)) ) )
return(output)
}
# with extra=T (adds the n and se columns)
ddply(my_data, c("city", "gender"), my_function, dv="income", extra=T)
city gender mean sd n se
1 1 1 127.1158 44.64347 95 4.580324
2 1 2 125.0154 44.83492 130 3.932283
3 2 1 130.3178 41.00359 107 3.963967
4 2 2 128.1608 43.33454 143 3.623816
5 3 1 121.1419 45.02290 148 3.700859
6 3 2 120.1220 45.01031 123 4.058443
7 4 1 126.6769 38.33233 130 3.361968
8 4 2 125.6129 44.46168 124 3.992777
# with extra=F (mean and sd only)
ddply(my_data, c("city", "gender"), my_function, dv="income", extra=F)
city gender mean sd
1 1 1 127.1158 44.64347
2 1 2 125.0154 44.83492
3 2 1 130.3178 41.00359
4 2 2 128.1608 43.33454
5 3 1 121.1419 45.02290
6 3 2 120.1220 45.01031
7 4 1 126.6769 38.33233
8 4 2 125.6129 44.46168
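The mechanism this answer relies on is that ddply forwards any extra named arguments to the function it calls for each group. A rough Python analogue of that forwarding uses **params; group_apply and the tiny my_data list below are hypothetical stand-ins (not part of plyr), and the numbers are toy values for illustration only:

```python
from collections import defaultdict
from statistics import mean, stdev


def group_apply(rows, keys, fn, **params):
    """Split dict rows by keys and call fn(group, **params) on each group.

    Extra keyword arguments are passed through unchanged, like ddply's ... argument.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return {key: fn(groups[key], **params) for key in sorted(groups)}


def my_function(data_in, dv, extra=False):
    values = [row[dv] for row in data_in]
    out = {"mean": mean(values), "sd": stdev(values)}
    if extra:
        out["n"] = len(values)
    return out


my_data = [
    {"city": 1, "gender": 1, "income": 100},
    {"city": 1, "gender": 1, "income": 120},
    {"city": 2, "gender": 1, "income": 90},
    {"city": 2, "gender": 1, "income": 110},
]
print(group_apply(my_data, ["city", "gender"], my_function, dv="income", extra=True))
```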