Apply function to entire dataframe - function

I am trying to recode all values in a dataframe using an if statement.
I have:
a b c
0 .05 0
-.02 0 -.06
-.01 0 -.08
0 0 .09
I want:
a b c
0 1 0
0 0 -1
0 0 -1
0 0 1
I have tried several things like:
def unit_weighted(x):
    if x >= 0.05:
        return 1
    elif x <= -0.05:
        return -1
    else:
        return 0

new = df.apply(unit_weighted, axis=0)
I get this error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I don't want to have to list all the columns like this:
new = df['a'].apply(unit_weighted, axis=0)
new = df['b'].apply(unit_weighted, axis=0)
Any help please?

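Removing axis=0 is not enough: df.apply passes a whole column (a Series) to the function whichever axis you choose, and that is what triggers the ValueError. To apply the function element-wise, use df.applymap(unit_weighted) (named DataFrame.map in pandas 2.1 and later).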
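A minimal sketch of element-wise recoding, assuming the sample values from the question (the names `recoded` and `vectorized` are only illustrative). Mapping the function over each column calls it once per cell, and `numpy.select` is one possible vectorized alternative that applies the same thresholds without a Python-level loop.

```python
import numpy as np
import pandas as pd

# Sample data typed in from the question.
df = pd.DataFrame({
    "a": [0, -0.02, -0.01, 0],
    "b": [0.05, 0, 0, 0],
    "c": [0, -0.06, -0.08, 0.09],
})

def unit_weighted(x):
    if x >= 0.05:
        return 1
    elif x <= -0.05:
        return -1
    return 0

# Element-wise: Series.map calls unit_weighted once per scalar value.
recoded = df.apply(lambda col: col.map(unit_weighted))

# Vectorized alternative: evaluate both conditions on the whole array at once.
values = df.to_numpy()
vectorized = pd.DataFrame(
    np.select([values >= 0.05, values <= -0.05], [1, -1], default=0),
    index=df.index,
    columns=df.columns,
)

print(recoded)
```

Both results contain the same values; the vectorized form tends to matter only for large frames.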

How to find the nodes of a triangle from an adjacency matrix in Octave

I know how to find the number of triangles in an adjacency matrix.
tri = trace(A^3) / 6
But I need to find the nodes so that I can then look up the edge values in the adjacency matrix, since it's a signed graph. Is there an existing function that does that?
Taking the power of the adjacency matrix loses information about the intermediate nodes. Instead of a 2-dimensional matrix, we need 3 dimensions.
Given a graph with the adjacency matrix:
A =
0 0 0 0 1 1 0 1 0 0
0 0 0 1 0 1 0 0 0 0
0 0 0 1 0 0 0 1 0 1
0 1 1 0 1 0 1 0 0 0
1 0 0 1 0 0 1 0 0 0
1 1 0 0 0 0 0 1 1 0
0 0 0 1 1 0 0 0 1 0
1 0 1 0 0 1 0 0 0 0
0 0 0 0 0 1 1 0 0 0
0 0 1 0 0 0 0 0 0 0
The goal is a 3-D matrix T such that T(i,j,k) == 1 iff there is a path in the graph i=>j=>k=>i. Start with the paths of length 2, i=>j=>k:
T = and(A, permute(A, [3 1 2]))
This is the equivalent of squaring the adjacency matrix, but keeping the path information. and is used here instead of multiplication in case A is a weighted adjacency matrix. If you sum along the 2nd dimension, you'll get A^2:
>> isequal(squeeze(sum(T,2)), A^2)
ans = 1
Now that we've got the paths of length 2, we just need to filter so we keep only the paths that return to their starting points.
T = and(T, permute(A.', [1 3 2])); % Transpose A in case graph is directed
Now, if T(i,j,k) == 1, then there is a triangle starting at node i, through nodes j and k and returning to node i. If you want to find all such paths:
[M,N,P] = ind2sub(size(T), find(T));
P = [M,N,P];
P will be a list of all triangular paths:
P =
8 6 1
6 8 1
7 5 4
5 7 4
7 4 5
4 7 5
8 1 6
1 8 6
5 4 7
4 5 7
6 1 8
1 6 8
In this case we get 12 paths. Each triangle in an undirected graph appears 6 times: once for each of its 3 starting nodes, times 2 directions. This matches the result of trace:
>> trace(A^3)
ans = 12
If you want to remove the duplicates, the simplest way for triangles is to sort the vertices within each row and then take the unique rows of the list. This works for triangles only because all permutations of the nodes in the cycle are present. For longer cycles, this will not work.
P = unique(sort(P, 2), 'rows');
P =
1 6 8
4 5 7
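For comparison, here is a rough NumPy sketch of the same 3-D construction, using the adjacency matrix above. It is a translation for illustration, not part of the Octave answer; broadcasting stands in for `permute`, and the node numbers are shifted by 1 to match Octave's 1-based indexing.

```python
import numpy as np

A = np.array([
    [0, 0, 0, 0, 1, 1, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 1, 0, 1],
    [0, 1, 1, 0, 1, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 0, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
], dtype=bool)

# T[i, j, k] is True iff edges i->j, j->k and k->i all exist.
T = A[:, :, None] & A[None, :, :] & A.T[:, None, :]

# Every closed path of length 3, as 1-based node triples.
paths = np.argwhere(T) + 1

# Sort the nodes within each row, then drop duplicate rows.
triangles = np.unique(np.sort(paths, axis=1), axis=0)
print(triangles)
```

The 3-D array needs n^3 entries, so this is only practical for small graphs.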
Here is a solution using matrix multiplication:
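C = (A * A.') & A;            % edges whose endpoints share a neighbor
[x, y] = find(tril(C));       % list each such edge once
n = numel(x);
D = sparse([x; y], [1:n 1:n].', 1, size(A,1), n);
[X, E, V] = find(C * D);      % V == 2: node X is adjacent to both ends of edge E
keep = V == 2;
tri = [x(E(keep)) y(E(keep)) X(keep)];  % also works when an edge is in several triangles
tri = unique(sort(tri, 2), 'rows');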
First we need to define triangle nodes. Two nodes belong to a triangle if they are neighbors of each other and also have a common neighbor.
Using that definition, we compute an adjacency matrix C that keeps only the edges that are part of a triangle; everything else is removed.
The expression A * A.' selects node pairs that have common neighbors, and & A requires that those pairs are also neighbors of each other.
Now we can use [x, y] = find(tril(C)); to extract the first and second nodes of each triangle as x and y respectively.
For the third node we need a node that has both x and y as neighbors. As before, we can use the boolean matrix multiplication trick to speed up the computation.
Finally, the result tri has duplicates, which are removed using sort and unique.
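The common-neighbor idea can also be sketched in Python with NumPy. This is an illustration under assumptions, not a port of the Octave code: `find_triangles` and the small 5-node example graph are hypothetical, the graph is assumed to be undirected, and a plain loop replaces the sparse-matrix step.

```python
import numpy as np

def find_triangles(A):
    """Return sorted node triples that form triangles in an undirected graph."""
    A = np.asarray(A, dtype=bool)
    # counts[i, j] = number of neighbors shared by nodes i and j.
    counts = A.astype(int) @ A.astype(int).T
    # Keep only edges whose two endpoints share at least one neighbor.
    C = (counts > 0) & A
    triangles = set()
    # tril lists each undirected edge once.
    for x, y in np.argwhere(np.tril(C)):
        # Any node adjacent to both endpoints closes a triangle.
        for z in np.flatnonzero(A[x] & A[y]):
            triangles.add(tuple(sorted((int(x), int(y), int(z)))))
    return sorted(triangles)

# Hypothetical example: two triangles sharing node 2 (0-based labels).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
A = np.zeros((5, 5), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1

found = find_triangles(A)
print(found)
```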

Octave element wise comparisons [duplicate]

Let us consider the following code for an impulse function:
function y = impulse_function(n)
  y = 0;
  if n == 0
    y = 1;
  end
end
This code
>> n=-2:2;
>> i=1:length(n);
>> f(i)=impulse_function(n(i));
>>
returns result
f
f =
0 0 0 0 0
while this code
>> n=-2:2;
>> for i=1:length(n)
     f(i)=impulse_function(n(i));
   end
>> f
f =
0 0 1 0 0
In both cases i is 1 2 3 4 5. What is different?
Your function is not defined to handle vector input.
Modify your impulse function as follows:
function y = impulse_function(n)
  [a, b] = size(n);
  y = zeros(a, b);
  y(n == 0) = 1;
end
In your definition of impulse_function, the whole array is compared to zero and the return value is a single number instead of a vector.
In the first case you are comparing an array to the value 0. This gives the result [0 0 1 0 0], which is not a simple true or false; an if condition on an array is true only when every element is nonzero. So the statement y = 1; is never executed, the function returns the scalar 0, and f becomes [0 0 0 0 0] as shown.
In the second case you are iterating through the array value by value and passing each value to the function. Since the array contains the value 0, you get 1 back from the function for that element, so f is [0 0 1 0 0], which is an impulse.
You'll need to modify your function to take array inputs.
Perhaps this example will clarify the issue further:
cond = 0;
if cond == 0
  disp(cond) % This will print 0 since 0 == 0
end
cond = 1;
if cond == 0
  disp(cond) % This won't print since 1 ~= 0 (not equal)
end
cond = [-2 -1 0 1 2];
if cond == 0
  disp(cond) % This won't print since not every element of [-2 -1 0 1 2] equals 0
end
You could define your impulse function simply as this anonymous function -
impulse_function = @(n) (1:numel(n)).*n==0
Sample run -
>> n = -6:4
n =
-6 -5 -4 -3 -2 -1 0 1 2 3 4
>> out = impulse_function(n)
out =
0 0 0 0 0 0 1 0 0 0 0
Plot code -
plot(n,out,'o') %// data points
hold on
line([0 0],[1 0]) %// impulse point
You can write an even simpler function:
function y = impulse_function(n)
  y = n == 0;
end
Note that this will return y as a type logical array but that should not affect later numerical computations.
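The same scalar-versus-array distinction shows up in other array languages. As a rough NumPy sketch (a translation for comparison, not Octave code; the name `impulse` is illustrative), the comparison is done element-wise and the boolean mask is converted to integers:

```python
import numpy as np

def impulse(n):
    """Return 1 where n equals 0 and 0 elsewhere, for scalars or arrays."""
    n = np.asarray(n)
    # n == 0 yields a boolean mask with the same shape as n.
    return (n == 0).astype(int)

n = np.arange(-2, 3)   # -2, -1, 0, 1, 2
f = impulse(n)
print(f)               # [0 0 1 0 0]
```

Because the function never branches on the array with `if`, it behaves the same whether it receives one value or a whole vector.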

How to apply countvectorizer to bigrams in a pandas dataframe

I'm trying to apply the countvectorizer to a dataframe containing bigrams to convert it into a frequency matrix showing the number of times each bigram appears in each row but I keep getting error messages.
This is what I tried using
cereal['bigrams'].head()
0 [(best, thing), (thing, I), (I, have),....
1 [(eat, it), (it, every), (every, morning),...
2 [(every, morning), (morning, my), (my, brother),...
3 [(I, have), (five, cartons), (cartons, lying),...
.........
bow = CountVectorizer(max_features=5000, ngram_range=(2,2))
train_bow = bow.fit_transform(cereal['bigrams'])
train_bow
Expected results
(best,thing) (thing, I) (I, have) (eat,it) (every,morning)....
0 1 1 1 0 0
1 0 0 0 1 1
2 0 0 0 0 1
3 0 0 1 0 0
....
I see you are trying to convert a pd.Series into a count representation of each term.
That's a bit different from what CountVectorizer does.
From the function description:
Convert a collection of text documents to a matrix of token counts
The official usage example is:
>>> from sklearn.feature_extraction.text import CountVectorizer
>>> corpus = [
... 'This is the first document.',
... 'This document is the second document.',
... 'And this is the third one.',
... 'Is this the first document?',
... ]
>>> vectorizer = CountVectorizer()
>>> X = vectorizer.fit_transform(corpus)
>>> print(vectorizer.get_feature_names())
['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
>>> print(X.toarray())
[[0 1 1 1 0 0 1 0 1]
 [0 2 0 1 0 1 1 0 1]
 [1 0 0 1 1 0 1 1 1]
 [0 1 1 1 0 0 1 0 1]]
So, as one can see, it takes as input a list where each item is a "document" (a string).
That's probably the cause of the errors you are getting: you are passing a pd.Series where each item is a list of tuples.
To use CountVectorizer you would have to transform your input into the proper format.
If you have the original corpus/text, you can easily run CountVectorizer on top of it (with the ngram_range parameter) to get the desired result.
Otherwise, the best solution would be to treat it as what it is: a series of lists of items, which must be counted and pivoted.
Sample workaround:
(It would be a lot easier if you just used the text corpus instead.)
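One way to do that counting, as a sketch rather than the answer's own code: count the bigrams in each row with `collections.Counter` and let pandas pivot the per-row counts into columns. The sample frame below contains only the bigrams visible in the question (the real rows are longer), and joining each tuple into a string such as "best thing" is an assumption made to get simple column labels.

```python
from collections import Counter

import pandas as pd

# Only the bigrams visible in the question; the real rows are truncated there.
cereal = pd.DataFrame({"bigrams": [
    [("best", "thing"), ("thing", "I"), ("I", "have")],
    [("eat", "it"), ("it", "every"), ("every", "morning")],
    [("every", "morning"), ("morning", "my"), ("my", "brother")],
    [("I", "have"), ("five", "cartons"), ("cartons", "lying")],
]})

# One Counter per row, keyed by the bigram joined into a single string.
row_counts = [
    Counter(" ".join(pair) for pair in row) for row in cereal["bigrams"]
]

# A list of dict-like objects becomes one column per distinct key;
# bigrams missing from a row come out as NaN, so fill those with 0.
counts = pd.DataFrame(row_counts, index=cereal.index).fillna(0).astype(int)
print(counts)
```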
Hope it helps!

Dynamic JSON editing in R

I am using the jsonlite package to work with a complex JSON file in R. To give a sense of its complexity, here is a look at two of the data frames it contains:
> output$dataobjects[1]$dataobject$scoredresults$x
xid rawscore tscore percentile standarderror omit
1 LAC.1 0 0 0 0 0
2 LAC.5 0 0 0 0 0
3 LAC.9 0 0 0 0 0
4 LAC.14 0 0 0 0 0
> output$dataobjects[1]$dataobject$scoredresults$y
yid rawscore tscore percentile standarderror omit
1 c1.1.1 0 0 0 0 0
2 c1.1.3 0 0 0 0 0
3 c1.1.5 0 0 0 0 0
4 c1.1.6 0 0 0 0 0
I would like to dynamically edit this json based on an external dataframe. Say the dataframe looks like this
id <- c("LAC.1", "LAC.5", "c1.1.1", "c1.1.5", "LAC.14")
rawscore <- c(15, 10, 12, 14, 15)
type <- c("x", "x", "y", "y", "x")
df <- data.frame(id, rawscore, type)
I want to use a for-loop to iterate over the rows in this dataframe, updating the rawscore column in the JSON as I go. I can't figure out how to move from x to y within the JSON while staying within the for-loop. Please help!

How to convert a 3 input AND gate into a NOR gate?

I know that I can say convert a 2-input AND gate into a NOR gate by simply inverting the two inputs because of DeMorgan's Theorem.
But how would you do the equivalent on a 3-input AND gate?
Say...
     ____
A___|    \
B___|     )___
C___|____/
I'm trying to understand this because my homework asks me to take a circuit and convert it using NOR synthesis to only use nor gates, and I know how to do it with 2 input gates, but the gate with 3 inputs is throwing me for a spin.
DeMorgan's theorem for 2-input AND would produce:
AB
(AB)''
(A' + B')'
So, yes, the inputs are inverted and fed into a NOR gate.
DeMorgan's theorem for 3-input AND would similarly produce:
ABC
(ABC)''
(A' + B' + C')'
Which is, again, inputs inverted and fed into a (3-input) NOR gate:
     ___
A--O\   \
B--O )   )O---
C--O/___/
@SailorChibi has truth tables that show equivalence.
If I haven't made any mistakes, it is pretty much the same: invert all 3 of the inputs and you get a NOR.
Table:
AND with inverted inputs is exactly the same as NOR with the original inputs.
AND with inverted inputs (columns are A' B' C')
1 1 1 = 1
1 1 0 = 0
1 0 1 = 0
1 0 0 = 0
0 1 1 = 0
0 1 0 = 0
0 0 1 = 0
0 0 0 = 0
NOR with original inputs (columns are A B C)
0 0 0 = 1
0 0 1 = 0
0 1 0 = 0
0 1 1 = 0
1 0 0 = 0
1 0 1 = 0
1 1 0 = 0
1 1 1 = 0
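The equivalence can also be checked exhaustively. The sketch below models gates as small Python functions (the names `nor`, `inv` and `and3_from_nor` are illustrative, and building the inverter by tying NOR inputs together is one common NOR-only choice, not something stated above) and compares a NOR-only 3-input AND against the ordinary AND for all 8 input combinations.

```python
from itertools import product

def nor(*inputs):
    """NOR gate: output is 1 only when every input is 0."""
    return int(not any(inputs))

def inv(a):
    """Inverter built from a NOR gate with both inputs tied together."""
    return nor(a, a)

def and3_from_nor(a, b, c):
    """3-input AND using only NOR gates: ABC = (A' + B' + C')'."""
    return nor(inv(a), inv(b), inv(c))

# Compare against the ordinary AND for every input combination.
for a, b, c in product((0, 1), repeat=3):
    assert and3_from_nor(a, b, c) == (a & b & c)
    print(a, b, c, "=", and3_from_nor(a, b, c))
```

This uses four NOR gates in total: three acting as inverters and one 3-input NOR.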