Generate a stream of all possible numbers `x^k*y^l` for given `k` and `l` in Haskell - function

Generate the function generateExponents k l, which for given k and l generates a stream of all unique possible numbers x^k*y^l in increasing order. For example generateExponents 2 3 = [1,4,8,9,16,25,27...]
For obvious reasons this doesn't work:
generateExponents k l = sort [x^k*y^l | x <- [1..], y <- [1..]]
Then I tried this, which doesn't work either:
generateExponents k l = [n | n <- [1 ..], n `elem` products n]
where
xs n = takeWhile (\x -> x ^ k <= n) [1 ..]
ys n = takeWhile (\y -> y ^ l <= n) [1 ..]
products n = liftA2 (*) (xs n) (ys n)
What am I doing wrong?

Your algorithm is pretty slow -- it checks every number, and for every number it searches for an appropriate factorization! You can do better by producing an infinite table of answers, and then collapsing the table appropriately. For example, for x^2*y^3, the table looks like:
x 1 2 3 4 5
y
1 1 4 9 16 25
2 8 32 72 128 200
3 27 108 243 432 675
4 64 256 576 1024 1600
5 125 500 1125 2000 3125
Note two nice features of this table: each row is sorted, and the rows themselves are sorted. This means we can merge them efficiently by simply taking the top-left value, then re-inserting the tail of the first row in its new sorted position. For example, the table above, after emitting 1, would look like:
4 9 16 25 36
8 32 72 128 200
27 108 243 432 675
64 256 576 1024 1600
125 500 1125 2000 3125
Then, after emitting the top-left value 4:
8 32 72 128 200
9 16 25 36 49
27 108 243 432 675
64 256 576 1024 1600
125 500 1125 2000 3125
Note how the top row has now become the second row to keep the doubly-sorted property.
This is an efficient way to construct all the right numbers in the right order. Then, the only remaining trick needed is to deduplicate, and for that you can deploy the standard trick map head . group, since duplicates are guaranteed to be next to each other. Here's the full code:
import Data.List
generateExponents' k l = map head . group . go $ [[x^k*y^l | x <- [1..]] | y <- [1..]] where
go ((x:xs):xss) = x:go (insert xs xss)
It's much, much faster. Compare:
> sum . take 400 $ generateExponents 2 3
5994260
(8.26 secs, 23,596,249,112 bytes)
> sum . take 400 $ generateExponents' 2 3
5994260
(0.01 secs, 1,172,864 bytes)
> sum . take 1000000 {- a million -} $ generateExponents' 2 3
72001360441854395
(6.99 secs, 13,460,753,616 bytes)

I think you just forgot to map the actual function over the xs and ys:
generateExponents k l = [n | n <- [1 ..], n `elem` products n]
where
xs n = takeWhile (<= n) $ map (^ k) [1 ..]
ys n = takeWhile (<= n) $ map (^ l) [1 ..]
products n = liftA2 (*) (xs n) (ys n)

Related

How to deduce left-hand side matrix from vector?

Suppose I have the following script, which constructs a symbolic array, A_known, and a symbolic vector x, and performs a matrix multiplication.
clc; clearvars
try
pkg load symbolic
catch
error('Symbolic package not available!');
end
syms V_l k s0 s_mean
N = 3;
% Generate left-hand-side square matrix
A_known = sym(zeros(N));
for hI = 1:N
A_known(hI, 1:hI) = exp(-(hI:-1:1)*k);
end
A_known = A_known./V_l;
% Generate x vector
x = sym('x', [N 1]);
x(1) = x(1) + s0*V_l;
% Matrix multiplication to give b vector
b = A_known*x
Suppose A_known was actually unknown. Is there a way to deduce it from b and x? If so, how?
Til now, I only had the case where x was unknown, which normally can be solved via x = b \ A.
Mathematically, it is possible to get a solution, but it actually has infinite solutions.
Example
A = magic(5);
x = (1:5)';
b = A*x;
A_sol = b*pinv(x);
which has
>> A
A =
17 24 1 8 15
23 5 7 14 16
4 6 13 20 22
10 12 19 21 3
11 18 25 2 9
but solves A as A_sol like
>> A_sol
A_sol =
3.1818 6.3636 9.5455 12.7273 15.9091
3.4545 6.9091 10.3636 13.8182 17.2727
4.4545 8.9091 13.3636 17.8182 22.2727
3.4545 6.9091 10.3636 13.8182 17.2727
3.1818 6.3636 9.5455 12.7273 15.9091

Haskell function to return all powers of a number until a given maximum

I'm making a function that returns all the powers of n that are less than or equal to max. For example: powers 2 5 --> [1,2,4].
myPowers n = n : map (* n) (myPowers n)
powers :: Int -> Int -> [Int]
powers n max = takeWhile (< max) (myPowers n)
At the moment powers is returning too few numbers. For example: powers 2 6 should return 3 numbers, but is returning 2 numbers.
The problem is that your myPowers function starts with n, and not with 1. For example:
Prelude> take 10 $ myPowers 2
[2,4,8,16,32,64,128,256,512,1024]
We can fix this with:
myPowers n = 1 : map (* n) (myPowers n)
Note that you can make this computationally more efficient with iterate :: (a -> a) -> a -> [a], so you can define it as:
myPowers n = iterate (n*) 1
You should check the bound with <= if you want allow powers that are equal to the bounds:
powers n max = takeWhile (<= max) (myPowers n)
We then obtain sample output like:
Prelude> powers 2 5
[1,2,4]
Prelude> powers 2 6
[1,2,4]
Prelude> powers 2 10
[1,2,4,8]
Prelude> powers 3 10
[1,3,9]
Prelude> powers 3 30
[1,3,9,27]
Prelude> powers 5 30
[1,5,25]

Vectorization of feature scaling

I want to feature scale a matrix (X) with 2 columns. I am using mean normalization, and I wrote the following lines in Octave:
X_norm = X
mu = mean(X);
sigma = std(X);
X_norm(:,1) = (X_norm(:,1) .- mu(:,1)) ./ sigma(:,1);
X_norm(:,2) = (X_norm(:,2) .- mu(:,2)) ./ sigma(:,2);
Can you please let me know a cleaner way to vectorize these calculation?
I checked my code by comparing with the result from zscore(X) and they matched - i.e. a sum(X_norm - zscore(X)) returned me 0 0.
I am constrained to not use zscore(), and hence the question.
Sample data as follows:
2104 3
1600 3
2400 3
1416 2
3000 4
1985 4
1534 3
1427 3
1380 3
1494 3
1940 4
2000 3
1890 3
4478 5
1268 3
2300 4
1320 2
1236 3
2609 4
3031 4
1767 3
1888 2
1604 3
1962 4
3890 3
1100 3
1458 3
2526 3
2200 3
2637 3
You could simply do:
X_norm = (X .- mean(X,1)) ./ std(X,0,1);
During cross validation faced zero division issue.
This worked for me.
mu = mean(X);
X_norm = X - mu;
sigma = std(X);
% Skip zero div
sigmaZeroIdx = sigma == 0;
sigma(1,sigmaZeroIdx) = 1;
X_norm = X_norm ./ sigma;
I think you could apply a for loop for N size of features.
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));
for iter = 1:num_iters;
mu(1,iter) = mean(X_norm(:,iter));
X_norm(:,iter) = X_norm(:,iter) .- mu(1,iter);
sigma(1,iter) = std(X_norm(:,iter));
X_norm(:,iter) = X_norm(:,iter) ./ mu(1,iter);
end

How to calculate the Hamming weight for a vector?

I am trying to calculate the Hamming weight of a vector in Matlab.
function Hamming_weight (vet_dec)
Ham_Weight = sum(dec2bin(vet_dec) == '1')
endfunction
The vector is:
Hamming_weight ([208 15 217 252 128 35 50 252 209 120 97 140 235 220 32 251])
However, this gives the following result, which is not what I want:
Ham_Weight =
10 10 9 9 9 5 5 7
I would be very grateful if you could help me please.
You are summing over the wrong dimension!
sum(dec2bin(vet_dec) == '1',2).'
ans =
3 4 5 6 1 3 3 6 4 4 3 3 6 5 1 7
dec2bin(vet_dec) creates a matrix like this:
11010000
00001111
11011001
11111100
10000000
00100011
00110010
11111100
11010001
01111000
01100001
10001100
11101011
11011100
00100000
11111011
As you can see, you're interested in the sum of each row, not each column. Use the second input argument to sum(x, 2), which specifies the dimension you want to sum along.
Note that this approach is horribly slow, as you can see from this question.
EDIT
For this to be a valid, and meaningful MATLAB function, you must change your function definition a bit.
function ham_weight = hamming_weight(vector) % Return the variable ham_weight
ham_weight = sum(dec2bin(vector) == '1', 2).'; % Don't transpose if
% you want a column vector
end % endfunction is not a MATLAB command.

Subsetting in a function to calculate a row total

I have a data frame with results for certain instruments, and I want to create a new column which contains the totals of each row. Because I have different numbers of instruments each time I run an analysis on new data, I need a function to dynamically calculate the new column with the Row Total.
To simply my problem, here’s what my data frame looks like:
Type Value
1 A 10
2 A 15
3 A 20
4 A 25
5 B 30
6 B 40
7 B 50
8 B 60
9 B 70
10 B 80
11 B 90
My goal is to achieve the following:
A B Total
1 10 30 40
2 15 40 55
3 20 50 70
4 25 60 85
5 70 70
6 80 80
7 90 90
I’ve tried various method, but this way holds the most promise:
myList <- list(a = c(10, 15, 20, 25), b = c(30, 40, 50, 60, 70, 80, 90))
tmpDF <- data.frame(sapply(myList, '[', 1:max(sapply(myList, length))))
> tmpDF
a b
1 10 30
2 15 40
3 20 50
4 25 60
5 NA 70
6 NA 80
7 NA 90
totalSum <- rowSums(tmpDF)
totalSum <- data.frame(totalSum)
tmpDF <- cbind(tmpDF, totalSum)
> tmpDF
a b totalSum
1 10 30 40
2 15 40 55
3 20 50 70
4 25 60 85
5 NA 70 NA
6 NA 80 NA
7 NA 90 NA
Even though this way did succeeded in combining two data frames of different lengths, the ‘rowSums’ function gives the wrong values in this example. Besides that, my original data isn't in a list format, so I can't apply such a 'solution'.
I think I’m overcomplicating this problem, so I was wondering how can I …
Subset data from a data frame on the basis of ‘Type’,
Insert these individual subsets of different lengths into a new data frame,
Add an ‘Total’ column to this data frame which is the correct sum of the
individual subsets.
An added complication to this problem is that this needs to be done in an function or in an otherwise dynamic way, so that I don’t need to manually subset the dozens of ‘Types’ (A, B, C, and so on) in my data frame.
Here’s what I have so far, which doesn’t work, but illustrates the lines I’m thinking along:
TotalDf <- function(x){
tmpNumberOfTypes <- c(levels(x$Type))
for( i in tmpNumberOfTypes){
subSetofData <- subset(x, Type = i, select = Value)
if( i == 1) {
totalDf <- subSetOfData }
else{
totalDf <- cbind(totalDf, subSetofData)}
}
return(totalDf)
}
Thanks in advance for any thoughts or ideas on this,
Regards,
EDIT:
Thanks to the comment of Joris (see below) I got an end in the right direction, however, when trying to translate his solution to my data frame, I run into additional problems. His proposed answer works, and gives me the following (correct) sum of the values of A and B:
> tmp78 <- tapply(DF$value,DF$id,sum)
> tmp78
1 2 3 4 5 6
6 8 10 12 9 10
> data.frame(tmp78)
tmp78
1 6
2 8
3 10
4 12
5 9
6 10
However, when I try this solution on my data frame, it doesn’t work:
> subSetOfData <- copyOfTradesList[c(1:3,11:13),c(1,10)]
> subSetOfData
Instrument AccountValue
1 JPM 6997
2 JPM 7261
3 JPM 7545
11 KFT 6992
12 KFT 6944
13 KFT 7069
> unlist(sapply(rle(subSetOfData$Instrument)$lengths,function(x) 1:x))
Error in rle(subSetOfData$Instrument) : 'x' must be an atomic vector
> subSetOfData$InstrumentNumeric <- as.numeric(subSetOfData$Instrument)
> unlist(sapply(rle(subSetOfData$InstrumentNumeric)$lengths,function(x) 1:x))
[,1] [,2]
[1,] 1 1
[2,] 2 2
[3,] 3 3
> subSetOfData$id <- unlist(sapply(rle(subSetOfData$InstrumentNumeric)$lengths,function(x) 1:x))
Error in `$<-.data.frame`(`*tmp*`, "id", value = c(1L, 2L, 3L, 1L, 2L, :
replacement has 3 rows, data has 6
I have the disturbing idea that I’m going around in circles…
Two thoughts :
1) you could use na.rm=T in rowSums
2) How do you know which one has to go with which? You might add some indexing.
eg :
DF <- data.frame(
type=c(rep("A",4),rep("B",6)),
value = 1:10,
stringsAsFactors=F
)
DF$id <- unlist(lapply(rle(DF$type)$lengths,function(x) 1:x))
Now this allows you to easily tapply the sum on the original dataframe
tapply(DF$value,DF$id,sum)
And, more importantly, get your dataframe in the correct form :
> DF
type value id
1 A 1 1
2 A 2 2
3 A 3 3
4 A 4 4
5 B 5 1
6 B 6 2
7 B 7 3
8 B 8 4
9 B 9 5
10 B 10 6
> library(reshape)
> cast(DF,id~type)
id A B
1 1 1 5
2 2 2 6
3 3 3 7
4 4 4 8
5 5 NA 9
6 6 NA 10
TV <- data.frame(Type = c("A","A","A","A","B","B","B","B","B","B","B")
, Value = c(10,15,20,25,30,40,50,60,70,80,90)
, stringsAsFactors = FALSE)
# Added Type C for testing
# TV <- data.frame(Type = c("A","A","A","A","B","B","B","B","B","B","B", "C", "C", "C")
# , Value = c(10,15,20,25,30,40,50,60,70,80,90, 100, 150, 130)
# , stringsAsFactors = FALSE)
lnType <- with(TV, tapply(Value, Type, length))
lnType <- as.integer(lnType)
lnType
id <- unlist(mapply(FUN = rep_len, length.out = lnType, x = list(1:max(lnType))))
(TV <- cbind(id, TV))
require(reshape2)
tvWide <- dcast(TV, id ~ Type)
# Alternatively
# tvWide <- reshape(data = TV, direction = "wide", timevar = "Type", ids = c(id, Type))
tvWide <- subset(tvWide, select = -id)
# If you want something neat without the <NA>
# for(i in 1:ncol(tvWide)){
#
# if (is.na(tvWide[j,i])){
# tvWide[j,i] = 0
# }
#
# }
# }
tvWide
transform(tvWide, rowSum=rowSums(tvWide, na.rm = TRUE))