Writing a multivariate objective function with matrix data entities in Octave

S1=[20 32 44 56 68 80 92 104 116 128 140 152 164 176 188 200];
P=[16.82 26.93 37.01 47.1 57.21 67.32 77.41 87.5 97.54 107.7 117.8 127.9 138 148 158.2 168.3];
X = [0.119 0.191 0.262 0.334 0.405 0.477 0.548 0.620 0.691 0.763 0.835 0.906 0.978 1.049 1.120 1.192];
S = [2.3734 3.6058 5.0256 6.6854 8.6413 10.978 13.897 17.396 21.971 28.040 36.475 49.065 69.736 110.20 224.69 2779.1];
objective=@(x)((1250*x(3)*S(a)-(S(a)+x(2))*(P(a)+x(1)))/(1250*(S(a)+x(2))*(P(a)+x(1)))-x(5))^2+((x(2)*(P(a)^2+x(1)*P(a)))/(1250*x(4)*X(a)*x(3)-P(a)^2-x(1)*P(a))-S(a))^2+(74000/3*((X(a)*x(3)*S(a))/S1(a)*(S(a)+x(2)))-P(a))^2
%x0 = [Kp Ks mu.m Yp mu.d]
x0=[7.347705469 14.88611028 1.19747242 16.65696429 6.01E-03];
x=fminunc(objective,x0);
disp(x)
The code above is used for optimising the objective function, so that all the unknown values of the parameters can be found. As you may have seen, the objective function consists of 4 variables (S1, S, P, X), each holding 16 data entities. My question is: how do I create an objective function so that all the data entities are utilised?
The final objective function has to be the sum of the objective function shown above over a = 1:16. Any ideas?

Make the following changes to your code:
Replace all S(a) terms with plain S to use the whole vector; do the same for each of your four variables.
Convert all 'scalar' operations in your objective function to 'elementwise' ones, i.e. replace ^, * and / with .^, .* and ./. This produces 16 values, one for each index from 1 to 16 (i.e. what was previously referred to by a).
Wrap the resulting expression in a sum() call to add the 16 results into a single final value (see the toy sketch after this list).
Use your optimiser as normal.
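As a toy illustration of these steps (a minimal example of my own, not part of your code):
% scalar form, evaluated at a single index a:
f = @(x) (S(a)*x(1) - P(a))^2;
% vectorised, summed form using all 16 entries at once:
f = @(x) sum((S.*x(1) - P).^2);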
Resulting code:
S1 = [20 32 44 56 68 80 92 104 116 128 140 152 164 176 188 200];
P = [16.82 26.93 37.01 47.1 57.21 67.32 77.41 87.5 97.54 107.7 117.8 127.9 138 148 158.2 168.3];
X = [0.119 0.191 0.262 0.334 0.405 0.477 0.548 0.620 0.691 0.763 0.835 0.906 0.978 1.049 1.120 1.192];
S = [2.3734 3.6058 5.0256 6.6854 8.6413 10.978 13.897 17.396 21.971 28.040 36.475 49.065 69.736 110.20 224.69 2779.1];
objective = @(x) sum( ((1250.*x(3).*S-(S+x(2)).*(P+x(1)))./(1250.*(S+x(2)).*(P+x(1)))-x(5)).^2+((x(2).*(P.^2+x(1).*P))./(1250.*x(4).*X.*x(3)-P.^2-x(1).*P)-S).^2+(74000./3.*((X.*x(3).*S)./S1.*(S+x(2)))-P).^2 );
%x0 = [Kp Ks mu.m Yp mu.d]
x0 = [7.347705469 14.88611028 1.19747242 16.65696429 6.01E-03];
x = fminunc(objective,x0);
disp(x)
Note that you can make this code a lot clearer to read for humans; I just made the "direct" changes that illustrate conversion from your scalar expression to the desired vectorised one.
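For instance, one more readable layout might look like this (a sketch only; the intermediate name residuals is mine and the maths is unchanged):
residuals = @(x) ((1250.*x(3).*S - (S+x(2)).*(P+x(1))) ./ (1250.*(S+x(2)).*(P+x(1))) - x(5)).^2 ...
               + ((x(2).*(P.^2 + x(1).*P)) ./ (1250.*x(4).*X.*x(3) - P.^2 - x(1).*P) - S).^2 ...
               + (74000./3 .* ((X.*x(3).*S) ./ S1 .* (S+x(2))) - P).^2;
objective = @(x) sum(residuals(x));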

Related

Generate a stream of all possible numbers `x^k*y^l` for given `k` and `l` in Haskell

Write the function generateExponents k l, which for given k and l generates a stream of all unique possible numbers x^k*y^l in increasing order. For example, generateExponents 2 3 = [1,4,8,9,16,25,27...]
For obvious reasons this doesn't work:
generateExponents k l = sort [x^k*y^l | x <- [1..], y <- [1..]]
Then I tried this, which doesn't work either:
generateExponents k l = [n | n <- [1 ..], n `elem` products n]
  where
    xs n = takeWhile (\x -> x ^ k <= n) [1 ..]
    ys n = takeWhile (\y -> y ^ l <= n) [1 ..]
    products n = liftA2 (*) (xs n) (ys n)
What am I doing wrong?
Your algorithm is pretty slow -- it checks every number, and for every number it searches for an appropriate factorization! You can do better by producing an infinite table of answers, and then collapsing the table appropriately. For example, for x^2*y^3, the table looks like:
y\x    1    2    3    4    5
  1    1    4    9   16   25
  2    8   32   72  128  200
  3   27  108  243  432  675
  4   64  256  576 1024 1600
  5  125  500 1125 2000 3125
Note two nice features of this table: each row is sorted, and the rows themselves are sorted. This means we can merge them efficiently by simply taking the top-left value, then re-inserting the tail of the first row in its new sorted position. For example, the table above, after emitting 1, would look like:
  4   9   16   25   36
  8  32   72  128  200
 27 108  243  432  675
 64 256  576 1024 1600
125 500 1125 2000 3125
Then, after emitting the top-left value 4:
  8  32   72  128  200
  9  16   25   36   49
 27 108  243  432  675
 64 256  576 1024 1600
125 500 1125 2000 3125
Note how the top row has now become the second row to keep the doubly-sorted property.
This is an efficient way to construct all the right numbers in the right order. Then, the only remaining trick needed is to deduplicate, and for that you can deploy the standard trick map head . group, since duplicates are guaranteed to be next to each other. Here's the full code:
import Data.List

generateExponents' k l = map head . group . go $ [[x^k*y^l | x <- [1..]] | y <- [1..]]
  where
    -- emit the top-left value, then re-insert the tail of its row;
    -- insert compares rows lexicographically, and since every row is
    -- strictly increasing this restores the doubly-sorted property
    go ((x:xs):xss) = x : go (insert xs xss)
It's much, much faster. Compare:
> sum . take 400 $ generateExponents 2 3
5994260
(8.26 secs, 23,596,249,112 bytes)
> sum . take 400 $ generateExponents' 2 3
5994260
(0.01 secs, 1,172,864 bytes)
> sum . take 1000000 {- a million -} $ generateExponents' 2 3
72001360441854395
(6.99 secs, 13,460,753,616 bytes)
I think you just forgot to map the actual function over the xs and ys:
generateExponents k l = [n | n <- [1 ..], n `elem` products n]
  where
    xs n = takeWhile (<= n) $ map (^ k) [1 ..]
    ys n = takeWhile (<= n) $ map (^ l) [1 ..]
    products n = liftA2 (*) (xs n) (ys n)
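As a quick sanity check (my own GHCi session sketch, assuming liftA2 is in scope from Control.Applicative), this fixed version and generateExponents' above should agree on the first few values:
> take 7 (generateExponents 2 3)
[1,4,8,9,16,25,27]
> take 7 (generateExponents' 2 3)
[1,4,8,9,16,25,27]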

Is it possible to create a map with 2 keys and a vector of values in Clojure?

I am trying to create a program that reads in a table of temperatures from a CSV file and would like to access a collection of temperatures based on the year and day.
The first column stands for the year the temperatures were recorded.
The second column stands for a specific day of each month.
The rest of the columns represent the temperatures for each month.
For example, 2021 - 23 - 119 = 23rd June 2021 has a temperature of 119.
Year Day Months from January to December
2018 18 | 45 54 -11 170 99 166 173 177 175 93 74 69
2021 23 | 13 87 75 85 85 119 190 172 156 104 39 53
2020 23 | 63 86 62 128 131 187 163 162 138 104 60 70
So far I have managed to load the data from a CSV file with clojure.data.csv; this returns a sequence of vectors into the program:
(defn Load_csv_file [filepath]
  (try
    (with-open [reader (io/reader filepath)]
      (.skip reader 1)
      (let [data (csv/read-csv reader)]
        (println data)))
    (catch Exception ex (println (str "LOL Exception: " (.toString ex))))))
I am currently trying to figure out how to implement this; my idea was to create three keys in a map, taking the year, the day, and the vector of temperatures, and then filter for a specific value.
Any advice on how I can implement this functionality would be appreciated.
Thanks!
I would go with something like this:
(require '[clojure.java.io :refer [reader]]
         '[clojure.string :refer [split blank?]]
         '[clojure.edn :as edn])

(with-open [r (reader "data.txt")]
  (doall (for [ln (rest (line-seq r))
               :when (not (blank? ln))
               :let [[y d & ms] (mapv edn/read-string (split ln #"\s+\|?\s+"))]]
           {:year y :day d :months (vec ms)})))
;;({:year 2018,
;; :day 18,
;; :months [45 54 -11 170 99 166 173 177 175 93 74 69]}
;; {:year 2021,
;; :day 23,
;; :months [13 87 75 85 85 119 190 172 156 104 39 53]}
;; {:year 2020,
;; :day 23,
;; :months [63 86 62 128 131 187 163 162 138 104 60 70]})
By the way, I'm not sure the CSV format allows mixed separators as you have in your example; anyway, this one would work for that.
I would create a map of data that looked something like this
{2020 {23 {:months [63 86 62 128 131 187 163 162 138 104 60 70]}}}
This way you can get the data out in a fairly easy way
(get-in data [2020 23 :months])
So something like this
(->> (Load_csv_file "file.csv")
     (reduce (fn [acc [year day & months]] (assoc-in acc [year day :months] (vec months))) {}))
This will result in the data structure I mentioned; now you just need to figure out the location of the data you want.
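For example (a sketch, assuming Load_csv_file is adapted to return the parsed rows as numbers instead of printing them):
(def data
  (->> (Load_csv_file "file.csv")
       (reduce (fn [acc [year day & months]] (assoc-in acc [year day :months] (vec months))) {})))

(get-in data [2021 23 :months])
;; => [13 87 75 85 85 119 190 172 156 104 39 53]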

How to calculate the Hamming weight for a vector?

I am trying to calculate the Hamming weight of a vector in MATLAB.
function Hamming_weight (vet_dec)
Ham_Weight = sum(dec2bin(vet_dec) == '1')
endfunction
The vector is:
Hamming_weight ([208 15 217 252 128 35 50 252 209 120 97 140 235 220 32 251])
However, this gives the following result, which is not what I want:
Ham_Weight =
10 10 9 9 9 5 5 7
I would be very grateful if you could help me please.
You are summing over the wrong dimension!
sum(dec2bin(vet_dec) == '1',2).'
ans =
3 4 5 6 1 3 3 6 4 4 3 3 6 5 1 7
dec2bin(vet_dec) creates a matrix like this:
11010000
00001111
11011001
11111100
10000000
00100011
00110010
11111100
11010001
01111000
01100001
10001100
11101011
11011100
00100000
11111011
As you can see, you're interested in the sum of each row, not each column. Use the second input argument of sum, as in sum(x, 2), which specifies the dimension you want to sum along.
Note that this approach is horribly slow, as you can see from this question.
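If speed matters, a bit-level alternative avoids the string round-trip entirely; a sketch, assuming the values fit in 8 bits and a MATLAB version with implicit expansion (R2016b or later):
ham_weight = sum(bitget(vet_dec(:), 1:8), 2).'; % count the set bits of each element directly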
EDIT
For this to be a valid and meaningful MATLAB function, you must change your function definition a bit.
function ham_weight = hamming_weight(vector)       % return the variable ham_weight
    ham_weight = sum(dec2bin(vector) == '1', 2).'; % don't transpose if you want a column vector
end                                                % endfunction is not a MATLAB command

standard unambiguous format [R] MySQL imported data

OK, to set the scene, I have written a function to import multiple tables from MySQL (using RODBC) and run randomForest() on them.
This function is run on multiple databases (as separate instances).
In one particular database, and one particular table, the error "error in as.POSIXlt.character(x, tz,.....): character string not in a standard unambiguous format" is thrown. The function runs on around 150 tables across two databases without any issues except this one table.
Here is a head() print from the table:
MQLTime bar5 bar4 bar3 bar2 bar1 pat1 baXRC
1 2014-11-05 23:35:00 184 24 8 24 67 147 Flat
2 2014-11-05 23:57:00 203 184 204 67 51 147 Flat
3 2014-11-06 00:40:00 179 309 49 189 75 19 Flat
4 2014-11-06 00:46:00 28 192 60 49 152 147 Flat
5 2014-11-06 01:20:00 309 48 9 11 24 19 Flat
6 2014-11-06 01:31:00 24 177 64 152 188 19 Flat
And here is the function:
GenerateRF <- function(db, countstable, RFcutoff) {
  'load required libraries'
  library(RODBC)
  library(randomForest)
  library(caret)
  library(ff)
  library(stringi)
  'connection and data preparation'
  connection <- odbcConnect('TTODBC', uid='root', pwd='password', case="nochange")
  'import count table and check if RF is allowed to be built'
  query.str <- paste0('select * from ', db, '.', countstable, ' order by RowCount asc')
  row.counts <- sqlQuery(connection, query.str)
  'Operate only on tables that have >= RFcutoff'
  for (i in 1:nrow(row.counts)) {
    table.name <- as.character(row.counts[i,1])
    col.count <- as.numeric(row.counts[i,2])
    row.count <- as.numeric(row.counts[i,3])
    if (row.count >= 20) {
      'Delete old RFs and DFs for input pattern'
      if (file.exists(paste0(table.name, '_RF.Rdata'))) {
        file.remove(paste0(table.name, '_RF.Rdata'))
      }
      if (file.exists(paste0(table.name, '_DF.Rdata'))) {
        file.remove(paste0(table.name, '_DF.Rdata'))
      }
      'import and clean data'
      query.str2 <- paste0('select * from ', db, '.', table.name, ' order by mqltime asc')
      raw.data <- sqlQuery(connection, query.str2)
      'partition data into training/test sets'
      set.seed(489)
      index <- createDataPartition(raw.data$baXRC, p=0.66, list=FALSE, times=1)
      data.train <- raw.data[index,]
      data.test <- raw.data[-index,]
      'find optimal trees to grow (without outcome and dates)'
      data.mtry <- as.data.frame(tuneRF(data.train[, c(-1,-col.count)], data.train$baXRC, ntreetry=100,
                                        stepFactor=.5, improve=0.01, trace=TRUE, plot=TRUE, dobest=FALSE))
      best.mtry <- data.mtry[which(data.mtry[,2] == min(data.mtry[,2])), 1]
      'compress df'
      data.ff <- as.ffdf(data.train)
      'run RF. Originally set to 1000 trees but M1 dataset is too large for laptop. Maybe train at the lab?'
      data.rf <- randomForest(baXRC~., data=data.ff[,-1], mtry=best.mtry, ntree=500, keep.forest=TRUE,
                              importance=TRUE, proximity=FALSE)
      'generate and print variable importance plot'
      varImpPlot(data.rf, main = table.name)
      'predict on test data'
      data.test.pred <- as.data.frame(predict(data.rf, data.test, type="prob"))
      'get dates and name date column'
      data.test.dates <- data.frame(data.test[,1])
      colnames(data.test.dates) <- 'MQLTime'
      'attach dates to prediction df'
      data.test.res <- cbind(data.test.dates, data.test.pred)
      'force date coercion to attempt negating unambiguous format error'
      data.test.res$MQLTime <- format(data.test.res$MQLTime, format = "%Y-%m-%d %H:%M:%S")
      'delete row names, coerce to dataframe, generate row table name and export outcomes to MySQL'
      rownames(data.test.res) <- NULL
      data.test.res <- as.data.frame(data.test.res)
      root.table <- stri_sub(table.name, 0, -5)
      sqlUpdate(connection, data.test.res, tablename = paste0(db, '.', root.table, '_outcome'), index = "MQLTime")
      'save RF and test df/s for future use; save latest version of row_counts to MQL4 folder'
      save(data.rf, file = paste0("C:/Users/user/Documents/RF_test2/", table.name, '_RF.Rdata'))
      save(data.test, file = paste0("C:/Users/user/Documents/RF_test2/", table.name, '_DF.Rdata'))
      write.table(row.counts, paste0("C:/Users/user/AppData/Roaming/MetaQuotes/Terminal/71FA4710ABEFC21F77A62A104A956F23/MQL4/Files/", db, "_m1_rowcounts.csv"), sep = ",", col.names = F,
                  row.names = F, quote = F)
      'end of conditional block'
    }
    'end of for loop'
  }
  'close all connection to MySQL'
  odbcCloseAll()
  'clear workspace'
  rm(list=ls())
  'end of function'
}
At this line:
data.test.res$MQLTime <- format(data.test.res$MQLTime, format = "%Y-%m-%d %H:%M:%S")
I have tried coercing MQLTime using various functions including: as.character(), as.POSIXct(), as.POSIXlt(), as.Date(), format(), as.character(as.Date())
and have also tried:
"%y" vs "%Y" and "%OS" vs "%S"
All variants seem to have no effect on the error and the function is still able to run on all other tables. I have checked the table manually (which contains almost 1500 rows) and also in MySQL looking for NULL dates or dates like "0000-00-00 00:00:00".
Also, if I run the function line by line in the R terminal, this offending table is processed without any problems, which just confuses the hell out of me.
I've exhausted all the functions/solutions I can think of (and also all those I could find through Dr. Google) so I am pleading for help here.
I should probably mention that the MQLTime column is stored as varchar() in MySQL. This was done to try and get around issues with type conversions between R and MySQL.
SHOW VARIABLES LIKE "%version%";
innodb_version, 5.6.19
protocol_version, 10
slave_type_conversions,
version, 5.6.19
version_comment, MySQL Community Server (GPL)
version_compile_machine, x86
version_compile_os, Win32
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)
Edit: str() output on the data as imported from MySQL, showing MQLTime is already in POSIXct format:
> str(raw.data)
'data.frame': 1472 obs. of 8 variables:
$ MQLTime: POSIXct, format: "2014-11-05 23:35:00" "2014-11-05 23:57:00" "2014-11-06 00:40:00" "2014-11-06 00:46:00" ...
$ bar5 : int 184 203 179 28 309 24 156 48 309 437 ...
$ bar4 : int 24 184 309 192 48 177 48 68 60 71 ...
$ bar3 : int 8 204 49 60 9 64 68 27 192 147 ...
$ bar2 : int 24 67 189 49 11 152 27 56 437 67 ...
$ bar1 : int 67 51 75 152 24 188 56 147 71 0 ...
$ pat1 : int 147 147 19 147 19 19 147 19 147 19 ...
$ baXRC : Factor w/ 3 levels "Down","Flat",..: 2 2 2 2 2 2 2 2 2 3 ...
So I have tried declaring stringsAsFactors = FALSE in the dataframe operations and this had no effect.
Interestingly, if the offending table is removed from processing through an additional conditional statement in the first 'if' block, the function stops on the table immediately preceding the blocked table.
If both the original and the new offending tables are removed from processing, then the function stops on the table immediately prior to them. I have never seen this sort of behavior before and it really has me stumped.
I watched system resources during the function and they never seem to max out.
Could this be a problem with the 'for' loop and not necessarily date formats?
There appears to be some egg on my face. The table following the table where the function was stopping had a row with value '0000-00-00 00:00:00'. I added another statement in my MySQL function to remove these rows when pre-processing the tables. Thanks to those that had a look at this.
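For reference, the cleanup could look something like this (a sketch; db_name and table_name are placeholders for the question's db and table.name variables):
DELETE FROM db_name.table_name WHERE MQLTime = '0000-00-00 00:00:00';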

How to convert my binary (hex) data to latitude and longitude?

I have some binary data stream which passes geolocation coordinates - latitude and longitude. I need to find out how they are encoded.
4adac812 = 74°26.2851' = 74.438085
2b6059f9 = 43°0.2763' = 43.004605
4adaee12 = 74°26.3003' = 74.438338
2a3c8df9 = 42°56.3177' = 42.938628
4ae86d11 = 74°40.1463' = 74.669105
2afd0efb = 42°59.6263' = 42.993772
The 1st value is the hex value. The 2nd & 3rd are the values that I get in the output (not sure which one is used in the conversion).
I've found that the first byte represents the integer part of the value (0x4a = 74), but I cannot find how the decimal part is encoded.
I would really appreciate any help!
Thanks.
--
Upd: This stream comes from some "Chinese" GPS server software over TCP. I have no sources or documentation for the client software. I suppose it was written in VC++ 6 and uses some standard implementations.
--
Upd: Here are the packets I get:
Hex data:
41 00 00 00 13 bd b2 2c
4a e8 6d 11 2a 3c 8d f9
f6 0c ee 13
Log data in client soft:
[Lng] 74°40.1463', direction:1
[Lat] 42°56.3177', direction:1
[Head] direction:1006, speed:3318, AVA:1
[Time] 2011-02-25 19:52:19
Result data in client (UI):
74.669105
42.938628
Head 100 // floor(1006/10)
Speed 61.1 // floor(3318/54.3)
41 00 00 00 b1 bc b2 2c
4a da ee 12 2b 60 59 f9
00 00 bc 11
[Lng] 74°26.3003', direction:1
[Lat] 43°0.2763', direction:1
[Head] direction:444, speed:0, AVA:1
[Time] 2011-02-25 19:50:49
74.438338
43.004605
00 00 00 00 21 bd b2 2c
4a da c8 12 aa fd 0e fb
0d 0b e1 1d
[Lng] 74°26.2851', direction:1
[Lat] 42°59.6263', direction:1
[Head] direction:3553, speed:2829, AVA:1
[Time] 2011-02-25 19:52:33
74.438085
42.993772
I don't know what the first 4 bytes mean.
I found that the lower 7 bits of the 5th byte represent the seconds (maybe bytes 5-8 are the time?).
Byte 9 is the integer part of Lng.
Byte 13 is the integer part of Lat.
Bytes 17-18 reversed (byte-swapped word) are the speed.
Bytes 19-20 reversed are AVA(?) & direction (4 + 12 bits). (BTW, does somebody know what AVA is?)
And one note: in the 3rd packet's 13th byte you can see that only the lower 7 bits are used. I guess the 1st bit doesn't mean anything (I removed it at the beginning, sorry if I'm wrong).
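Putting those guesses into a layout sketch (my own C-style summary; the field meanings are only the hypotheses above, not a documented format):
struct packet {
    unsigned char header[4];   /* unknown; first 4 bytes */
    unsigned char time[4];     /* lower 7 bits of byte 5 look like seconds */
    unsigned char lng[4];      /* byte 9 = integer degrees of longitude */
    unsigned char lat[4];      /* byte 13 = integer degrees of latitude (top bit unused?) */
    unsigned char speed[2];    /* bytes 17-18, byte-swapped */
    unsigned char ava_dir[2];  /* bytes 19-20, byte-swapped: 4 bits AVA + 12 bits direction */
};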
I have reordered your data so that we first have 3 longitudes and then 3 latitudes:
74.438085, 74.438338, 74.669105, 43.004605, 42.938628, 42.993772
This is the best fit to the hexadecimals I can come up with:
74.437368, 74.439881, 74.668392, 42.993224, 42.961388, 42.982391
The differences are: -0.000717, 0.001543, -0.000713, -0.011381, 0.022760, -0.011381
The program that generates these values from the complete hex values (4 bytes, not 3) is:
#include <stdio.h>
#include <conio.h> /* for getch(); Windows-specific */

int main(int argc, char** argv) {
    int a[] = { 0x4adac812, 0x4adaee12, 0x4ae86d11, 0x2b6059f9, 0x2a3c8df9, 0x2afd0efb };
    int i = 0;
    while (i < 3) {
        double b = (double)a[i] / (2 << (3*8)) * 8.668993 - 250.0197;
        printf("%f\n", b);
        i++;
    }
    while (i < 6) {
        double b = (double)a[i] / (2 << (3*8)) * 0.05586007 + 41.78172;
        printf("%f\n", b);
        i++;
    }
    printf("press key");
    getch();
    return 0;
}
Brainstorming here.
If we look at the lower 6 bits of the second byte (data[1]&0x3f) we get the "minutes" value for most of the examples.
0xda & 0x3f = 0x1a = 26; // ok
0x60 & 0x3f = 0; // ok
0xe8 & 0x3f = 0x28 = 40; // ok
0x3c & 0x3f = 0x3c = 60; // should be 56
0xfd & 0x3f = 0x3d = 61; // should be 59
Perhaps this is the right direction?
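A quick test of that hypothesis over all six samples (my own sketch, reusing the hex values from the question):
#include <stdio.h>

int main(void) {
    unsigned int samples[] = { 0x4adac812, 0x2b6059f9, 0x4adaee12, 0x2a3c8df9, 0x4ae86d11, 0x2afd0efb };
    for (int i = 0; i < 6; i++) {
        unsigned int deg = samples[i] >> 24;          /* first byte: integer degrees */
        unsigned int min = (samples[i] >> 16) & 0x3f; /* low 6 bits of second byte */
        printf("0x%08x -> %u deg, %u min\n", samples[i], deg, min);
    }
    return 0;
}
Four of the six come out right; the two mismatches are exactly the ones flagged above.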
I have tried your new data packets:
74+40.1463/60
74+26.3003/60
74+26.2851/60
42+56.3177/60
43+0.2763/60
42+59.6263/60
74.66910, 74.43834, 74.43809, 42.93863, 43.00460, 42.99377
My program gives:
74.668392, 74.439881, 74.437368, 42.961388, 42.993224, 39.407346
The differences are:
-0.000708, 0.001541, -0.000722, 0.022758, -0.011376, -3.586424
I re-used the 4 constants I derived from your first packet, as those are probably stored in your client somewhere. The slight differences might be the result of some randomization the client does to prevent you from getting the exact value or reverse-engineering their protocol.