Better way to create a dictionary of functions

#attempt 1: works
f(x::Int64) = x +1
my_functions = Dict("f" => f)
#attempt 2: does not work, something is wrong
new_functions = Dict("g" => g(x::Int64) = x + 5)
I'm a novice and new to Julia. Is there a way to accomplish this similar to my 2nd attempt above? Thanks

You can use anonymous function syntax like this:
new_functions = Dict("g" => x::Int64 -> x + 5)
You can read the details of how they are used in the Julia manual: https://docs.julialang.org/en/latest/manual/functions/#man-anonymous-functions-1.
Edit: notice that if you initially add only one function to the dictionary, its type will be too restrictive, like Dict{String,getfield(Main, Symbol("##3#4"))}, e.g.:
julia> new_functions = Dict("g" => x::Int64 -> x + 5)
Dict{String,getfield(Main, Symbol("##15#16"))} with 1 entry:
"g" => ##15#16()
So you probably should specify the type explicitly like:
julia> new_functions = Dict{String, Function}("g" => x::Int64 -> x + 5)
Dict{String,Function} with 1 entry:
"g" => ##23#24()
or add at least two entries to the dictionary initially:
julia> new_functions = Dict("g" => x::Int64 -> x + 5, "h" => x -> x+1)
Dict{String,Function} with 2 entries:
"g" => ##11#13()
"h" => ##12#14()

For completeness: there's also the possibility to use the normal multi-line function syntax as an expression, which will create a function object with a name (like a "named function expression" in JavaScript; this is handy if you need recursion):
julia> Dict("g" => function g(x::Int); x + 5; end)
Dict{String,typeof(g)} with 1 entry:
"g" => g
The first ; in the line is necessary here. @BogumiƂ's caveats about typing the Dict apply as well, as you can see.
Using the short-form syntax is possible, too, but you have to put the expression into parentheses:
Dict("g" => (g(x::Int) = x + 5))

Related

Can the prefix beam search commonly used in CTC speech recognition be implemented in a simpler way like this?

I have been learning about speech recognition recently, and I have learned that the idea of prefix beam search is to merge paths that correspond to the same prefix, such as [1,1,_] and [_,1,_] (here _ indicates the blank mark).
Based on this understanding, I implemented my own version, which can be summarized in pseudocode like this:
def prefix_beam_search(y, beam_size, blank):
    seq_len, n_class = y.shape
    logY = np.log(y)
    beam = [([], 0)]
    for t in range(seq_len):
        buff = []
        for prefix, p in beam:
            for i in range(n_class):
                new_prefix = list(prefix) + [i]
                new_p = p + logY[t][i]
                buff.append((new_prefix, new_p))
        # merge the paths with the same prefix
        new_beam = defaultdict(lambda: ninf)
        for prefix, p in buff:
            # 'norm_prefix' can simplify the path, [1,1,_,2] ==> [1,2]
            # However, the ending 'blank' is retained, [1,1,_] ==> [1,_]
            prefix = norm_prefix(prefix, blank)
            new_beam[prefix] = logsumexp(new_beam[prefix], p)
        # choose the best paths
        new_beam = sorted(new_beam.items(), key=lambda x: x[1], reverse=True)
        beam = new_beam[:beam_size]
    return beam
But most of the versions I found online (according to the paper) are like this:
def _prefix_beam_decode(y, beam_size, blank):
    T, V = y.shape
    log_y = np.log(y)
    beam = [(tuple(), (0, ninf))]
    for t in range(T):
        new_beam = defaultdict(lambda: (ninf, ninf))
        for prefix, (p_b, p_nb) in beam:
            for i in range(V):
                p = log_y[t, i]
                if i == blank:
                    new_p_b, new_p_nb = new_beam[prefix]
                    new_p_b = logsumexp(new_p_b, p_b + p, p_nb + p)
                    new_beam[prefix] = (new_p_b, new_p_nb)
                    continue
                end_t = prefix[-1] if prefix else None
                new_prefix = prefix + (i,)
                new_p_b, new_p_nb = new_beam[new_prefix]
                if i != end_t:
                    new_p_nb = logsumexp(new_p_nb, p_b + p, p_nb + p)
                else:
                    new_p_nb = logsumexp(new_p_nb, p_b + p)
                new_beam[new_prefix] = (new_p_b, new_p_nb)
                if i == end_t:
                    new_p_b, new_p_nb = new_beam[prefix]
                    new_p_nb = logsumexp(new_p_nb, p_nb + p)
                    new_beam[prefix] = (new_p_b, new_p_nb)
        beam = sorted(new_beam.items(), key=lambda x: logsumexp(*x[1]), reverse=True)
        beam = beam[:beam_size]
    return beam
The results of the two are different, and my version tends to return longer strings. I don't quite understand two main aspects:
Are there any details of my version that are not well thought out?
The common version generates the new prefix with new_prefix = prefix + (i,) regardless of whether the end of the previous prefix is the same as the character being added. For example, if the old prefix is [a,a,b] and a new character b is added, both [a,a,b] and [a,a,b,b] are kept. What is the purpose of this? And does it cause double counting?
Looking forward to your answer, thanks in advance!
When you choose the best paths in your code, you don't want to differentiate between [1,_] and [1] since both correspond to the same prefix [1].
If you have for example:
[1], [1,_], [1,2]
then you want the probability of [1] and [1,_] both to have the sum of the two.
probability([1]) = probability([1])+probability([1,_])
probability([1,_]) = probability([1])+probability([1,_])
And after sorting by these probabilities, you may want to keep enough entries that the number of true prefixes equals beam_size.
For example, you have [1], [1,_], [2], [3].
Of which probabilities are: 0.1, 0.08, 0.11, 0.15
Then the probabilities with which you want to sort them are:
0.18, 0.18, 0.11, 0.15, respectively (0.18 = 0.1 + 0.08)
Sorted: [1]:0.18, [1,_]: 0.18, [3]:0.15, [2]:0.11
And if you have beam_size 2, for example, then you may want to keep
[1], [1,_] and [3] so that you have 2 prefixes in your beam, because [1] and [1,_] count as the same prefix (as long as the next character is not 1 - that's why we keep track of [1] and [1,_] separately).
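To make the merging and selection concrete, here is a minimal stand-alone sketch (plain Python; the helper definitions and the numbers just mirror the example above and are not part of either implementation):
import math
from collections import defaultdict

NEG_INF = float("-inf")

def logsumexp(*xs):
    # log(sum(exp(x) for x in xs)), numerically stable
    m = max(xs)
    return NEG_INF if m == NEG_INF else m + math.log(sum(math.exp(x - m) for x in xs))

def true_prefix(path):
    # [1, '_'] and [1] denote the same prefix; only the trailing blank differs
    return path[:-1] if path and path[-1] == '_' else path

# log-probabilities of the candidate paths from the example above
paths = {(1,): math.log(0.10), (1, '_'): math.log(0.08),
         (2,): math.log(0.11), (3,): math.log(0.15)}

# score every path by the total probability of its true prefix
merged = defaultdict(lambda: NEG_INF)
for path, lp in paths.items():
    merged[true_prefix(path)] = logsumexp(merged[true_prefix(path)], lp)
ranked = sorted(paths, key=lambda p: merged[true_prefix(p)], reverse=True)

# keep paths until beam_size *distinct* prefixes are covered
beam_size, kept, seen = 2, [], set()
for path in ranked:
    pref = true_prefix(path)
    if pref not in seen and len(seen) == beam_size:
        break
    seen.add(pref)
    kept.append(path)

print(kept)  # [(1,), (1, '_'), (3,)] -- [1] and [1,_] count as one prefix
This also shows why the common version keeps two scores (p_b, p_nb) per prefix instead of one: a path ending in blank and one not ending in blank score together as the same prefix, but must stay distinguishable in case the next emitted character repeats the last one.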

Tradingview custom function with security()

I'm looking at the Pine version 3 migration guide and there's an example showing a custom function and using security() to call that function. https://www.tradingview.com/wiki/Pine_Version_3_Migration_Guide
Now I tried to change my custom function in order to return two values instead of one, but for some reason it doesn't work anymore. The error is on line 10 ("[t,i] = ...") and says: variableType.itemType is not a function.
My code is below; can anyone advise what the issue is, please?
//@version=3
study("My Script")
calcS() =>
    s = 0.0
    j = 0.0
    s := close + 1
    j := close + 2
    [s, j]
[t, i] = security(tickerid, '120', calcS())
plot(t, color=red, transp=0)
plot(i, color=blue, transp=0)
It's a known problem. You can't return a tuple from security. It's in our plans to fix this bug.
Now you may use the following workaround:
//@version=3
study("My Script")
calcS() =>
    s = 0.0
    j = 0.0
    s := close + 1
    j := close + 2
    [s, j]
calcSs() =>
    [s, j] = calcS()
    s
calcSj() =>
    [s, j] = calcS()
    j
t = security(tickerid, '120', calcSs())
i = security(tickerid, '120', calcSj())
plot(t, color=red, transp=0)
plot(i, color=blue, transp=0)
UPDATE
Starting with Pine v4 you can use functions returning tuples with security():
//@version=4
study("", "", true)
f() => [open, high]
[o, h] = security(syminfo.tickerid, "D", f())
plot(o)
plot(h)

How to find all maximum elements in a scalaz.Foldable container

scalaz.Foldable has a maximumBy method that finds a maximum element in a container. But is there an elegant way to find all of them using scalaz? I.e.:
Vector(Person("Ben", 1), Person("Jil", 3), Person("Bob", 3)).maximumsBy(_.age)
== Vector(Person("Jil", 3), Person("Bob", 3))
I have a problem where, if there are several equal maximum values, I want to select among these candidates randomly.
You can do something like this:
implicit def MaxNonEmptyListSemigroup[A: Order]: Semigroup[NonEmptyList[A]] =
  new Semigroup[NonEmptyList[A]] {
    def append(l1: NonEmptyList[A], l2: => NonEmptyList[A]): NonEmptyList[A] =
      Order[A].apply(l1.head, l2.head) match {
        case GT => l1
        case LT => l2
        case EQ => l1 append l2
      }
  }

// returns None if the list is empty
// otherwise returns Some(non-empty-list of maximum elements)
list.foldMap1Opt(a => NonEmptyList.nels(a)) :: Option[NonEmptyList[A]]
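For intuition, the single pass that this fold performs can also be written outside scalaz. A small Python sketch (maximums_by and the sample data are hypothetical names, not scalaz methods), including the random pick among ties that the question ultimately wants:
import random

def maximums_by(xs, key):
    # single pass: keep every element whose key equals the running maximum
    best, maxes = None, []
    for x in xs:
        k = key(x)
        if best is None or k > best:
            best, maxes = k, [x]
        elif k == best:
            maxes.append(x)
    return maxes

people = [("Ben", 1), ("Jil", 3), ("Bob", 3)]
print(maximums_by(people, key=lambda p: p[1]))                 # [('Jil', 3), ('Bob', 3)]
print(random.choice(maximums_by(people, key=lambda p: p[1])))  # random choice among the maxima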
Ideally, maximumsBy would return the maximums in the same type of container as was provided. Doing this efficiently seems to require scalaz.Reducer, a typeclass that models appending and prepending to a container.
import scalaz._
import Ordering._
import std.AllInstances._

object Maximums extends App {
  def maximumsBy[F[_]: Foldable, A, B: Order](fa: F[A])(f: A => B)
                (implicit r: Reducer[A, F[A]]): Option[F[A]] =
    Foldable[F].foldMapLeft1Opt(fa)(a => (f(a), r.unit(a))) {
      case (curr @ (max, maxes), a) => {
        val next = f(a)
        Order[B].apply(next, max) match {
          case GT => (next, r.unit(a))
          case LT => curr
          case EQ => (max, r.snoc(maxes, a))
        }
      }
    }.map(_._2)

  println(maximumsBy(Vector(("a", 1), ("c", 3), ("c", 3)))(_._2))
  println(maximumsBy(List(("a", 1), ("c", 3), ("c", 3)))(_._2))

  // Output:
  // Some(Vector((c,3), (c,3)))
  // Some(List((c,3), (c,3)))
}
I was slightly dismayed by how complex maximumsBy ended up. Are there any ways to simplify it, while keeping the same behavior?

Why does JsArray parsing behave differently depending on the code structure, while the logic remains the same?

I'm doing a small refactoring, trying to keep the logical outcome intact.
From:
val mapped: Seq[Option[String]] = (mr.getNormalizedValue(1) \ "getapworkflowinfo1").as[JsArray].value.map(v => {
  (v \ "Description").as[String] match {
    case value if List("referral to electrophysiology").exists(value.toLowerCase.equals) =>
      Some("true")
    case _ =>
      None
  }
})
mapped.flatten.lastOption
To:
val referralIndicators: Seq[Boolean] =
  (mr.getNormalizedValue(1) \ "getapworkflowinfo1").as[JsArray].value
    // Step 1.1 Extracting and checking description
    .map(d => (d \ "Description").as[String].toLowerCase().equals("referral to electrophysiology"))

// Step 2. Returning if at least once there was referral to electrophysiology
Some(referralIndicators.exists(v => v)).map(v => v.toString)
These should be logically equivalent (and therefore should generate the same output for the same input).
Effectively, the refactored code improves parsing, and the results it returns are better than before.
Can someone explain what the difference between the two is?

HBase shell scan bytes to string conversion

I would like to scan an HBase table and see integers as strings (not their binary representation). I can do the conversion, but I have no idea how to write a scan statement using the Java API from the HBase shell:
org.apache.hadoop.hbase.util.Bytes.toString(
"\x48\x65\x6c\x6c\x6f\x20\x48\x42\x61\x73\x65".to_java_bytes)
org.apache.hadoop.hbase.util.Bytes.toString("Hello HBase".to_java_bytes)
I would be very happy to have examples of scan and get that search binary data (longs) and output normal strings. I am using the HBase shell, not Java.
HBase stores data as byte arrays (untyped). Therefore, if you perform a table scan, the data will be displayed in a common format (an escaped hexadecimal string), e.g.: "\x48\x65\x6c\x6c\x6f\x20\x48\x42\x61\x73\x65" -> Hello HBase
If you want to get back the typed value from the serialized byte array, you have to do this manually.
You have the following options:
Java code (Bytes.toString(...))
hack the to_string function in $HBASE_HOME/lib/ruby/hbase/table.rb:
replace toStringBinary with toInt for non-meta tables
write a get/scan JRuby function which converts the byte array to the appropriate type
Since you want it in the HBase shell, consider the last option:
Create a file get_result.rb :
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.client.ResultScanner
import org.apache.hadoop.hbase.client.Result
import java.util.ArrayList

# Simple function equivalent to scan 'test', {COLUMNS => 'c:c2'}
def get_result()
  htable = HTable.new(HBaseConfiguration.new, "test")
  rs = htable.getScanner(Bytes.toBytes("c"), Bytes.toBytes("c2"))
  output = ArrayList.new
  output.add "ROW\t\t\t\t\t\tCOLUMN+CELL"
  rs.each { |r|
    r.raw.each { |kv|
      row = Bytes.toString(kv.getRow)
      fam = Bytes.toString(kv.getFamily)
      ql = Bytes.toString(kv.getQualifier)
      ts = kv.getTimestamp
      val = Bytes.toInt(kv.getValue)
      output.add " #{row} \t\t\t\t\t\t column=#{fam}:#{ql}, timestamp=#{ts}, value=#{val}"
    }
  }
  output.each { |line| puts "#{line}\n" }
end
load it in the HBase shell and use it:
require '/path/to/get_result'
get_result
Note: modify/enhance/fix the code according to your needs
Just for completeness' sake, it turns out that the call Bytes::toStringBinary gives the hex-escaped sequence you get in HBase shell:
\x0B\x2_SOME_ASCII_TEXT_\x10\x00...
Whereas, Bytes::toString will try to deserialize to a string assuming UTF8, which will look more like:
\u8900\u0710\u0115\u0320\u0000_SOME_UTF8_TEXT_\u4009...
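If it helps to see the difference outside HBase, here is a small Python illustration (struct is only a stand-in for the Bytes helpers; it is not part of the HBase API):
import struct

raw = struct.pack(">q", 42)                   # an 8-byte big-endian long, the layout Bytes.toBytes(long) uses
print(raw)                                    # b'\x00\x00\x00\x00\x00\x00\x00*' -- escaped bytes, as toStringBinary shows them
print(struct.unpack(">q", raw)[0])            # 42 -- the typed value, as Bytes.toLong recovers it
print(raw.decode("utf-8", errors="replace"))  # mostly unprintable -- what decoding as text (toString) yields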
You can add a scan_counter command to the HBase shell.
First, add the following to /usr/lib/hbase/lib/ruby/hbase/table.rb (after the scan function):
#----------------------------------------------------------------------------------------------
# Scans whole table or a range of keys and returns rows matching specific criteria, with values as numbers
def scan_counter(args = {})
  unless args.kind_of?(Hash)
    raise ArgumentError, "Arguments should be a hash. Failed to parse #{args.inspect}, #{args.class}"
  end
  limit = args.delete("LIMIT") || -1
  maxlength = args.delete("MAXLENGTH") || -1
  if args.any?
    filter = args["FILTER"]
    startrow = args["STARTROW"] || ''
    stoprow = args["STOPROW"]
    timestamp = args["TIMESTAMP"]
    columns = args["COLUMNS"] || args["COLUMN"] || get_all_columns
    cache = args["CACHE_BLOCKS"] || true
    versions = args["VERSIONS"] || 1
    timerange = args[TIMERANGE]

    # Normalize column names
    columns = [columns] if columns.class == String
    unless columns.kind_of?(Array)
      raise ArgumentError.new("COLUMNS must be specified as a String or an Array")
    end

    scan = if stoprow
      org.apache.hadoop.hbase.client.Scan.new(startrow.to_java_bytes, stoprow.to_java_bytes)
    else
      org.apache.hadoop.hbase.client.Scan.new(startrow.to_java_bytes)
    end

    columns.each { |c| scan.addColumns(c) }
    scan.setFilter(filter) if filter
    scan.setTimeStamp(timestamp) if timestamp
    scan.setCacheBlocks(cache)
    scan.setMaxVersions(versions) if versions > 1
    scan.setTimeRange(timerange[0], timerange[1]) if timerange
  else
    scan = org.apache.hadoop.hbase.client.Scan.new
  end

  # Start the scanner
  scanner = @table.getScanner(scan)
  count = 0
  res = {}
  iter = scanner.iterator

  # Iterate results
  while iter.hasNext
    if limit > 0 && count >= limit
      break
    end

    row = iter.next
    key = org.apache.hadoop.hbase.util.Bytes::toStringBinary(row.getRow)

    row.list.each do |kv|
      family = String.from_java_bytes(kv.getFamily)
      qualifier = org.apache.hadoop.hbase.util.Bytes::toStringBinary(kv.getQualifier)
      column = "#{family}:#{qualifier}"
      cell = to_string_scan_counter(column, kv, maxlength)

      if block_given?
        yield(key, "column=#{column}, #{cell}")
      else
        res[key] ||= {}
        res[key][column] = cell
      end
    end

    # One more row processed
    count += 1
  end

  return ((block_given?) ? count : res)
end
#----------------------------------------------------------------------------------------
# Helper methods

# Returns a list of column names in the table
def get_all_columns
  @table.table_descriptor.getFamilies.map do |family|
    "#{family.getNameAsString}:"
  end
end

# Checks if current table is one of the 'meta' tables
def is_meta_table?
  tn = @table.table_name
  org.apache.hadoop.hbase.util.Bytes.equals(tn, org.apache.hadoop.hbase.HConstants::META_TABLE_NAME) || org.apache.hadoop.hbase.util.Bytes.equals(tn, org.apache.hadoop.hbase.HConstants::ROOT_TABLE_NAME)
end

# Returns family and (when it has one) qualifier for a column name
def parse_column_name(column)
  split = org.apache.hadoop.hbase.KeyValue.parseColumn(column.to_java_bytes)
  return split[0], (split.length > 1) ? split[1] : nil
end
# Make a String of the passed kv
# Intercept cells whose format we know such as the info:regioninfo in .META.
def to_string(column, kv, maxlength = -1)
  if is_meta_table?
    if column == 'info:regioninfo' or column == 'info:splitA' or column == 'info:splitB'
      hri = org.apache.hadoop.hbase.util.Writables.getHRegionInfoOrNull(kv.getValue)
      return "timestamp=%d, value=%s" % [kv.getTimestamp, hri.toString]
    end
    if column == 'info:serverstartcode'
      if kv.getValue.length > 0
        str_val = org.apache.hadoop.hbase.util.Bytes.toLong(kv.getValue)
      else
        str_val = org.apache.hadoop.hbase.util.Bytes.toStringBinary(kv.getValue)
      end
      return "timestamp=%d, value=%s" % [kv.getTimestamp, str_val]
    end
  end

  val = "timestamp=#{kv.getTimestamp}, value=#{org.apache.hadoop.hbase.util.Bytes::toStringBinary(kv.getValue)}"
  (maxlength != -1) ? val[0, maxlength] : val
end

def to_string_scan_counter(column, kv, maxlength = -1)
  if is_meta_table?
    if column == 'info:regioninfo' or column == 'info:splitA' or column == 'info:splitB'
      hri = org.apache.hadoop.hbase.util.Writables.getHRegionInfoOrNull(kv.getValue)
      return "timestamp=%d, value=%s" % [kv.getTimestamp, hri.toString]
    end
    if column == 'info:serverstartcode'
      if kv.getValue.length > 0
        str_val = org.apache.hadoop.hbase.util.Bytes.toLong(kv.getValue)
      else
        str_val = org.apache.hadoop.hbase.util.Bytes.toStringBinary(kv.getValue)
      end
      return "timestamp=%d, value=%s" % [kv.getTimestamp, str_val]
    end
  end

  val = "timestamp=#{kv.getTimestamp}, value=#{org.apache.hadoop.hbase.util.Bytes::toLong(kv.getValue)}"
  (maxlength != -1) ? val[0, maxlength] : val
end
Second, add the following file, called scan_counter.rb, to /usr/lib/hbase/lib/ruby/shell/commands/:
#
# Copyright 2010 The Apache Software Foundation
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
module Shell
  module Commands
    class ScanCounter < Command
      def help
        return <<-EOF
Scan a table whose cell values are longs; pass table name and optionally a dictionary of scanner
specifications. Scanner specifications may include one or more of:
TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH,
or COLUMNS. If no columns are specified, all columns will be scanned.
To scan all members of a column family, leave the qualifier empty as in
'col_family:'.
Some examples:
  hbase> scan_counter '.META.'
  hbase> scan_counter '.META.', {COLUMNS => 'info:regioninfo'}
  hbase> scan_counter 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
  hbase> scan_counter 't1', {FILTER => org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
  hbase> scan_counter 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
For experts, there is an additional option -- CACHE_BLOCKS -- which
switches block caching for the scanner on (true) or off (false). By
default it is enabled. Examples:
  hbase> scan_counter 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}
EOF
      end

      def command(table, args = {})
        now = Time.now
        formatter.header(["ROW", "COLUMN+CELL"])

        count = table(table).scan_counter(args) do |row, cells|
          formatter.row([ row, cells ])
        end

        formatter.footer(now, count)
      end
    end
  end
end
Finally, add scan_counter to /usr/lib/hbase/lib/ruby/shell.rb.
Replace the current command group with this (you can identify it by 'DATA MANIPULATION COMMANDS'):
Shell.load_command_group(
  'dml',
  :full_name => 'DATA MANIPULATION COMMANDS',
  :commands => %w[
    count
    delete
    deleteall
    get
    get_counter
    incr
    put
    scan
    scan_counter
    truncate
  ]
)