How to append an integer to an existing binary - binary

Suppose I have a binary like this <<1,2,3,4>> and I want to modify it to <<1,2,3,4,5>>. How can I do it?
Also, I am trying to convert a list into a binary [1,2,3,4,5] => <<1,2,3,4,5>> but I'm not sure how to do that.

In Elixir, you can use the <> operator:
iex(1)> <<1, 2, 3, 4>> <> <<5>>
<<1, 2, 3, 4, 5>>
or the binary syntax:
iex(1)> a = <<1, 2, 3, 4>>
<<1, 2, 3, 4>>
iex(2)> <<a::binary, 5>>
<<1, 2, 3, 4, 5>>
In Erlang, use the binary syntax with /binary:
1> A = <<1, 2, 3, 4>>.
<<1,2,3,4>>
2> <<A/binary, 5>>.
<<1,2,3,4,5>>
Edit: to convert a list of bytes to a binary, use :erlang.list_to_binary/1:
iex(1)> :erlang.list_to_binary [1, 2, 3, 4, 5]
<<1, 2, 3, 4, 5>>

It heavily depends on what you want to append.
The most generic (although probably the most verbose) solution would be to go through the charlist:
<<1, 2, 3, 4>>
|> to_charlist()
|> Kernel.++([5])
|> to_string
#⇒ <<1, 2, 3, 4, 5>>
One might use Kernel.SpecialForms.for/1 comprehension:
for i <- [1, 2, 3, 4, 5], do: <<i>>, into: <<>>
#⇒ <<1, 2, 3, 4, 5>>
Also since it’s a charlist, one might use Kernel.to_string/1:
to_string [1, 2, 3, 4, 5]
#⇒ <<1, 2, 3, 4, 5>>

Here's what I found:
1> list_to_binary([<<"existing binary - ">>, integer_to_binary(1234, 10)]).
<<"existing binary - 1234">>

I noticed that if the list contains values greater than 255, I get an argument error (with the Erlang function) or changed values, because the default size of a binary element is 1 byte, I guess. How can I overcome this? Is this even possible?
You can't represent the integer 257 in one byte, and its representation in two bytes is <<1, 1>>. Do you know why that is?
The string "257" and the integer 257 are two very different things. You aren't ever going to see <<1, 2, 3, 257>> in Elixir (or Erlang), because each number between the commas is an integer that must fit in one byte. In addition, although an Elixir string is a binary, a binary isn't necessarily an Elixir string. Do you know how that is possible?
You seem to think that the binary <<1, 2, 3>> represents the characters "1", "2", and "3". That is incorrect. The binary <<1, 2, 3>> actually represents the characters named Start of Heading, Start of Text, and End of Text. The character "1" is actually represented by the binary <<49>>:
iex(20)> <<49>>
"1"
Similarly, the character "2" is represented by the binary <<50>>, and the character "3" is represented by the binary <<51>>. There is no such thing as the character "257"--that is three characters "2", and "5", and "7".
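The byte-level point here is language-independent; a quick sketch in Python (a cross-language illustration, not from the original thread) shows the same behavior:

```python
# The characters "1", "2" and "3" are the bytes 49, 50 and 51:
assert bytes([49, 50, 51]) == b"123"

# 257 does not fit in a single byte, so building a byte string from it
# fails with ValueError -- the analogue of the Erlang argument error:
fits_in_one_byte = True
try:
    bytes([257])
except ValueError:
    fits_in_one_byte = False
assert not fits_in_one_byte

# Represented in two bytes (big-endian), 257 is 1*256 + 1, i.e. <<1, 1>>:
assert (257).to_bytes(2, "big") == bytes([1, 1])
```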

Related

Difference between torch.Tensor([1,2,3]) and torch.tensor([1,2,3])?

I want to understand what is the significance of each function torch.Tensor([1,2,3]) and torch.tensor([1,2,3]).
The one difference I found is torch.Tensor() creates tensors with int64 dtype and torch.tensor() creates float32 dtype by default. Is there any other significant difference between both?
Are there any other differences apart from what I have mentioned above? Also, when and where should each one be used?
It's exactly the other way around :)
torch.Tensor() returns a tensor that can hold 32-bit floating-point numbers as it is an alias for torch.FloatTensor.
torch.tensor(X) (with only integers in X) returns a 64-bit integer tensor by default as torch.tensor() infers the data type automatically.
But the initialization phase is really the only difference between the two, as torch.tensor() is a wrapper function that creates a Tensor from pre-existing data. torch.tensor() is sometimes recommended because it offers some extra possibilities, e.g. specifying the data type via the dtype argument. On the other hand, to create a Tensor without data, you need torch.Tensor(). Either way, in both cases you end up with a torch.Tensor.
print(torch.Tensor([1, 2, 3]).dtype) # torch.float32
print(torch.FloatTensor([1, 2, 3]).dtype) # torch.float32
print(torch.tensor([1, 2, 3], dtype=torch.float32).dtype) # torch.float32
print(torch.equal(torch.Tensor([1, 2, 3]), torch.FloatTensor([1, 2, 3]))) # True
print(torch.equal(torch.Tensor([1, 2, 3]), torch.tensor([1, 2, 3], dtype=torch.float32))) # True
print(torch.tensor([1, 2, 3]).dtype) # torch.int64
print(torch.LongTensor([1, 2, 3]).dtype) # torch.int64
print(torch.equal(torch.tensor([1, 2, 3]), torch.LongTensor([1, 2, 3]))) # True
print(torch.Tensor()) # tensor([])
print(torch.tensor()) # throws an error

Integrating Non-Observation Frame Data with Different Dimensionality in Reinforcement Learning

I am trying to understand a conceptual approach to integrating data that doesn't have the same dimensionality as the observation frames into a stack of those frames.
Example Frame: [1, 2, 3]
Example extra data: [a, b]
Currently, I am approaching this as follows, with the example of 3 frames (rows) representing temporal observation data over 3 time periods, and a 4th frame (row) representing non-temporal data for which only the most recent observed values are needed.
Example:
[
  [1, 2, 3],
  [4, 5, 6],
  [7, 8, 9],
  [a, b, NaN]
]
The a and b are the added data and the NaN is just a value added to match the dimensions of the existing data. Would there be differences (all inputs welcomed) in using NaN vs an outlier value like -1 that would never be observed by other measures?
One possible alternative would be to structure the observation data as such:
[
  [1, 2, 3, a, b],
  [4, 5, 6, a-1, b-1],
  [7, 8, 9, a-2, b-3],
]
This, however, seems like a noticeable increase in resources, and in my context the measures a and b can be universally understood as "bigger is better" or "smaller is better" without context from the other data values.
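The two layouts can be sketched in plain Python; the values here are placeholders (a=10.0 and b=20.0 stand in for the hypothetical extra measures):

```python
frames = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
extra = [10.0, 20.0]  # hypothetical non-temporal measures a and b

# Option 1: pad the extra data to frame width with NaN and append it as a row.
width = len(frames[0])
padded_row = extra + [float("nan")] * (width - len(extra))
stacked = frames + [padded_row]            # 4 rows of width 3, last cell NaN

# Option 2: append the extra values as columns to every frame row.
widened = [row + extra for row in frames]  # 3 rows of width 5, no padding
```

One practical point on NaN vs. an outlier like -1: NaN propagates through arithmetic, so a network fed a NaN input will generally produce NaN outputs; a sentinel value outside the observed range, or an explicit mask channel, is usually the safer choice.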

(Ecto.Query.CompileError) Tuples can only be used in comparisons with literal tuples of the same size. - Elixir

Where I'm at
For this example, consider Friends.repo
Table Person has fields :id, :name, :age
Example Ecto query:
iex> from(x in Friends.Person, where: {x.id, x.age} in [{1,10}, {2, 20}, {1, 30}], select: [:name])
When I run this, I get relevant results. Something like:
[
  %{name: "abc"},
  %{name: "xyz"}
]
But when I try to interpolate the query it throws the error
iex> list = [{1,10}, {2, 20}, {1, 30}]
iex> from(x in Friends.Person, where: {x.id, x.age} in ^list, select: [:name])
** (Ecto.Query.CompileError) Tuples can only be used in comparisons with literal tuples of the same size
I'm assuming I need to do some sort of type casting on the list variable. It is mentioned in the docs here: "When interpolating values, you may want to explicitly tell Ecto what is the expected type of the value being interpolated"
What I need
How do I achieve this for a complex type like this? How do I type cast for a "list of tuples, each of size 2"? Something like [{:integer, :integer}] doesn't seem to work.
If not the above, any alternatives for running a WHERE (col1, col2) in ((val1, val2), (val3, val4), ...) type of query using Ecto Query?
Unfortunately, the error should be treated as it is stated in the error message: only literal tuples are supported.
I was unable to come up with a more elegant and less fragile solution, but we always have a sledgehammer as a last resort. The idea is to generate and execute the raw query.
list = [{1,10}, {2, 20}, {1, 30}]
#⇒ [{1, 10}, {2, 20}, {1, 30}]
values =
  Enum.join(for({id, age} <- list, do: "(#{id}, #{age})"), ", ")
#⇒ "(1, 10), (2, 20), (1, 30)"
Repo.query(~s"""
SELECT name FROM persons
JOIN (VALUES #{values}) AS j(v_id, v_age)
ON id = v_id AND age = v_age
""")
The above should return the {:ok, %Postgrex.Result{}} tuple on success.
You can do it with a separate array for each field and unnest, which zips the arrays into rows with a column for each array:
ids = [1, 2, 1]
ages = [10, 20, 30]
from x in Friends.Person,
  inner_join: j in fragment("SELECT distinct * from unnest(?::int[],?::int[]) AS j(id,age)", ^ids, ^ages),
  on: x.id == j.id and x.age == j.age,
  select: [:name]
Another way of doing it is using JSON:
list = [%{id: 1, age: 10},
        %{id: 2, age: 20},
        %{id: 1, age: 30}]
from x in Friends.Person,
  inner_join: j in fragment("SELECT distinct * from jsonb_to_recordset(?) AS j(id int,age int)", ^list),
  on: x.id == j.id and x.age == j.age,
  select: [:name]
Update: I now saw the mysql tag. The above was written for Postgres, but maybe it can be used as a base for a MySQL version.

Not sure why I'm failing test cases?

I'm building a function that replaces a limited number of old values with new values in a list (xs) and returns the result as a new list (new_xs). I am failing most test cases. I have provided two examples of expected output and two examples of failing test cases.
Example:
limit=None means replace all old values: xs=[1,2,1,3,1,4,1,5,1], old=1, new=2 --> new_xs=[2,2,2,3,2,4,2,5,2]
limit=0 or negative means do not alter anything in the list.
limit=2 means only replace two old values and leave the rest untouched.
Here is my non-working code:
def replace(xs, old, new, limit=None):
    new_xs = []
    replacements = 0
    for num in xs:
        if num == old and (limit is None or replacements < limit):
            new_xs.append(new)
            replacements += 1
        else:
            new_xs.append(num)
    return new_xs
Still fails 6 tests:
AssertionError: [] != None
AssertionError: Lists differ: [9, 2, 9, 3, 9, 4, 9, 5, 9] != [-10, 2, -10, 3, -10, 4, -10, 5, -10]
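For reference, the function can be checked directly against the examples stated in the question (restated here so the snippet is self-contained); it passes them, which suggests the failing tests exercise inputs or conventions outside the stated spec:

```python
def replace(xs, old, new, limit=None):
    new_xs = []
    replacements = 0
    for num in xs:
        # Replace only while under the limit; limit=None means no limit.
        if num == old and (limit is None or replacements < limit):
            new_xs.append(new)
            replacements += 1
        else:
            new_xs.append(num)
    return new_xs

# limit=None replaces every occurrence:
assert replace([1, 2, 1, 3, 1, 4, 1, 5, 1], 1, 2) == [2, 2, 2, 3, 2, 4, 2, 5, 2]
# limit=2 replaces only the first two occurrences:
assert replace([1, 2, 1, 3, 1, 4, 1, 5, 1], 1, 2, limit=2) == [2, 2, 2, 3, 1, 4, 1, 5, 1]
# limit=0 leaves the list untouched:
assert replace([1, 2, 1], 1, 2, limit=0) == [1, 2, 1]
```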

Integer CSV Compression Algorithm

I did surface-level research on the existence of an algorithm that compresses comma-separated integers, but I did not find anything relevant.
My goal is to compress large amounts of structured comma-separated integers whose value ranges are known. Is there a known algorithm to do such a thing? If not, where would be a good place to read about relevant areas of interest that would get me started on developing such an algorithm? Of course, the algorithm has to be reversible and lossless so that I can uncompress the compressed data to retrieve the CSV values.
The data structure is an array of three values: the first number's domain is from 0 to 4, the second from 0 to 6, and the third from 0 to n, where n is not a large number. This structure is repeated to create the data, which is a two-dimensional array.
Using standard compression algorithms such as gzip or bzip2 on structured data does not yield optimum compression efficiency, therefore constructing a case specific algorithm did the trick.
The data structure is shown below with an example.
// cell: a data structure, array of three numbers
// digits[0]: { 0, 1, 2, 3, 4 }
// digits[1]: { 0, 1, 2, 3 }
// digits[2]: { 0, 1, 2, ..., n } n is not an absurdly large number
// Below it is reused in a multi-dimensional array.
var cells = [
[ [3, 0, 1], [4, 2, 4], [3, 0, 2], [4, 1, 3] ],
[ [4, 2, 3], [3, 0, 3], [4, 3, 3], [1, 1, 0] ],
[ [3, 3, 0], [2, 3, 1], [2, 2, 5], [0, 2, 4] ],
[ [2, 1, 0], [3, 0, 0], [0, 2, 3], [1, 0, 0] ]
];
I ran various tests on this data structure (as a string, excluding whitespace) using standard compression algorithms:
gz compressed from 171 to 88 bytes
bzip2 compressed from 171 to 87 bytes
deflate compressed from 171 to 76 bytes
The algorithm I constructed compresses the data down to 33 bytes and works up to n = 192. So, on a case-specific basis, I was able to compress my data with more than double the efficiency of standard text compression algorithms.
The way I achieved such compression is by mapping all the possible combinations of values a cell can hold to integers. If you want to investigate this concept, it is known as combinatorics in mathematics. I then converted the base 10 integer into a higher base for string representation.
Since I am aiming for human usability (the compressed code will be typed), I used base 62, represented as {[0-9], [a-z], [A-Z]} for 0 to 61 respectively. I padded each cell's base 62 representation to two digits, which allows for 62*62 (3844) different cell combinations.
Finally, I added a base 62 digit at the beginning of the compressed string which represents the number of columns. When decompressing the y size is used to deduce the x size from the string's length. Thus the data can be correctly decompressed with no loss of data.
The compressed string of the above example looks like this:
var compressed = compress(cells); // "4n0w1H071c111h160i0B0O1s170308110"
I have provided an explanation of my method to solve my problem to help other facing a similar problem. I have not provided my code for obscurity reasons.
TL;DR
To compress structured data:
Represent each discrete object as an integer
Encode the base 10 integer in a higher base
Repeat for all objects
Append number of rows or columns to the compressed string
To decompress structured data:
Read the rows or columns and deduce the other from the string length
Reverse steps 1 and 2 in compression
Repeat for all objects
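The TL;DR above can be sketched in code. This is a hypothetical reconstruction, not the author's withheld implementation: the per-digit radices (5 and 4, taken from the comment block) and the digit ordering are assumptions, so the exact output string will differ from the example shown, but the sizing works out. A cell needs 5 * 4 * (n + 1) <= 62**2 = 3844 combinations, i.e. n <= 191, matching the ~192 limit quoted above:

```python
import string

# Base-62 alphabet: 0-9, a-z, A-Z map to 0..61.
BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode_cell(cell, n):
    a, b, c = cell
    value = (a * 4 + b) * (n + 1) + c                # mixed radix -> integer
    return BASE62[value // 62] + BASE62[value % 62]  # pad to 2 base-62 digits

def decode_cell(pair, n):
    value = BASE62.index(pair[0]) * 62 + BASE62.index(pair[1])
    ab, c = divmod(value, n + 1)
    a, b = divmod(ab, 4)
    return [a, b, c]

def compress(cells, n):
    # Leading base-62 digit records the column count; every cell follows
    # as two base-62 digits, row by row.
    cols = len(cells[0])
    return BASE62[cols] + "".join(
        encode_cell(cell, n) for row in cells for cell in row
    )

def decompress(s, n):
    cols = BASE62.index(s[0])
    flat = [decode_cell(s[i:i + 2], n) for i in range(1, len(s), 2)]
    return [flat[i:i + cols] for i in range(0, len(flat), cols)]
```

On the 4x4 example above (with n = 5) this yields a 33-character string, one digit for the column count plus two per cell, and decompresses back to the original array.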
Unless there's some specific structure to your list that you're not divulging and that might drastically help compression, standard lossless compression algorithms such as gzip or bzip2 should handle a string of numbers just fine.
Libraries for such common algorithms should be ubiquitously available for pretty much all languages and platforms.