Count non-symmetric bytes - language-agnostic

I am looking for a clean way to list the 8-bit integers whose binary representations are distinct up to rotation and reflection, i.e. no integer in the list can be turned into another one in the list by rotating or mirroring its bits.
For example the list will probably start as
0
1
(2=10b is skipped because you can rotate the bits in 1, therefore all powers of 2 are skipped. Also every number except 0 will be odd)
3=11b
5=101b
7=111b
9=1001b
11=1011b (so 13=1101b will be skipped because 11010000b is a reflection of 00001011b, which can then be rotated to the right 4 times to give 1101b)
.
.
.
Ideally, how could this be generalized to numbers with a different number of bits (16, 32, or just n) and to bases other than 2?

Since @John Smith thought my comment was a good answer, here it is as an answer.
The answers here may be illuminating.

Thanks to Jeffromi for explaining the problem better -- I've deleted my previous answer.
Here's another solution in Perl. Perl is a good language for this sort of problem because it makes it easy to treat numbers as text and text as numbers.
i: for $i (0..255) {
    $n1 = sprintf "%08b", $i;       # binary representation of $i
    $n2 = $n1;                      # "unreflected" copy of $n1
    $n3 = reverse $n1;              # "reflection" of $n1
    for $j (1..8) {
        $n2 = chop($n2) . $n2;      # "rotate" $n2
        $n3 = chop($n3) . $n3;      # "rotate" $n3
        next i if $found{$n2} or $found{$n3};
    }
    # if we get here, we rotated $n2 and $n3 8 times
    # and didn't get a nonsymmetric byte that we've
    # seen before -- this is a nonsymmetric byte
    $found{$n1}++;
    print "$i $n1\n";
}
This isn't as simple as the previous solution, but the gist is to try out all 16 combinations (2 reflections x 8 rotations) and compare them with all of the nonsymmetric bytes you've seen before.
Perl has bit-shift operators (a rotation can be built from shifts and a mask), but the chop($num) . $num idiom I used generalizes better to problems with base n.

You can use a sieve, similar to the sieve of Eratosthenes for prime numbers.
Use a bit array (BitSet in Java) with one bit for each number.
Initially set all bits.
Go sequentially through the bit array until you find the next set bit, at index n. That n is a number in your set. Then clear the bits of all other numbers that can be reached from n via rotation and mirroring.
On today's machines this is feasible up to 32 bits, which would use 512MB of memory.
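A minimal Python sketch of that sieve, assuming a plain bytearray (one byte per number rather than one bit) is acceptable for small widths. The function names are made up for illustration:

```python
def rotations_and_reflections(value, width):
    """Yield every value reachable from `value` by rotating or mirroring its bits."""
    mask = (1 << width) - 1
    mirrored = int(format(value, f"0{width}b")[::-1], 2)
    for start in (value, mirrored):
        current = start
        for _ in range(width):
            yield current
            # Rotate left by one bit within `width` bits.
            current = ((current << 1) | (current >> (width - 1))) & mask


def sieve_representatives(width):
    """Return the smallest member of each rotation/reflection class."""
    is_candidate = bytearray([1]) * (1 << width)   # all numbers start as marked
    representatives = []
    for n in range(1 << width):
        if is_candidate[n]:
            representatives.append(n)
            for other in rotations_and_reflections(n, width):
                is_candidate[other] = 0            # clear everything equivalent to n
    return representatives


print(sieve_representatives(8)[:7])   # starts 0, 1, 3, 5, 7, 9, 11 as in the question
```

Because the scan runs in increasing order, the first survivor of each class is automatically its smallest member.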

An alternative solution to Eratosthenes' Sieve would be to construct a test T(k) that returns True or False for any given k.
It would be slower, but this way no storage would be needed, so it would extend more readily to arbitrary data length.
If you simplify the problem for a moment, and say we are simply looking to discard reflections, then it would be easy:
T_ref(k) returns true iff k <= Reflection(k).
As for rotating bits, exactly the same can be done:
T_rot(k) returns true iff k == MIN{the set of all rotations of k}
You can think of dividing your integers up into a bunch of equivalence classes E(k) where E(k) is the set of all reflection&rotation permutations of k.
You might want to take a moment to satisfy yourself that the set of natural numbers N partitions itself readily into such disjoint subsets.
Then the set
{k s.t. k == MIN(E(k)) }
is guaranteed to contain exactly one element from each equivalence class.
This would make a really nice interview question.
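The test T(k) described above could be sketched in Python like this. It works on a list of digits, so it also covers other bases and lengths. The names digits_of and is_canonical are placeholders, not anything from the question:

```python
def digits_of(k, base, length):
    """Return k as a list of `length` digits in `base`, most significant first."""
    digits = []
    for _ in range(length):
        k, d = divmod(k, base)
        digits.append(d)
    return digits[::-1]


def is_canonical(k, base=2, length=8):
    """True iff k is the minimum of its rotation/reflection class E(k)."""
    digits = digits_of(k, base, length)
    mirrored = digits[::-1]
    for seq in (digits, mirrored):
        for shift in range(length):
            # Equal-length digit lists compare in the same order as the numbers.
            if seq[shift:] + seq[:shift] < digits:
                return False
    return True


print([k for k in range(256) if is_canonical(k)][:7])
```

No storage is needed beyond the digits of k itself, at the cost of roughly 2 * length comparisons per number.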

How to read a 3x3xN coordinates string into a MATLAB array efficiently

I have a MATLAB script that takes JSON that I generate on a remote server and that contains a long list of 3x3xN coordinates, e.g. for N=1:
str = '[1,2,3.14],[4,5.66,7.8],[0,0,0],';
I want to avoid splitting the string. Is there a way to use strread or similar to read this 3×3×N tensor?
It's a multi-particle system and N can be large, though I have enough memory to hold it all at once.
Any suggestion of how to format the array string in the JSON is very welcome as well.
If you can guarantee the format is always the same, I think it's easiest, safest and fastest to use sscanf:
fmt = '[%f,%f,%f],[%f,%f,%f],[%f,%f,%f],';
data = reshape(sscanf(str, fmt), 3, 3).';
Depending on the rest of your data (how is that "N" represented?), you might need to adjust that reshape/transpose.
EDIT
Based on your comment, I think this will solve your problem quite efficiently:
% Strip unneeded concatenation characters
str(str == ',') = ' ';
str(str == ']' | str == '[') = [];
% Reshape into workable dimensions
data = permute( reshape(sscanf(str, '%f '), 3,3,[]), [2 1 3]);
As noted by rahnema1, you can avoid the permute and/or character removal by adjusting your JSON generators to spit out the data column-major and without brackets, but you'll have to ask yourself these questions:
whether that is really worth the effort, considering that this code right here is already quite tiny and pretty efficient
whether other applications are going to use the JSON interface, because in essence you're de-generalizing the JSON output just to fit your processing script on the other end. I think that's a pretty bad design practice, but oh well.
Just something to keep in mind:
emitting 500k values in binary is about 34 MB
doing the same in ASCII is about 110 MB
Now depending a bit on your connection speed, I'd be getting really annoyed really quickly because every little test run takes about 3 times as long as it should be taking :)
So if an API call straight to the raw data is not possible, I would at least base64 that data in the JSON.
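As a rough illustration of the base64 idea, here is a Python sketch of what the generating side could do, assuming little-endian doubles and a JSON object with hypothetical "count" and "data" fields (the real server and its field names are not shown in the question):

```python
import base64
import json
import struct


def encode_coordinates(values):
    """Pack floats as little-endian doubles and wrap them, base64-encoded, in JSON."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return json.dumps({
        "count": len(values),                          # hypothetical field name
        "data": base64.b64encode(raw).decode("ascii"), # hypothetical field name
    })


def decode_coordinates(payload):
    """Inverse of encode_coordinates: recover the list of floats."""
    obj = json.loads(payload)
    raw = base64.b64decode(obj["data"])
    return list(struct.unpack(f"<{obj['count']}d", raw))


payload = encode_coordinates([1, 2, 3.14, 4, 5.66, 7.8, 0, 0, 0])
print(decode_coordinates(payload))
```

Base64 costs about 4 characters per 3 bytes, so the payload stays close to the binary size, and the values round-trip bit for bit instead of going through decimal text.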
You can use eval function:
str = '[1,2,3.14],[4,5.66,7.8],[0,0,0],';
result=permute(reshape(eval(['[' ,str, ']']),3,3,[]),[2 1 3])
result =
1.00000 2.00000 3.14000
4.00000 5.66000 7.80000
0.00000 0.00000 0.00000
eval concatenates all elements into a row vector, which is then reshaped into a 3D array. Since MATLAB stores matrix elements column-wise, the array has to be permuted so that each 3*3 matrix is transposed.
note1: There is no need to add the enclosing [] to the string if you use str2num instead of eval:
result=permute(reshape(str2num(str),3,3,[]),[2 1 3])
note2:
if you save data columnwise there is no need to permute:
str='1 4 0 2 5.66 0 3.14 7.8 0';
result=reshape(str2num(str),3,3,[])
Update: Ander Biguri and excaza pointed out security and speed issues with eval and str2num, and Rody Oldenhuis suggested sscanf, so I tested three methods in Octave:
a=num2str(rand(1,60000));
disp('-----SSCANF---------')
tic
sscanf(a,'%f ');
toc
disp('-----STR2NUM---------')
tic
str2num(a);
toc
disp('-----STRREAD---------')
tic
strread(a,'%f ');
toc
and here is the result:
-----SSCANF---------
Elapsed time is 0.0344398 seconds.
-----STR2NUM---------
Elapsed time is 0.142491 seconds.
-----STRREAD---------
Elapsed time is 0.515257 seconds.
So it is more secure and faster to use sscanf, in your case:
str='1 4 0 2 5.66 0 3.14 7.8 0';
result=reshape(sscanf(str,'%f '),3,3,[])
or
str='1, 4, 0, 2, 5.66, 0, 3.14, 7.8, 0';
result=reshape(sscanf(str,'%f,'),3,3,[])

Is there a "native" way to convert from numbers to dB in Tcl

dB, or decibel, is a unit used to express a ratio on a logarithmic scale. Specifically, the definition of dB that I'm interested in is X(dB) = 20*log10(x), where x is the "normal" value and X(dB) is the value in dB. When I wrote code to convert between mil and mm, I noticed that with the direct approach, i.e. multiplying by the ratio between the units, I got small errors on the round trip: to_mil [to_mm val_in_mil] wasn't equal to val_in_mil, and the same happened with mm. The units library solved this problem, as its conversions do not have that calculation error. But the library doesn't offer (or I didn't find) an option to convert a number to dB.
Is there another library / command that can transform numbers to dB and dB to numbers without calculation errors?
I did an experiment using the direct math conversion, and what I got is:
>> set a 0.005
0.005
>> set b [expr {20*log10($a)}]
-46.0205999133
>> expr {pow(10,($b/20))}
0.00499999999999
It's all a matter of precision. We often tend to forget that floating point numbers are not real numbers (in the mathematical sense of ℝ).
How many decimal digits do you need?
If you, for example, would only need 5 decimal digits, rounding 0.00499999999999 will give you 0.00500 which is what you wanted.
Since rounding fp numbers is not an easy task and may generate even more troubles, you might just change the way you determine if two numbers are equal:
>> set a 0.005
0.005
>> set b [expr {20*log10($a)}]
-46.0205999133
>> set c [expr {pow(10,($b/20))}]
0.00499999999999
>> expr {abs($a - $c) < 1E-10}
1
>> expr {abs($a - $c) < 1E-20}
0
>> expr {$a - $c}
8.673617379884035e-19
The numbers in your example can be considered "equal" up to an error of about 10^-18. Note that this is just a rough estimate, not a full solution.
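The same tolerance-based comparison, sketched in Python for illustration (the question is about Tcl; the helper names here are invented and the tolerance is an arbitrary choice, not a recommendation):

```python
import math


def to_db(x):
    """Convert a linear ratio to dB using the 20*log10 definition."""
    return 20 * math.log10(x)


def from_db(x_db):
    """Convert a dB value back to a linear ratio."""
    return 10 ** (x_db / 20)


def nearly_equal(a, b, rel_tol=1e-9):
    """Compare two floats up to a relative tolerance instead of exactly."""
    return math.isclose(a, b, rel_tol=rel_tol)


a = 0.005
c = from_db(to_db(a))
print(nearly_equal(a, c))   # the round trip agrees within the tolerance
```

A relative tolerance scales with the magnitude of the values, which tends to be more robust than a fixed absolute threshold such as 1E-10 when the inputs span several orders of magnitude.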
If you're really dealing with problems that are sensitive to numerical errors propagation you might look deeper into "numerical analysis". The article What Every Computer Scientist Should Know About Floating-Point Arithmetic or, even better, this site: http://floating-point-gui.de might be a start.
In case you need a larger precision you should drop your "native" requirement.
You may use the BigFloat package offered by tcllib (http://tcllib.sourceforge.net/doc/bigfloat.html) or even use GMP (the GNU multiple precision arithmetic library) through ffidl (http://elf.org/ffidl). There's an interface already defined for it: gmp.tcl
With the way floating point numbers are stored, not every log10(...) result can correspond to exactly one pow(10, ...) input. So you lose precision, just as the integer divisions 89/7 and 88/7 both give 12.
When you put a value into floating point format, you should give up on knowing its exact value unless you keep the old, exact value too. If you want exactly 1/200, store it as the integer 1 and the integer 200. If you want exactly the base-ten logarithm of 1/200, store it as 1, 200 and the information that a base-ten logarithm has been applied to it.
You can fill your entire memory with the first x decimal digits of the square root of 2, but it still won't be the square root of 2 you store.
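To make the "store 1 and 200" idea concrete, here is a small Python sketch using the standard fractions module. It keeps the exact ratio and only drops to floating point at the moment a dB value is needed; the function name is made up for the example:

```python
import math
from fractions import Fraction

exact = Fraction(1, 200)          # exactly 1/200, stored as two integers
as_float = 0.005                  # nearest binary double, not exactly 1/200


def db_of(ratio):
    """Compute 20*log10 of an exactly stored ratio, converting to float only here."""
    return 20 * math.log10(ratio.numerator / ratio.denominator)


print(exact * 200 == 1)               # True: the exact value survives arithmetic
print(Fraction(as_float) == exact)    # False: the double is only an approximation
print(db_of(exact))
```

The exact ratio is what gets stored and passed around; the dB figure is derived on demand and never converted back.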

round number to 2 decimal places

I need to round a number to two decimal places.
Right now the following rounds to the nearest integer I guess
puts [expr {round($total_rate)}]
If I do something like below it does not work. Is there another way around?
puts [expr {round($total_rate,2)}]
The simplest way to round to a specific number of decimal places is with format:
puts [format "%.2f" $total_rate]
Be aware that if you're using the rounded value for further calculations instead of display to users, most values that you print using rounding to X decimal places will not have an exact representation in binary arithmetic (which Tcl uses internally, like vast numbers of other programming languages). It's best to reserve rounding to a specific number of DPs to the point where you're showing values to people.
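The point about binary representation can be checked in any language with IEEE doubles. A short Python illustration (not Tcl), using the decimal module to expose the value that is actually stored:

```python
from decimal import Decimal

total_rate = 1.5678

display = f"{total_rate:.2f}"     # formatting for people: the string "1.57"
rounded = round(total_rate, 2)    # numeric rounding: the double nearest to 1.57

print(display)
print(Decimal(rounded))           # the exact stored value, which is not exactly 1.57
```

So the formatted string is exactly what you want to show, while the rounded number is still only the closest double to 1.57.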
expr {double(round(100*$total_rate))/100}
example
% set total_rate 1.5678
1.5678
% expr {double(round(100*$total_rate))/100}
1.57
% set total_rate 1.4321
1.4321
% expr {double(round(100*$total_rate))/100}
1.43
puts [format "%.2f" $total_rate]
With format we can see the rounded result in the output, but how do we use that same value in the program? That is, 1.448 is displayed as 1.45, but can we then use 1.45 in the program?
It is unclear whether the original question "I need to round a number" really was "I need to print out a rounded-off value of a number". The latter is really best answered with a [format ...], but the former could be interpreted as a need for a number of significant digits, i.e. how to adjust the number itself, and not just to format the printout string. I think the only answer that serves this purpose so far is the elegant one Donal Fellows has provided. However, for "significant digits" instead of "digits after the decimal" I think a small modification is in order: get the number to be between 1 and 10 first (or between 0.1 and 1, if that is your convention), then trim the number of digits after the decimal. Without that, something like roundto(0.00000001234567,4) will get you a zero.
proc tcl::mathfunc::roundto {value sigfigs} {
    # power of ten that shifts the wanted digits in front of the decimal point
    set pow [expr {($sigfigs-1)-floor(log10($value))}]
    expr {round(10**$pow*$value)/10.0**$pow}
}
expr {roundto(0.000000123456789,5)}
produces a value rounded off to 5 significant figures:
1.2346e-7
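For comparison, the same "shift by the decimal exponent, then round" idea as a Python sketch. It additionally assumes that zero and negative inputs should be handled, which the Tcl version above does not attempt:

```python
import math


def round_to_sigfigs(value, sigfigs):
    """Round value to the given number of significant figures."""
    if value == 0:
        return 0.0
    # Position of the leading digit, e.g. -7 for 1.23e-7.
    exponent = math.floor(math.log10(abs(value)))
    return round(value, sigfigs - 1 - exponent)


print(round_to_sigfigs(0.000000123456789, 5))
print(round_to_sigfigs(0.00000001234567, 4))   # no longer collapses to zero
```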

What is the probability of collision with a 6 digit random alphanumeric code?

I'm using the following perl code to generate random alphanumeric strings (uppercase letters and numbers, only) to use as unique identifiers for records in my MySQL database. The database is likely to stay under 1,000,000 rows, but the absolute realistic maximum would be around 3,000,000. Do I have a dangerous chance of 2 records having the same random code, or is it likely to happen an insignificantly small number of times? I know very little about probability (if that isn't already abundantly clear from the nature of this question) and would love someone's input.
perl -le 'print map { ("A".."Z", 0..9)[rand 36] } 1..6'
Because of the Birthday Paradox it's more likely than you might think.
There are 2,176,782,336 possible codes, but even inserting just 50,000 rows there is already a quite high chance of a collision. For 1,000,000 rows it is almost inevitable that there will be many collisions (about 230 colliding pairs on average, from n(n-1)/2 pairs each colliding with probability 1/d).
I ran a few tests and this is the number of codes I could generate before the first collision occurred:
73366
59307
79297
36909
Collisions will become more frequent as the number of codes increases.
Here was my test code (written in Python):
>>> import random
>>> codes = set()
>>> while 1:
...     code = ''.join(random.choice('1234567890qwertyuiopasdfghjklzxcvbnm') for x in range(6))
...     if code in codes: break
...     codes.add(code)
...
>>> len(codes)
36909
Well, you have 36**6 possible codes, which is about 2 billion. Call this d. Using a formula found here, we find that the probability of a collision, for n codes, is approximately
1 - ((d-1)/d)**(n*(n-1)/2)
For any n over 50,000 or so, that's pretty high.
Looks like a 10-character code has a collision probability of only about 1/800, even at 3,000,000 rows. So go with 10 or more.
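The approximation above is easy to evaluate directly. A Python sketch, with function names invented for the example and log1p/expm1 used only to avoid rounding trouble when d is huge:

```python
import math


def collision_probability(n, d):
    """Approximate P(at least one collision) for n random codes out of d possible."""
    pairs = n * (n - 1) // 2
    # 1 - ((d-1)/d)**pairs, evaluated stably for very large d
    return -math.expm1(pairs * math.log1p(-1.0 / d))


def expected_colliding_pairs(n, d):
    """Expected number of pairs of rows that share a code."""
    return n * (n - 1) / (2 * d)


d6 = 36 ** 6      # 6-character codes: 2,176,782,336 possibilities
d10 = 36 ** 10    # 10-character codes

print(collision_probability(55_000, d6))        # roughly one half
print(expected_colliding_pairs(1_000_000, d6))  # roughly 230
print(collision_probability(3_000_000, d10))    # roughly 1/800
```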
Based on the equations given at http://en.wikipedia.org/wiki/Birthday_paradox#Approximation_of_number_of_people, there is a 50% chance of encountering at least one collision after inserting only 55,000 records or so into a universe of this size:
http://wolfr.am/niaHIF
Trying to insert two to six times as many records will almost certainly lead to a collision. You'll need to assign codes nonrandomly, or use a larger code.
As mentioned previously, the birthday paradox makes this event quite likely. In particular, an accurate approximation can be determined when the problem is cast as a collision problem. Let p(n; d) be the probability that at least two numbers are the same, d be the number of combinations and n the number of trials. Then, we can show that p(n; d) is approximately equal to:
1 - ((d-1)/d)^(n*(n-1)/2)
We can easily plot this in R:
> d = 2176782336
> n = 1:100000
> plot(n,1 - ((d-1)/d)^(n*(n-1)/2), type='l')
The resulting curve shows the collision probability increasing very quickly with the number of trials/rows.
While I don't know the specifics of exactly how you want to use these pseudo-random IDs, you may want to consider generating an array of 3000000 integers (from 1 to 3000000) and randomly shuffling it. That would guarantee that the numbers are unique.
See Fisher-Yates shuffle on Wikipedia.
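A minimal Python sketch of the shuffle approach. random.shuffle performs a Fisher-Yates shuffle; the to_code helper, which renders a number in the same 36-character alphabet as the question, is an assumption added for illustration:

```python
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"


def shuffled_ids(count, seed=None):
    """Return the integers 1..count in random order, each exactly once."""
    rng = random.Random(seed)
    ids = list(range(1, count + 1))
    rng.shuffle(ids)
    return ids


def to_code(number, length=6):
    """Render a non-negative integer as a fixed-width base-36 code (hypothetical helper)."""
    chars = []
    for _ in range(length):
        number, remainder = divmod(number, 36)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


codes = [to_code(i) for i in shuffled_ids(1000, seed=1)]
print(len(set(codes)) == len(codes))   # True: uniqueness is guaranteed by construction
```

Note that numbers up to 3,000,000 only use a small corner of the 36**6 code space, so these codes are unique but much easier to guess than fully random ones.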
A caution: Beware of relying on the built-in rand where the quality of the pseudo random number generator matters. I recently found out about Math::Random::MT::Auto:
The Mersenne Twister is a fast pseudorandom number generator (PRNG) that is capable of providing large volumes (> 10^6004) of "high quality" pseudorandom data to applications that may exhaust available "truly" random data sources or system-provided PRNGs such as rand.
The module provides a drop in replacement for rand which is handy.
You can generate the sequence of keys with the following code:
#!/usr/bin/env perl
use warnings; use strict;
use Math::Random::MT::Auto qw( rand );

my $SEQUENCE_LENGTH = 1_000_000;
my %dict;
my $picks;

for my $i (1 .. $SEQUENCE_LENGTH) {
    my $pick = pick_one();
    $picks += 1;
    redo if exists $dict{ $pick };    # collision: pick again for the same $i
    $dict{ $pick } = undef;
}

printf "Generated %d keys with %d picks\n", scalar keys %dict, $picks;

sub pick_one {
    join '', map { ("A".."Z", 0..9)[rand 36] } 1..6;
}
Some time ago, I wrote about the limited range of built-in rand on Windows. You may not be on Windows, but there might be other limitations or pitfalls on your system.

How could I make this procedure more elegant?

I have a servo I'm controlling that is moving an object closer and closer to a sensor, trying to trigger it.
I want the distance to start at 15.5. However, in each iteration, I want it to decrease the distance by 0.1, until the sensor triggers. For convenience's sake, I'd like to exit the while loop with the variable $currentDistance set to this triggering height, so I've placed the decrement line at the beginning of the loop.
But, I've had to hardcode a 15.6 starting point before the while loop so that it will decrement in the first line of the loop to 15.5.
That doesn't seem elegant. Any suggestions on how to spruce this up?
By the way, this is Tcl for all you old school and obscure programmers. ;)
Code:
set currentDistance 15.6
set sensorStatus 4
while {$sensorStatus != 1} {
    set currentDistance [expr {$currentDistance - .1}]
    moveServo $currentDistance
    set sensorStatus [watchSensor 2]
}
I'd use a for loop:
for {set d 155} {$d > 0} {incr d -1} {
    set currentDistance [expr {$d * 0.1}]
    moveServo $currentDistance
    set sensorStatus [watchSensor 2]
    # If we've found it, stop searching!
    if {$sensorStatus == 1} break
}
This has the advantage of firstly having a limit against physical impossibility (no point in grinding the robot to pieces!) and secondly of doing the iteration with integers. That second point is vital: binary floating point numbers are tricky things, especially when it comes to iterating by 0.1, and Tcl (in common with many other languages) uses IEEE floating point arithmetic internally. The way to avoid those problems is to iterate with integers and have a bit of code to convert to floating point (e.g., by dividing by 10). Think in terms of dealing with counting down in units of 0.1. :-)
One other, lesser stylistic point: put {braces} round expressions, as it boosts safety and performance. (The performance boost comes because the runtime knows it can't have weird expression fragments, which are also what would count as unsafe. Not that it is critical in this code because of the dependence on the servo hardware, but it's a good habit to get into.)
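To show why counting in integer tenths matters, here is a short Python sketch (the floating point behaviour is the same IEEE arithmetic Tcl uses). sensor_triggered is a hypothetical stand-in for the moveServo / watchSensor pair:

```python
def accumulate(step, count):
    """Add `step` to zero `count` times, the way a naive float loop would."""
    total = 0.0
    for _ in range(count):
        total += step
    return total


def countdown_heights(start_tenths=155, stop_tenths=0):
    """Yield heights in steps of 0.1, derived from an integer counter."""
    for tenths in range(start_tenths, stop_tenths, -1):
        yield tenths / 10


def find_trigger_height(sensor_triggered, start_tenths=155):
    """Return the first height at which the (hypothetical) sensor check fires."""
    for height in countdown_heights(start_tenths):
        if sensor_triggered(height):
            return height
    return None   # reached the physical limit without triggering


print(accumulate(0.1, 10) == 1.0)                    # False: rounding error has crept in
print(find_trigger_height(lambda h: h <= 12.3))      # 12.3, with no accumulated drift
```

Each height is computed from scratch as tenths / 10, so an error in one step never carries over into the next.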
I don't know Tcl, but it could look something like this:
set currentDistance 15.5
set sensorStatus 4
while {true} {
    moveServo $currentDistance
    set sensorStatus [watchSensor 2]
    if {$sensorStatus == 1} then {break};
    set currentDistance [expr {$currentDistance - .1}]
}