Convert HEX string to Unsigned INT (VBA) - ms-access

In MSACCESS VBA, I convert a HEX string to decimal by prefixing the string with "&h"
?CLng("&h1234")
4660
?CLng("&h80000000")
-2147483648
What should I do to convert it to an unsigned integer?
Using CDbl doesn't work either:
?CDbl("&h80000000")
-2147483648

Your version seems like the best answer, but can be shortened a bit:
Function Hex2Dbl(h As String) As Double
    Hex2Dbl = CDbl("&h0" & h) ' Overflow Error if more than 2 ^ 64
    If Hex2Dbl < 0 Then Hex2Dbl = Hex2Dbl + 4294967296# ' 16 ^ 8 = 4294967296
End Function
Double will have rounding precision error for most values above 2 ^ 53 - 1 (about 16 decimal digits), but Decimal can be used for values up to 16 ^ 24 - 1 (Decimal uses 16 bytes, but only 12 of them for the number)
Function Hex2Dec(h)
    Dim L As Long: L = Len(h)
    If L < 16 Then ' CDec results in Overflow error for hex numbers above 16 ^ 8
        Hex2Dec = CDec("&h0" & h)
        If Hex2Dec < 0 Then Hex2Dec = Hex2Dec + 4294967296# ' 2 ^ 32
    ElseIf L < 25 Then
        Hex2Dec = Hex2Dec(Left$(h, L - 9)) * 68719476736# + CDec("&h" & Right$(h, 9)) ' 16 ^ 9 = 68719476736
    End If
End Function

If you want to go higher than 2^31 you could use Decimal or LongLong. LongLong and CLngLng only work on 64-bit platforms though. Since I only have 32-bit Office at the moment, this is for Decimal and CDec.
There seems to be an issue when converting 8-digit hex numbers because apparently a signed 32-bit conversion is used somewhere in the process, which produces the wrong sign even though Decimal could handle the number.
'only for positive numbers
Function myHex2Dec(hexString As String) As Variant
    'cut off "&h" if present
    If Left(hexString, 2) = "&h" Or Left(hexString, 2) = "&H" Then hexString = Mid(hexString, 3)
    'cut off leading zeros
    While Left(hexString, 1) = "0"
        hexString = Mid(hexString, 2)
    Wend
    myHex2Dec = CDec("&h" & hexString)
    'correct value for 8 digits only
    If myHex2Dec < 0 And Len(hexString) = 8 Then
        myHex2Dec = CDec("&h1" & hexString) - 4294967296#
    'cause overflow for 16 digits
    ElseIf myHex2Dec < 0 Then
        Error (6) 'overflow
    End If
End Function
Test:
Sub test()
    Dim v As Variant
    v = CDec("&H80000000") '-2147483648
    v = myHex2Dec("&H80000000") '2147483648
    v = CDec("&H7FFFFFFFFFFFFFFF") '9223372036854775807
    v = myHex2Dec("&H7FFFFFFFFFFFFFFF") '9223372036854775807
    v = CDec("&H8000000000000000") '-9223372036854775808
    v = myHex2Dec("&H8000000000000000") 'overflow
End Sub

With the remark from @arcadeprecinct I was able to create a function for it:
Function Hex2UInt(h As String) As Double
    Dim dbl As Double: dbl = CDbl("&h" & h)
    If dbl < 0 Then
        dbl = CDbl("&h1" & h) - 4294967296#
    End If
    Hex2UInt = dbl
End Function
Some example output:
?Hex2UInt("1234")
4660
?Hex2UInt("80000000")
2147483648
?Hex2UInt("FFFFFFFFFFFF")
281474976710655
The maximum value that still displays as an integer (rather than in scientific notation) is 0x38D7EA4C67FFF:
?Hex2UInt("38D7EA4C67FFF")
999999999999999
?Hex2UInt("38D7EA4C68000")
1E+15
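For intuition, the correction in Hex2UInt is the general "prefix a 1, then subtract 16^n" trick. Here is a quick check of it in Python, where int(h, 16) is already unsigned so the two can be compared directly (the helper name is mine, just a sketch):
# Mimic the workaround: parse "1" + h, then subtract 16 ** len(h).
def hex2uint_trick(h):
    return int("1" + h, 16) - 16 ** len(h)

for h in ["1234", "80000000", "FFFFFFFFFFFF"]:
    assert hex2uint_trick(h) == int(h, 16)
    print(h, hex2uint_trick(h))  # 4660, 2147483648, 281474976710655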

A proposal; the result ends up in h:
sh = "&H80000000"
h = CDbl(sh)
If h < 0 Then
    fd = Hex$(CDbl(Left(sh, 3)) - 8)
    sh = "&h" & fd & Mid(sh, 4)
    h = CDbl(sh) + 2 ^ 31
End If

I found my way here looking for a Word VBA solution, but what I've discovered might also apply to other Office apps. I realise that this is a very old question and that there are some ingenious solutions to it, but I'm surprised that nobody has explained what it is that seems to be the root cause of the problem, and hence what might possibly be a one-line solution in many cases. When I was an assembly language programmer in the 1970s, working more in binary and octal than anything else, this was a very common issue, known as "2s complement".
I'll explain it in its simplest form, from first principles, by the way it works on a byte, so that it's understandable even by absolute beginners.
Normally, the most significant bit is bit-7 at the left which has a value of 128, the least significant bit is bit-0 at the right which has a value of 1. Therefore, the highest possible value if all bits are set is 255. However in 2s complement, bit-7 is the "sign bit". This only leaves the seven bits from 0 to 6 to hold the actual value, giving them a maximum value of 127. The sign bit has a value of -128. If all 8 bits are set, the byte value becomes (-128 + 127) which gives the negative decimal value of -1. The 2s complement range of values for 8 bits is from -128 (with only bit-7 set) to +127 (with only bits 0 to 6 set). If the sign bit is set, the value of the byte is -128 plus the positive value of whatever is stored in bits 0 to 6. E.g. binary 11111101 = hex FD = decimal (-128 + 125) = -3, 10110100 = hex B4 = decimal (-128 + 52) = -76.
2s complement applies the same effect at each increasing 8-bit boundary, thus for 16 bits, the sign bit is bit-15 (with a value of -32,768) and the positive value is in bits 0 to 14, giving a 16-bit range of values from -32768 to 32767. Similarly, the 24-bit range is from -8388608 to 8388607, and so on.
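A minimal Python sketch of that interpretation (the helper name is mine), reproducing the byte examples above and the 16-bit case described below:
def from_twos_complement(value, bits):
    # Interpret an unsigned value as a signed two's-complement number.
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value

print(from_twos_complement(0xFD, 8))     # -3
print(from_twos_complement(0xB4, 8))     # -76
print(from_twos_complement(0x8080, 16))  # -32896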
I recently encountered this conversion problem in some code that was converting hexadecimal RGB colour values which originated as a 6-character text string in a Word document. Having successfully processed tens of thousands of these I was suddenly presented with an "out of range" error pop-up. The string that had caused the problem was "008080". The command ... = Val("&H" + variable) had converted this to -32896, an invalid value to pass as a colour property. The Val() function had removed the leading zeros and treated 8080 as a signed 2s complement 16-bit value.
In my case the solution was simple. Because I know that I'll always be dealing with 24-bit, 6-character hex values, I just added an extra "1" text character to the front of the hex code (thus making it longer than 16 bits), then, in effect, subtracted the same value. So, with the original 6-character hex RGB code held in the variable HexCode, I get the right decimal result using the command
DecCode = Val("&H" + "1" + HexCode) - Val("&H" + "1000000")
Problem solved, by just adding a little extra code to an existing line. I hope that my explanation of the cause of the problem helps others to devise their own solutions where it's appropriate.

Related

Fetch request integers change after applying .json() [duplicate]

Is this defined by the language? Is there a defined maximum? Is it different in different browsers?
JavaScript has two number types: Number and BigInt.
The most frequently-used number type, Number, is a 64-bit floating point IEEE 754 number.
The largest exact integral value of this type is Number.MAX_SAFE_INTEGER, which is:
2^53 - 1, or
+/- 9,007,199,254,740,991, or
nine quadrillion seven trillion one hundred ninety-nine billion two hundred fifty-four million seven hundred forty thousand nine hundred ninety-one
To put this in perspective: one quadrillion bytes is a petabyte (or one thousand terabytes).
"Safe" in this context refers to the ability to represent integers exactly and to correctly compare them.
From the spec:
Note that all the positive and negative integers whose magnitude is no
greater than 2^53 are representable in the Number type (indeed, the
integer 0 has two representations, +0 and -0).
To safely use integers larger than this, you need to use BigInt, which has no upper bound.
Note that the bitwise operators and shift operators operate on 32-bit integers, so in that case, the max safe integer is 2^31 - 1, or 2,147,483,647.
const log = console.log
var x = 9007199254740992
var y = -x
log(x == x + 1) // true !
log(y == y - 1) // also true !
// Arithmetic operators work, but bitwise/shifts only operate on int32:
log(x / 2) // 4503599627370496
log(x >> 1) // 0
log(x | 1) // 1
Technical note on the subject of the number 9,007,199,254,740,992: There is an exact IEEE-754 representation of this value, and you can assign and read this value from a variable, so for very carefully chosen applications in the domain of integers less than or equal to this value, you could treat this as a maximum value.
In the general case, you must treat this IEEE-754 value as inexact, because it is ambiguous whether it is encoding the logical value 9,007,199,254,740,992 or 9,007,199,254,740,993.
>= ES6:
Number.MIN_SAFE_INTEGER;
Number.MAX_SAFE_INTEGER;
<= ES5
From the reference:
Number.MAX_VALUE;
Number.MIN_VALUE;
console.log('MIN_VALUE', Number.MIN_VALUE);
console.log('MAX_VALUE', Number.MAX_VALUE);
console.log('MIN_SAFE_INTEGER', Number.MIN_SAFE_INTEGER); //ES6
console.log('MAX_SAFE_INTEGER', Number.MAX_SAFE_INTEGER); //ES6
It is 2^53 == 9 007 199 254 740 992. This is because Numbers are stored as floating point with a 52-bit mantissa.
The min value is -2^53.
This makes some fun things happen
Math.pow(2, 53) == Math.pow(2, 53) + 1
>> true
And can also be dangerous :)
var MAX_INT = Math.pow(2, 53); // 9 007 199 254 740 992
for (var i = MAX_INT; i < MAX_INT + 2; ++i) {
// infinite loop
}
Further reading: http://blog.vjeux.com/2010/javascript/javascript-max_int-number-limits.html
In JavaScript, there is a number called Infinity.
Examples:
(Infinity>100)
=> true
// Also worth noting
Infinity - 1 == Infinity
=> true
Math.pow(2,1024) === Infinity
=> true
This may be sufficient for some questions regarding this topic.
Jimmy's answer correctly represents the continuous JavaScript integer spectrum as -9007199254740992 to 9007199254740992 inclusive (sorry 9007199254740993, you might think you are 9007199254740993, but you are wrong!
Demonstration below or in jsfiddle).
console.log(9007199254740993);
However, there is no answer that finds/proves this programmatically (other than the one CoolAJ86 alluded to in his answer that would finish in 28.56 years ;), so here's a slightly more efficient way to do that (to be precise, it's more efficient by about 28.559999999968312 years :), along with a test fiddle:
/**
* Checks if adding/subtracting one to/from a number yields the correct result.
*
* @param number The number to test
* @return true if you can add/subtract 1, false otherwise.
*/
var canAddSubtractOneFromNumber = function(number) {
var numMinusOne = number - 1;
var numPlusOne = number + 1;
return ((number - numMinusOne) === 1) && ((number - numPlusOne) === -1);
}
//Find the highest number
var highestNumber = 3; //Start with an integer 1 or higher
//Get a number higher than the valid integer range
while (canAddSubtractOneFromNumber(highestNumber)) {
highestNumber *= 2;
}
//Find the lowest number you can't add/subtract 1 from
var numToSubtract = highestNumber / 4;
while (numToSubtract >= 1) {
while (!canAddSubtractOneFromNumber(highestNumber - numToSubtract)) {
highestNumber = highestNumber - numToSubtract;
}
numToSubtract /= 2;
}
//And there was much rejoicing. Yay.
console.log('HighestNumber = ' + highestNumber);
Many earlier answers have shown 9007199254740992 === 9007199254740992 + 1 is true to verify that 9,007,199,254,740,991 is the maximum and safe integer.
But what if we keep doing accumulation:
input: 9007199254740992 + 1 output: 9007199254740992 // expected: 9007199254740993
input: 9007199254740992 + 2 output: 9007199254740994 // expected: 9007199254740994
input: 9007199254740992 + 3 output: 9007199254740996 // expected: 9007199254740995
input: 9007199254740992 + 4 output: 9007199254740996 // expected: 9007199254740996
We can see that among numbers greater than 9,007,199,254,740,992, only even numbers are representable.
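A quick way to see the same spacing from Python, whose floats are the same IEEE-754 doubles (math.ulp needs Python 3.9+; this is just an illustration of the format, not JavaScript itself):
import math

print(math.ulp(float(2**53)))            # 2.0 -> doubles are 2 apart here
print(float(2**53) + 1 == float(2**53))  # True, the +1 rounds back down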
This is a good place to explain how the double-precision 64-bit binary format works. Let's see how 9,007,199,254,740,992 is held (represented) using this binary format.
Using a brief version to demonstrate it from 4,503,599,627,370,496:
1 . 0000 ---- 0000 * 2^52 => 1 0000 ---- 0000.
|-- 52 bits --| |exponent part| |-- 52 bits --|
On the left side of the arrow, we have bit value 1, and an adjacent radix point. By consuming the exponent part on the left, the radix point is moved 52 steps to the right. The radix point ends up at the end, and we get 4503599627370496 in pure binary.
Now let's keep incrementing the fraction part by 1 until all the bits are set to 1, which equals 9,007,199,254,740,991 in decimal.
1 . 0000 ---- 0000 * 2^52 => 1 0000 ---- 0000.
(+1)
1 . 0000 ---- 0001 * 2^52 => 1 0000 ---- 0001.
(+1)
1 . 0000 ---- 0010 * 2^52 => 1 0000 ---- 0010.
(+1)
.
.
.
1 . 1111 ---- 1111 * 2^52 => 1 1111 ---- 1111.
Because the 64-bit double-precision format strictly allots 52 bits for the fraction part, no more bits are available if we add another 1, so all we can do is set all bits back to 0 and manipulate the exponent part:
┏━━▶ This bit is implicit and persistent.
┃
1 . 1111 ---- 1111 * 2^52 => 1 1111 ---- 1111.
|-- 52 bits --| |-- 52 bits --|
(+1)
1 . 0000 ---- 0000 * 2^52 * 2 => 1 0000 ---- 0000. * 2
|-- 52 bits --| |-- 52 bits --|
(By consuming the 2^52, radix
point has no way to go, but
there is still one 2 left in
exponent part)
=> 1 . 0000 ---- 0000 * 2^53
|-- 52 bits --|
Now we get the 9,007,199,254,740,992, and for the numbers greater than it, the format can only handle increments of 2 because every increment of 1 on the fraction part ends up being multiplied by the left 2 in the exponent part. That's why double-precision 64-bit binary format cannot hold odd numbers when the number is greater than 9,007,199,254,740,992:
(consume 2^52 to move radix point to the end)
1 . 0000 ---- 0001 * 2^53 => 1 0000 ---- 0001. * 2
|-- 52 bits --| |-- 52 bits --|
Following this pattern, once the number gets greater than 9,007,199,254,740,992 * 2 = 18,014,398,509,481,984, only increments of 4 can be held:
input: 18014398509481984 + 1 output: 18014398509481984 // expected: 18014398509481985
input: 18014398509481984 + 2 output: 18014398509481984 // expected: 18014398509481986
input: 18014398509481984 + 3 output: 18014398509481984 // expected: 18014398509481987
input: 18014398509481984 + 4 output: 18014398509481988 // expected: 18014398509481988
How about numbers between [ 2 251 799 813 685 248, 4 503 599 627 370 496 )?
1 . 0000 ---- 0001 * 2^51 => 1 0000 ---- 000.1
|-- 52 bits --| |-- 52 bits --|
The value 0.1 in binary is exactly 2^-1 (=1/2) (=0.5)
So when the number is less than 4,503,599,627,370,496 (2^52), there is one bit available to represent halves of the integer:
input: 4503599627370495.5 output: 4503599627370495.5
input: 4503599627370495.75 output: 4503599627370495.5
Less than 2,251,799,813,685,248 (2^51)
input: 2251799813685246.75 output: 2251799813685246.8 // expected: 2251799813685246.75
input: 2251799813685246.25 output: 2251799813685246.2 // expected: 2251799813685246.25
input: 2251799813685246.5 output: 2251799813685246.5
/**
Please note that if you try this yourself and, say, log
these numbers to the console, they will get rounded. JavaScript
rounds if the number of digits exceeds 17. The value
is internally held correctly:
*/
input: 2251799813685246.25.toString(2)
output: "111111111111111111111111111111111111111111111111110.01"
input: 2251799813685246.75.toString(2)
output: "111111111111111111111111111111111111111111111111110.11"
input: 2251799813685246.78.toString(2)
output: "111111111111111111111111111111111111111111111111110.11"
And what is the available range of the exponent part? The format allots 11 bits for it, with an exponent bias of 1023, so the stored exponent e contributes a factor of 2^(e - 1023) (see the double-precision format article on Wikipedia for more details).
So to make the exponent part equal 2^52, we need to set e = 1075 exactly.
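You can check that biased exponent directly by unpacking the bits of a double (a Python sketch; the field names are mine):
import struct

# Pack 2**52 as a big-endian IEEE-754 double and inspect its bit fields.
bits = int.from_bytes(struct.pack(">d", float(2**52)), "big")
exponent = (bits >> 52) & 0x7FF    # 11-bit biased exponent
fraction = bits & ((1 << 52) - 1)  # 52-bit fraction
print(exponent)  # 1075  (1075 - 1023 = 52)
print(fraction)  # 0     (the leading 1 is implicit)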
To be safe
var MAX_INT = 4294967295;
Reasoning
I thought I'd be clever and find the value at which x + 1 === x with a more pragmatic approach.
My machine can only count 10 million per second or so... so I'll post back with the definitive answer in 28.56 years.
If you can't wait that long, I'm willing to bet that
Most of your loops don't run for 28.56 years
9007199254740992 === Math.pow(2, 53) + 1 is proof enough
You should stick to 4294967295, which is Math.pow(2,32) - 1, to avoid issues with bit-shifting
Finding x + 1 === x:
(function () {
"use strict";
var x = 0
, start = new Date().valueOf()
;
while (x + 1 != x) {
if (!(x % 10000000)) {
console.log(x);
}
x += 1
}
console.log(x, new Date().valueOf() - start);
}());
The short answer is “it depends.”
If you’re using bitwise operators anywhere (or if you’re referring to the length of an Array), the ranges are:
Unsigned: 0…(-1>>>0)
Signed: (-(-1>>>1)-1)…(-1>>>1)
(It so happens that the bitwise operators and the maximum length of an array are restricted to 32-bit integers.)
If you’re not using bitwise operators or working with array lengths:
Signed: (-Math.pow(2,53))…(+Math.pow(2,53))
These limitations are imposed by the internal representation of the “Number” type, which generally corresponds to IEEE 754 double-precision floating-point representation. (Note that unlike typical signed integers, the magnitude of the negative limit is the same as the magnitude of the positive limit, due to characteristics of the internal representation, which actually includes a negative 0!)
ECMAScript 6:
Number.MAX_SAFE_INTEGER = Math.pow(2, 53)-1;
Number.MIN_SAFE_INTEGER = -Number.MAX_SAFE_INTEGER;
Others may have already given the generic answer, but I thought it would be a good idea to give a fast way of determining it:
for (var x = 2; x + 1 !== x; x *= 2);
console.log(x);
Which gives me 9007199254740992 within less than a millisecond in Chrome 30.
It will test powers of 2 to find which one, when 'added' 1, equals itself.
Anything you want to use for bitwise operations must be between 0x80000000 (-2147483648 or -2^31) and 0x7fffffff (2147483647 or 2^31 - 1).
The console will tell you that 0x80000000 equals +2147483648, but 0x80000000 & 0x80000000 equals -2147483648.
JavaScript has received a new data type in ECMAScript 2020: BigInt. It introduced numerical literals having an "n" suffix and allows for arbitrary precision:
var a = 123456789012345678901012345678901n;
Precision will still be lost, of course, when such a big integer is (maybe unintentionally) coerced to a number data type.
And, obviously, there will always be precision limitations due to finite memory, and a cost in terms of time in order to allocate the necessary memory and to perform arithmetic on such large numbers.
For instance, the generation of a number with a hundred thousand decimal digits will take a noticeable delay before completion:
console.log(BigInt("1".padEnd(100000,"0")) + 1n)
...but it works.
Try:
maxInt = -1 >>> 1
In Firefox 3.6 it's 2^31 - 1.
I did a simple test with a formula, X-(X+1)=-1, and the largest value of X I can get to work on Safari, Opera and Firefox (tested on OS X) is 9e15. Here is the code I used for testing:
javascript: alert(9e15-(9e15+1));
I write it like this:
var max_int = 0x20000000000000;
var min_int = -0x20000000000000;
(max_int + 1) === 0x20000000000000; //true
(max_int - 1) < 0x20000000000000; //true
Same for int32
var max_int32 = 0x80000000;
var min_int32 = -0x80000000;
Let's get to the sources
Description
The MAX_SAFE_INTEGER constant has a value of 9007199254740991 (9,007,199,254,740,991 or ~9 quadrillion). The reasoning behind that number is that JavaScript uses double-precision floating-point format numbers as specified in IEEE 754 and can only safely represent numbers between -(2^53 - 1) and 2^53 - 1.
Safe in this context refers to the ability to represent integers exactly and to correctly compare them. For example, Number.MAX_SAFE_INTEGER + 1 === Number.MAX_SAFE_INTEGER + 2 will evaluate to true, which is mathematically incorrect. See Number.isSafeInteger() for more information.
Because MAX_SAFE_INTEGER is a static property of Number, you always use it as Number.MAX_SAFE_INTEGER, rather than as a property of a Number object you created.
In JavaScript, the largest exactly representable integer is 2^53 - 1.
However, bitwise operations are calculated on 32 bits (4 bytes), meaning if you exceed 32-bit shifts you will start losing bits.
In the Google Chrome built-in javascript, you can go to approximately 2^1024 before the number is called infinity.
Scato wrote:
anything you want to use for bitwise operations must be between
0x80000000 (-2147483648 or -2^31) and 0x7fffffff (2147483647 or 2^31 -
1).
the console will tell you that 0x80000000 equals +2147483648, but
0x80000000 & 0x80000000 equals -2147483648
Hexadecimal literals are unsigned positive values, so 0x80000000 = 2147483648 - that's mathematically correct. If you want to make it a signed value you have to right shift: 0x80000000 >> 0 = -2147483648. You can write 1 << 31 instead, too.
Firefox 3 doesn't seem to have a problem with huge numbers.
1e+200 * 1e+100 will calculate fine to 1e+300.
Safari seem to have no problem with it as well. (For the record, this is on a Mac if anyone else decides to test this.)
Unless I lost my brain at this time of day, this is way bigger than a 64-bit integer.
Node.js and Google Chrome both use 64-bit double-precision floating point values, whose largest finite value is just under 2^1024, so:
Number.MAX_VALUE = 1.7976931348623157e+308

32-bit int struct bits don't seem to match up (nodejs)

I have a file that defines a set of tiles (used in an online game). The format for each tile is as follows:
x: 12 bits
y: 12 bits
tile: 8 bits
32 bits in total, so each tile can be expressed as a 32 bit integer.
More info about the file format can be found here:
http://wiki.minegoboom.com/index.php/LVL_Format
http://www.rarefied.org/subspace/lvlformat.html
The fields within each 4-byte structure are not broken along byte boundaries. As you can see, x: and y: are both defined as 12 bits, i.e. x is stored in 1.5 bytes, y is stored in 1.5 bytes and tile is stored in 1 byte.
Even though x and y use 12 bits their max value is 1023, so they could be expressed in 10 bits. This was down to the creator of the format. I guess they were just padding things out so they could use a 32-bit integer for each tile? Either way, for x and y we can ignore the final 2 bits.
I'm using a nodejs Buffer to read the file and I'm using the following code to read the values.
var n = tileBuffer.readUInt32LE(0);
var x = n & 0x03FF;
var y = (n >> 12) & 0x03FF;
var tile = (n >> 24) & 0x00ff;
This code works fine but when I read the bits themselves, in an attempt to understand binary better, I see something that confuses me.
Take, for example a int that expresses the following:
x: 1023
y: 1023
tile: 1
Creating the tiles in a map editor and reading the resulting file into a buffer returns <Buffer ff f3 3f 01>
When I convert each byte into a string of bits I get the following:
ff = 11111111
f3 = 11110011
3f = 00111111
01 = 00000001
11111111 11110011 00111111 00000001
I assume I should just take the first 12 bits as x but chop off the last 2 bits. Use the next 12 bits as y, chopping off 2 bits again, and the remaining 8 bits would be the tile.
x: 1111111111
y: 0011001111
tile: 00000001
The x is correct (1111111111 = 1023), the y is wrong (0011001111 = 207, not 1023), and tile is correct (00000001 = 1)
I'm confused and obviously missing something.
It makes more sense to look at it in this order: (this would be the binary representation of n)
00000001 00111111 11110011 11111111
On that order, you can easily do the masking and shifting visually.
The problem with what you did is that for example in 11111111 11110011, the bits of the second byte that belong to the first field are at the right (the lowest part of that byte), which in that order is discontinuous.
Also, masking with 0x03FF makes those first two fields have 10 bits, with two bits just disappearing. You can make them 12 bits by masking with 0x0FFF. As it is now, you effectively have two padding bits.
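For example, here is the same extraction done in Python with 12-bit masks, using the exact bytes from the question (just a sketch to mirror the node Buffer code):
import struct

raw = bytes([0xFF, 0xF3, 0x3F, 0x01])  # <Buffer ff f3 3f 01>
(n,) = struct.unpack("<I", raw)        # little-endian uint32: 0x013FF3FF

x    = n & 0x0FFF          # low 12 bits (only 10 are actually used)
y    = (n >> 12) & 0x0FFF  # next 12 bits
tile = (n >> 24) & 0xFF    # top 8 bits
print(x, y, tile)          # 1023 1023 1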

help with binary arithmetic subtraction

Assuming you are working with two 8-bit unsigned values like from a timer. If you record a stop time and a start time, and subtract start from stop to get the elapsed time, do you need to use mod to handle roll overs or does the subtraction just work out? For example say start time = 11111100 and the end time = 00000101 would (00000101 - 11111100) give you the correct result?
You can try it yourself, with your example :
start time = 1111 1100 (= 252)
end time = 0000 0101 (= 5)
(5-252) modulo 256 = 9.
end time - start time = 0000 0101 - 1111 1100 = 0000 1001 (= 9)
Of course, this wouldn't work if the difference between your start and end times was 256 or more. You can't know how many times the "end time" has gone past the "start time", just like classic overflows.
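A tiny Python sketch of that modular subtraction (the function name is mine):
def elapsed_8bit(start, stop):
    # Elapsed ticks on an 8-bit timer, assuming fewer than 256 ticks passed.
    return (stop - start) & 0xFF

print(elapsed_8bit(0b11111100, 0b00000101))  # 9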
Yes, the subtraction works out as you would hope. You do not need to do anything special to handle roll over. For your example times the subtraction is well-behaved:
00000101 - 11111100 == 00001001
(5) - (252) == (9)
Or:
(5+256) - (252) == (9)
See this Python test to prove it:
>>> all((j - i) & 0xFF == ((j & 0xFF) - i) & 0xFF
... for i in range(256)
... for j in range(i, i + 256))
True
The j & 0xFF term will be smaller than i when j > 255. That does not affect the 8-bit results; this shows that those values still match the results for when j is not masked to 8 bits.

Converting to Base 10

Question
Let's say I have a string or array which represents a number in base N, N>1, where N is a power of 2. Assume the number being represented is larger than the system can handle as an actual number (an int or a double etc).
How can I convert that to a decimal string?
I'm open to a solution for any base N which satisfies the above criteria (binary, hex, ...). That is if you have a solution which works for at least one base N, I'm interested :)
Example:
Input: "10101010110101"
-
Output: "10933"
It depends on the particular language. Some have native support for arbitrary-length integers, and others can use libraries such as GMP. After that it's just a matter of doing the lookup in a table for the digit value, then multiplying as appropriate.
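As a sketch of that digit-lookup-and-multiply approach, here it is in Python, which has native big integers, so str() of the accumulated value is already the decimal string (the function name is mine):
def base_n_to_decimal(digits, base):
    # Convert a base-N digit string (N up to 16) to a decimal string.
    alphabet = "0123456789ABCDEF"
    value = 0
    for ch in digits.upper():
        value = value * base + alphabet.index(ch)
    return str(value)

print(base_n_to_decimal("10101010110101", 2))  # "10933"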
This is from a Python-based computer science course I took last semester that's designed to handle up to base-16.
import string
def baseNTodecimal():
    # get the number as a string
    number = raw_input("Please type a number: ")
    # convert it to all uppercase to match hexDigits (below)
    number = string.upper(number)
    # get the base as an integer
    base = input("Please give me the base: ")
    # the number of values that we have to change to base10
    digits = len(number)
    base10 = 0
    # first position of any baseN number is 1's
    position = 1
    # set up a string so that the position of
    # each character matches the decimal
    # value of that character
    hexDigits = "0123456789ABCDEF"
    # for each 'digit' in the string
    for i in range(1, digits+1):
        # find where it occurs in the string hexDigits
        digit = string.find(hexDigits, number[-i])
        # multiply the value by the base position
        # and add it to the base10 total
        base10 = base10 + (position * digit)
        print number[-i], "is in the " + str(position) + "'s position"
        # increase the position by the base (e.g., 8's position * 2 = 16's position)
        position = position * base
    print "And in base10 it is", base10
Basically, it takes input as a string and then goes through and adds up each "digit" multiplied by its positional value. Each digit is actually checked for its index-position in the string hexDigits, which is used as its numerical value.
Assuming the number that it returns is actually larger than the programming language supports, you could build up an array of Ints that represent the entire number:
[214748364, 8]
would represent 2147483648 (a number that a Java int couldn't handle).
That's some php code I've just written:
function to_base10($input, $base)
{
$result = 0;
$length = strlen($input);
for ($x=$length-1; $x>=0; $x--)
$result += (int)$input[$x] * pow($base, ($length-1)-$x);
return $result;
}
It's dead simple: just a loop through every char of the input string
This works with any base <10 but it can be easily extended to support higher bases (A->11, B->12, etc)
edit: oh didn't see the python code :)
yeah, that's cooler
I would choose a language which more or less natively supports big-number math, like Lisp. I know it seems fewer and fewer people use it, but it still has its value.
I don't know if this is large enough for your usage, but the largest integer number I could represent in my common lisp environment (CLISP) was 2^(2^20)
>> (expt 2 (expt 2 20))
In lisp you can easily represent hex, dec, oct and bin as follows
>> #b1010
10
>> #o12
10
>> 10
10
>> #x0A
10
You can write rationals in other bases from 2 to 36 with #nR
>> #36rABCDEFGHIJKLMNOPQRSTUVWXYZ
8337503854730415241050377135811259267835
For more information on numbers in lisp see: Practical Common Lisp Book

Convert a string into Morse code [closed]

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
The challenge
The shortest code by character count, that will input a string using only alphabetical characters (upper and lower case), numbers, commas, periods and question mark, and returns a representation of the string in Morse code.
The Morse code output should consist of a dash (-, ASCII 0x2D) for a long beep (AKA 'dah') and a dot (., ASCII 0x2E) for short beep (AKA 'dit').
Each letter should be separated by a space (' ', ASCII 0x20), and each word should be separated by a forward slash (/, ASCII 0x2F).
Morse code table:
(Morse code table image: http://liranuna.com/junk/morse.gif)
Test cases:
Input:
Hello world
Output:
.... . .-.. .-.. --- / .-- --- .-. .-.. -..
Input:
Hello, Stackoverflow.
Output:
.... . .-.. .-.. --- --..-- / ... - .- -.-. -.- --- ...- . .-. ..-. .-.. --- .-- .-.-.-
Code count includes input/output (that is, the full program).
C (131 characters)
Yes, 131!
main(c){for(;c=c?c:(c=toupper(getch())-32)?
"•ƒŒKa`^ZRBCEIQiw#S#nx(37+$6-2&#/4)'18=,*%.:0;?5"
[c-12]-34:-3;c/=2)putch(c/2?46-c%2:0);}
I eked out a few more characters by combining the logic from the while and for loops into a single for loop, and by moving the declaration of the c variable into the main definition as an input parameter. This latter technique I borrowed from strager's answer to another challenge.
For those trying to verify the program with GCC or with ASCII-only editors, you may need the following, slightly longer version:
main(c){for(;c=c?c:(c=toupper(getchar())-32)?c<0?1:
"\x95#\x8CKa`^ZRBCEIQiw#S#nx(37+$6-2&#/4)'18=,*%.:0;?5"
[c-12]-34:-3;c/=2)putchar(c/2?46-c%2:32);}
This version is 17 characters longer (weighing in at a comparatively huge 148), due to the following changes:
+4: getchar() and putchar() instead of the non-portable getch() and putch()
+6: escape codes for two of the characters instead of non-ASCII characters
+1: 32 instead of 0 for space character
+6: added "c<0?1:" to suppress garbage from characters less than ASCII 32 (namely, from '\n'). You'll still get garbage from any of !"#$%&'()*+[\]^_`{|}~, or anything above ASCII 126.
This should make the code completely portable. Compile with:
gcc -std=c89 -funsigned-char morse.c
The -std=c89 is optional. The -funsigned-char is necessary, though, or you will get garbage for comma and full stop.
135 characters
c;main(){while(c=toupper(getch()))for(c=c-32?
"•ƒŒKa`^ZRBCEIQiw#S#nx(37+$6-2&#/4)'18=,*%.:0;?5"
[c-44]-34:-3;c;c/=2)putch(c/2?46-c%2:0);}
In my opinion, this latest version is much more visually appealing, too. And no, it's not portable, and it's no longer protected against out-of-bounds input. It also has a pretty bad UI, taking character-by-character input and converting it to Morse Code and having no exit condition (you have to hit Ctrl+Break). But portable, robust code with a nice UI wasn't a requirement.
A brief-as-possible explanation of the code follows:
main(c){
while(c = toupper(getch())) /* well, *sort of* an exit condition */
for(c =
c - 32 ? // effectively: "if not space character"
"•ƒŒKa`^ZRBCEIQiw#S#nx(37+$6-2&#/4)'18=,*%.:0;?5"[c - 44] - 34
/* This array contains a binary representation of the Morse Code
* for all characters between comma (ASCII 44) and capital Z.
* The values are offset by 34 to make them all representable
* without escape codes (as long as chars > 127 are allowed).
* See explanation after code for encoding format.
*/
: -3; /* if input char is space, c = -3
* this is chosen because -3 % 2 = -1 (and 46 - -1 = 47)
* and -3 / 2 / 2 = 0 (with integer truncation)
*/
c; /* continue loop while c != 0 */
c /= 2) /* shift down to the next bit */
putch(c / 2 ? /* this will be 0 if we're down to our guard bit */
46 - c % 2 /* We'll end up with 45 (-), 46 (.), or 47 (/).
* It's very convenient that the three characters
* we need for this exercise are all consecutive.
*/
: 0 /* we're at the guard bit, output blank space */
);
}
Each character in the long string in the code contains the encoded Morse Code for one text character. Each bit of the encoded character represents either a dash or a dot. A one represents a dash, and a zero represents a dot. The least significant bit represents the first dash or dot in the Morse Code. A final "guard" bit determines the length of the code. That is, the highest one bit in each encoded character represents end-of-code and is not printed. Without this guard bit, characters with trailing dots couldn't be printed correctly.
For instance, the letter 'L' is ".-.." in Morse Code. To represent this in binary, we need a 0, a 1, and two more 0s, starting with the least significant bit: 0010. Tack one more 1 on for a guard bit, and we have our encoded Morse Code: 10010, or decimal 18. Add the +34 offset to get 52, which is the ASCII value of the character '4'. So the encoded character array has a '4' as the 33rd character (index 32).
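For reference, a de-golfed Python sketch of the same guard-bit packing (function names are mine):
def encode_morse(code):
    # Pack '.-..' style Morse into an int: first symbol in the least
    # significant bit (dot = 0, dash = 1), plus a final guard bit of 1.
    value = 1
    for symbol in reversed(code):
        value = value * 2 + (1 if symbol == "-" else 0)
    return value

def decode_morse(value):
    out = ""
    while value > 1:              # stop when only the guard bit is left
        out += "-" if value % 2 else "."
        value //= 2
    return out

print(encode_morse(".-.."))  # 18, i.e. binary 10010
print(decode_morse(18))      # .-..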
This technique is similar to that used to encode characters in ACoolie's, strager's(2), Miles's, pingw33n's, Alec's, and Andrea's solutions, but is slightly simpler, requiring only one operation per bit (shifting/dividing), rather than two (shifting/dividing and decrementing).
EDIT:
Reading through the rest of the implementations, I see that Alec and Anon came up with this encoding scheme—using the guard bit—before I did. Anon's solution is particularly interesting, using Python's bin function and stripping off the "0b" prefix and the guard bit with [3:], rather than looping, anding, and shifting, as Alec and I did.
As a bonus, this version also handles hyphen (-....-), slash (-..-.), colon (---...), semicolon (-.-.-.), equals (-...-), and at sign (.--.-.). As long as 8-bit characters are allowed, these characters require no extra code bytes to support. No more characters can be supported with this version without adding length to the code (unless there's Morse Codes for greater/less than signs).
Because I find the old implementations still interesting, and the text has some caveats applicable to this version, I've left the previous content of this post below.
Okay, presumably, the user interface can suck, right? So, borrowing from strager, I've replaced gets(), which provides buffered, echoed line input, with getch(), which provides unbuffered, unechoed character input. This means that every character you type gets translated immediately into Morse Code on the screen. Maybe that's cool. It no longer works with either stdin or a command-line argument, but it's pretty damn small.
I've kept the old code below, though, for reference. Here's the new.
New code, with bounds checking, 171 characters:
W(i){i?W(--i/2),putch(46-i%2):0;}c;main(){while(c=toupper(getch())-13)
c=c-19?c>77|c<31?0:W("œ*~*hXPLJIYaeg*****u*.AC5+;79-#6=0/8?F31,2:4BDE"
[c-31]-42):putch(47),putch(0);}
Enter breaks the loop and exits the program.
New code, without bounds checking, 159 characters:
W(i){i?W(--i/2),putch(46-i%2):0;}c;main(){while(c=toupper(getch())-13)
c=c-19?W("œ*~*hXPLJIYaeg*****u*.AC5+;79-#6=0/8?F31,2:4BDE"[c-31]-42):
putch(47),putch(0);}
Below follows the old 196/177 code, with some explanation:
W(i){i?W(--i/2),putch(46-i%2):0;}main(){char*p,c,s[99];gets(s);
for(p=s;*p;)c=*p++,c=toupper(c),c=c-32?c>90|c<44?0:W(
"œ*~*hXPLJIYaeg*****u*.AC5+;79-#6=0/8?F31,2:4BDE"[c-44]-42):
putch(47),putch(0);}
This is based on Andrea's Python answer, using the same technique for generating the morse code as in that answer. But instead of storing the encodable characters one after another and finding their indexes, I stored the indexes one after another and look them up by character (similarly to my earlier answer). This prevents the long gaps near the end that caused problems for earlier implementors.
As before, I've used a character that's greater than 127. Converting it to ASCII-only adds 3 characters. The first character of the long string must be replaced with \x9C. The offset is necessary this time, otherwise a large number of characters are under 32, and must be represented with escape codes.
Also as before, processing a command-line argument instead of stdin adds 2 characters, and using a real space character between codes adds 1 character.
On the other hand, some of the other routines here don't deal with input outside the accepted range of [ ,.0-9\?A-Za-z]. If such handling were removed from this routine, then 19 characters could be removed, bringing the total down as low as 177 characters. But if this is done, and invalid input is fed to this program, it may crash and burn.
The code in this case could be:
W(i){i?W(--i/2),putch(46-i%2):0;}main(){char*p,s[99];gets(s);
for(p=s;*p;p++)*p=*p-32?W(
"œ*~*hXPLJIYaeg*****u*.AC5+;79-#6=0/8?F31,2:4BDE"
[toupper(*p)-44]-42):putch(47),putch(0);}
Using a Morse Code Font?
Console.Write(params[0]);
Perl, 170 characters (with a little help from accomplished golfer mauke). Wrapped for clarity; all newlines are removable.
$_=uc<>;y,. ,|/,;s/./$& /g;@m{A..Z,0..9,qw(| , ?)}=
".-NINNN..]IN-NII..AMN-AI---.M-ANMAA.I.-].AIAA-NANMMIOMAOUMSMSAH.B.MSOIONARZMIZ"
=~/../g;1while s![]\w|,?]!$m{$&}!;print
Explanation:
Extract the morse dictionary. Each symbol is defined in terms of two chars, which can be either literal dots or dashes, or a reference to the value of another defined char. E and T contain dummy chars to avoid desyncing the decoder; we'll remove them later.
Read and format the input. "Hello world" becomes "H E L L O / W O R L D"
The next step depends on the input and output dictionaries being distinct, so turn dots in the input to an unused char (vertical bar, |)
Replace any char in the input that occurs in the morse dictionary with its value in the dictionary, until no replacements occur.
Remove the dummy char mentioned in step 1.
Print the output.
In the final version, the dictionary is optimized for runtime efficiency:
All one-symbol characters (E and T) and two-symbol characters (A, I, M, and N) are defined directly and decode in one pass.
All three-symbol characters are defined in terms of a two-symbol character and a literal symbol, decoding in two passes.
All four-symbol characters are defined in terms of two two-symbol characters, decoding in two passes with three replacements.
The five- and six-symbol characters (numbers and punctuation) decode in three passes, with four or five replacements respectively.
Since the golfed code only replaces one character per loop (to save one character of code!) the number of loops is limited to five times the length of the input (three times the length of the input if only alphabetics are used). But by adding a g to the s/// operation, the number of loops is limited to three (two if only alphabetics are used).
Example transformation:
Hello 123
H E L L O / 1 2 3
II .] AI AI M- / AO UM SM
.... . .-.. .-.. --- / .-M- .A-- I.--
.... . .-.. .-.. --- / .---- ..--- ...--
Python list comprehension, 159-character one-liner
for c in raw_input().upper():print c<","and"/"or bin(ord("•ƒwTaQIECBRZ^`šŒ#S#n|':<.$402&9/6)(18?,*%+3-;=>"[ord(c)-44])-34)[3:].translate(" "*47+"/.-"+" "*206),
Uses the similar data packing to P Daddy's C implementation, but does not store the bits in reverse order and uses bin() to extract the data rather than arithmetic. Note also that spaces are detected using inequality; it considers every character "less than comma" to be a space.
Python for loop, 205 chars including newlines
for a in raw_input().upper():
q='_ETIANMSURWDKGOHVF_L_PJBXCYZQ__54_3___2__+____16=/_____7___8_90'.find(a);s=''
while q>0:s='-.'[q%2]+s;q=~-q/2
print['/','--..--','..--..','.-.-.-',''][' ,?.'.find(a)]+s,
I was dorking around with a compact coding for the symbols, but I don't see it getting any better than the implicit trees already in use, so I present the coding here in case someone else can use it.
Consider the string:
--..--..-.-.-..--...----.....-----.--/
which contains all the needed sequences as substrings. We could code the symbols by offset and length like this:
ET RRRIIGGGJJJJ
--..--..-.-.-..--...----.....-----.--/
CCCC DD WWW 00000
,,,,,, AALLLL BBBB 11111
--..--..-.-.-..--...----.....-----.--/
?????? KKK MMSSS 22222
FFFF PPPP 33333
--..--..-.-.-..--...----.....-----.--/
UUU XXXX 44444
NN PPPP OOO 55555
--..--..-.-.-..--...----.....-----.--/
ZZZZ 66666
77777 YYYY
--..--..-.-.-..--...----.....-----.--/
...... 88888 HHHH
99999 VVVV QQQQ
--..--..-.-.-..--...----.....-----.--/
with the space (i.e. word boundary) starting and ending on the final character (the '/'). Feel free to use it, if you see a good way.
Most of the shorter symbols have several possible codings, of course.
P Daddy found a shorter version of this trick (and I can now see at least some of the redundancy here) and did a nice c implementation. Alec did a python implementation with the first (buggy and incomplete) version. Hobbs did a pretty compact perl version that I don't understand at all.
J, 124 130 134 characters
'.- /'{~;2,~&.>(]`(<&3:)#.(a:=])"0)}.&,&#:&.></.40-~a.i.')}ggWOKIHX`dfggggggg-#B4*:68,?5</.7>E20+193ACD'{~0>.45-~a.i.toupper
J beats C! Awesome!
Usage:
'.- /'{~;2,~&.>(]`(<&3:)#.(a:=])"0)}.&,&#:&.></.40-~a.i.')}ggWOKIHX`dfggggggg-#B4*:68,?5</.7>E20+193ACD'{~0>.45-~a.i.toupper 'Hello World'
.... . .-.. .-.. --- / .-- --- .-. .-.. -..
'.- /'{~;2,~&.>(]`(<&3:)#.(a:=])"0)}.&,&#:&.></.40-~a.i.')}ggWOKIHX`dfggggggg-#B4*:68,?5</.7>E20+193ACD'{~0>.45-~a.i.toupper 'Hello, Stackoverflow.'
.... . .-.. .-.. --- .-.-.- / ... - .- -.-. -.- --- ...- . .-. ..-. .-.. --- .-- --..--
Python 3 One Liner: 172 characters
print(' '.join('/'if c==' 'else''.join('.'if x=='0'else'-'for x in bin(ord("ijÁĕÁÿïçãáàðøüþÁÁÁÁÁČÁÅ×ÚÌÂÒÎÐÄ×ÍÔÇÆÏÖÝÊÈÃÉÑËÙÛÜ"[ord(c)-44])-192)[3:])for c in input().upper()))
(Encoding the translation table into Unicode code points. Works fine, and they display fine in my test on my Windows Vista machine.)
Edited to pare down to 184 characters by removing some unnecessary spaces and brackets (making list comps gen exps).
Edit again: More spaces removed that I didn't even know was possible before seeing other answers here - so down to 176.
Edit again down to 172 (woo woo!) by using ' '.join instead of ''.join and doing the spaces separately. (duh!)
C# 266 chars
The 131 char C solution translated to C# yields 266 characters:
foreach(var i in Encoding.ASCII.GetBytes(args[0].ToUpper())){var c=(int)i;for(c=(c-32!=0)?Encoding.ASCII.GetBytes("•ƒŒKa`^ZRBCEIQiw#S#nx(37+$6-2&#/4)'18=,*%.:0;?5")[c-44]-34:-3;c!=0;c/=2)Console.Write(Encoding.ASCII.GetChars(new byte[]{(byte)((c/2!=0)?46-c%2:0)}));}
which is more readable as:
foreach (var i in Encoding.ASCII.GetBytes(args[0].ToUpper()))
{
var c = (int)i;
for (c = ((c - 32) != 0) ? Encoding.ASCII.GetBytes("•ƒŒKa`^ZRBCEIQiw#S#nx(37+$6-2&#/4)'18=,*%.:0;?5")[c - 44] - 34 : -3
; c != 0
; c /= 2)
Console.Write(Encoding.ASCII.GetChars(new byte[] { (byte)((c / 2 != 0) ? 46 - c % 2 : 0) }));
}
Golfscript - 106 chars - NO FUNNY CHARS :)
newline at the end of the input is not supported, so use something like this
echo -n Hello, Stackoverflow| ../golfscript.rb morse.gs
' '/{{.32|"!etianmsurwdkgohvf!l!pjbxcyzq"?)"UsL?/'#! 08<>"#".,?0123456789"?=or
2base(;>{'.-'\=}%' '}%}%'/'*
Letters are a special case and converted to lowercase and ordered in their binary positions.
Everything else is done by a translation table
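For readers following along, here is a de-golfed Python sketch of that "binary position" lookup, the same implicit-tree trick several of the Python answers below use (names are mine; underscores are the padding character, as in those answers):
# The index of a letter in this string encodes its Morse code: walk back to
# the root, emitting '.' for an odd index and '-' for an even one.
TREE = '_etianmsurwdkgohvf_l_pjbxcyzq'

def to_morse(letter):
    # Letters only; anything not in TREE yields an empty string.
    i = TREE.find(letter.lower())
    code = ''
    while i > 0:
        code = '-.'[i % 2] + code
        i = (i - 1) // 2
    return code

print(to_morse('L'))  # .-..
print(to_morse('S'))  # ...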
Python
Incomplete solution, but maybe somebody can make a full solution out of it. Doesn't handle digits or punctuation, but weighs in at only 154 chars.
def e(l):
    i='_etianmsurwdkgohvf_l_pjbxcyzq'.find(l.lower());v=''
    while i>0:v='-.'[i%2]+v;i=(i-1)/2
    return v or '/'
def enc(s):return ' '.join(map(e,s))
C (248 characters)
Another tree-based solution.
#define O putchar
char z[99],*t=
" ETINAMSDRGUKWOHBL~FCPJVX~YZQ~~54~3~~~2~~+~~~~16=/~~.~~7,~~8~90";c,p,i=0;
main(){gets(z);while(c=z[i++]){c-46?c-44?c:O(45):O(c);c=c>96?c-32:c;p=-1;
while(t[++p]!=c);for(;p;p/=2){O(45+p--%2);}c-32?O(32):(O(47),O(c));}}
Could be errors in source tree because wikipedia seems to have it wrong or maybe I misunderstood something.
F#, 256 chars
let rec D i=if i=16 then" "else
let x=int"U*:+F8c]uWjGbJ0-0Dnmd0BiC5?\4o`h7f>9[1E=pr_".[i]-32
if x>43 then"-"+D(x-43)else"."+D x
let M(s:string)=s.ToUpper()|>Seq.fold(fun s c->s+match c with
|' '->"/ "|','->"--..-- "|'.'->".-.-.- "|_->D(int c-48))""
For example
M("Hello, Stack.") |> printfn "%s"
yields
.... . .-.. .-.. --- --..-- / ... - .- -.-. -.- .-.-.-
I think my technique may be unique so far. The idea is:
there is an ascii range of chars that covers most of what we want (0..Z)
there are only 43 chars in this range
thus we can encode one bit (dash or dot) plus a 'next character' in a range of 86 chars
the range ascii(32-117) is all 'printable' and can serve as this 86-char range
so the string literal encodes a table along those lines
There's a little more to it, but that's the gist. Comma, period, and space are not in the range 0..Z so they're handled specially by the 'match'. Some 'unused' characters in the range 0..Z (like ';') are used in the table as suffixes of other morse translations that aren't themselves morse 'letters'.
Here's my contribution as a console application in VB.Net
Module MorseCodeConverter
Dim M() As String = {".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....", "..", ".---", "-.-", ".-..", "--", "-.", "---", ".--.", "--.-", ".-.", "...", "-", "..-", "...-", ".--", "-..-", "-.--", "--..", "-----", ".----", "..---", "...--", "....-", ".....", "-....", "--...", "---..", "----."}
Sub Main()
Dim I, O
Dim a, b
While True
I = Console.ReadLine()
O = ""
For Each a In I
b = AscW(UCase(a))
If b > 64 And b < 91 Then
O &= M(b - 65) & " "
ElseIf b > 47 And b < 58 Then
O &= M(b - 22) & " "
ElseIf b = 46 Then
O &= ".-.-.- "
ElseIf b = 44 Then
O &= "--..-- "
ElseIf b = 63 Then
O &= "..--.. "
Else
O &= "/"
End If
Next
Console.WriteLine(O)
End While
End Sub
End Module
I left the white space in to make it readable. Totals 1100 characters. It will read the input from the command line, one line at a time, and send the corresponding output back to the output stream. The compressed version is below, with only 632 characters.
Module Q
Dim M() As String={".-","-...","-.-.","-..",".","..-.","--.","....","..",".---","-.-",".-..","--","-.","---",".--.","--.-",".-.","...","-","..-","...-",".--","-..-","-.--","--..","-----",".----","..---","...--","....-",".....","-....","--...","---..","----."}
Sub Main()
Dim I,O,a,b:While 1:I=Console.ReadLine():O="":For Each a In I:b=AscW(UCase(a)):If b>64 And b<91 Then:O &=M(b-65)&" ":ElseIf b>47 And b<58 Then:O &=M(b-22)&" ":ElseIf b=46 Then:O &=".-.-.- ":ElseIf b=44 Then:O &="--..-- ":ElseIf b=63 Then:O &= "..--.. ":Else:O &="/":End IF:Next:Console.WriteLine(O):End While
End Sub
End Module
C (233 characters)
W(n,p){while(n--)putch(".-.-.--.--..--..-.....-----..../"[p++]);}main(){
char*p,c,s[99];gets(s);for(p=s;*p;){c=*p++;c=toupper(c);c=c>90?35:c-32?
"È#À#¶µ´³²±°¹¸·#####Ê##i Že‘J•aEAv„…`q!j“d‰ƒˆ"[c-44]:63;c-35?
W(c>>5,c&31):0;putch(0);}}
This takes input from stdin. Taking input from the command line adds 2 characters. Instead of:
...main(){char*p,c,s[99];gets(s);for(p=s;...
you get:
...main(int i,char**s){char*p,c;for(p=s[1];...
I'm using Windows-1252 code page for characters above 127, and I'm not sure how they'll turn up in other people's browsers. I notice that, in my browser at least (Google Chrome), two of the characters (between "#" and "i") aren't showing up. If you copy out of the browser and paste into a text editor, though, they do show up, albeit as little boxes.
It can be converted to ASCII-only, but this adds 24 characters, increasing the character count to 257. To do this, I first offset each character in the string by -64, minimizing the number of characters that are greater than 127. Then I substitute \xXX character escapes where necessary. It changes this:
...c>90?35:c-32?"È#À#¶µ´³²±°¹¸·#####Ê##i Že‘J•aEAv„…`q!j“d‰ƒˆ"[c-44]:63;
c-35?W(...
to this:
...c>90?99:c-32?"\x88#\x80#vutsrqpyxw#####\x8A#\0PA)\xE0N%Q\nU!O\5\1\66DE 1
\xE1*S$ICH"[c-44]+64:63;c-99?W(...
Here's a more nicely formatted and commented version of the code:
/* writes `n` characters from internal string to stdout, starting with
* index `p` */
W(n,p){
while(n--)
/* warning for using putch without declaring it */
putch(".-.-.--.--..--..-.....-----..../"[p++]);
/* dmckee noticed (http://tinyurl.com/n4eart) the overlap of the
* various morse codes and created a 37-character-length string that
* contained the morse code for every required character (except for
* space). You just have to know the start index and length of each
* one. With the same idea, I came up with this 32-character-length
* string. This not only saves 5 characters here, but means that I
* can encode the start indexes with only 5 bits below.
*
* The start and length of each character are as follows:
*
* A: 0,2 K: 1,3 U: 10,3 4: 18,5
* B: 16,4 L: 15,4 V: 19,4 5: 17,5
* C: 1,4 M: 5,2 W: 4,3 6: 16,5
* D: 9,3 N: 1,2 X: 9,4 7: 25,5
* E: 0,1 O: 22,3 Y: 3,4 8: 24,5
* F: 14,4 P: 4,4 Z: 8,4 9: 23,5
* G: 5,3 Q: 5,4 0: 22,5 .: 0,6
* H: 17,4 R: 0,3 1: 21,5 ,: 8,6
* I: 20,2 S: 17,3 2: 20,5 ?: 10,6
* J: 21,4 T: 1,1 3: 19,5
*/
}
main(){ /* yuck, but it compiles and runs */
char *p, c, s[99];
/* p is a pointer within the input string */
/* c saves from having to do `*p` all the time */
/* s is the buffer for the input string */
gets(s); /* warning for use without declaring */
for(p=s; *p;){ /* begin with start of input, go till null character */
c = *p++; /* grab *p into c, increment p.
* incrementing p here instead of in the for loop saves
* one character */
c=toupper(c); /* warning for use without declaring */
c = c > 90 ? 35 : c - 32 ?
"È#À#¶µ´³²±°¹¸·#####Ê##i Že‘J•aEAv„…`q!j“d‰ƒˆ"[c - 44] : 63;
/**** OR, for the ASCII version ****/
c = c > 90 ? 99 : c - 32 ?
"\x88#\x80#vutsrqpyxw#####\x8A#\0PA)\xE0N%Q\nU!O\5\1\66DE 1\xE1"
"*S$ICH"[c - 44] + 64 : 63;
/* Here's where it gets hairy.
*
* What I've done is encode the (start,length) values listed in the
* comment in the W function into one byte per character. The start
* index is encoded in the low 5 bits, and the length is encoded in
* the high 3 bits, so encoded_char = (char)(length << 5 | position).
* For the longer, ASCII-only version, 64 is subtracted from the
* encoded byte to reduce the necessity of costly \xXX representations.
*
* The character array includes encoded bytes covering the entire range
* of characters covered by the challenge, except for the space
* character, which is checked for separately. The covered range
* starts with comma, and ends with capital Z (the call to `toupper`
* above handles lowercase letters). Any characters not supported are
* represented by the "#" character, which is otherwise unused and is
* explicitly checked for later. Additionally, an explicit check is
* done here for any character above 'Z', which is changed to the
* equivalent of a "#" character.
*
* The encoded byte is retrieved from this array using the value of
* the current character minus 44 (since the first supported character
* is ASCII 44 and index 0 in the array). Finally, for the ASCII-only
* version, the offset of 64 is added back in.
*/
c - 35 ? W(c >> 5, c & 31) : 0;
/**** OR, for the ASCII version ****/
c - 99 ? W(c >> 5, c & 31) : 0;
/* Here's that explicit check for the "#" character, which, as
* mentioned above, is for characters which will be ignored, because
* they aren't supported. If c is 35 (or 99 for the ASCII version),
* then the expression before the ? evaluates to 0, or false, so the
* expression after the : is evaluated. Otherwise, the expression
* before the ? is non-zero, thus true, so the expression before
* the : is evaluated.
*
* This is equivalent to:
*
* if(c != 35) // or 99, for the ASCII version
* W(c >> 5, c & 31);
*
* but is shorter by 2 characters.
*/
putch(0);
/* This will output to the screen a blank space. Technically, it's not
* the same as a space character, but it looks like one, so I think I
* can get away with it. If a real space character is desired, this
* must be changed to `putch(32);`, which adds one character to the
* overall length.
} /* end for loop, continue with the rest of the input string */
} /* end main */
This beats everything here except for a couple of the Python implementations. I keep thinking that it can't get any shorter, but then I find some way to shave off a few more characters. If anybody can find any more room for improvement, let me know.
EDIT:
I noticed that, although this routine rejects any invalid characters above ASCII 44 (outputting just a blank space for each one), it doesn't check for invalid characters below this value. To check for these adds 5 characters to the overall length, changing this:
...c>90?35:c-32?"...
to this:
...c-32?c>90|c<44?35:"...
REBOL (118 characters)
A roughly 10 year-old implementation
foreach c ask""[l: index? find" etinamsdrgukwohblzfcpövxäqüyj"c while[l >= 2][prin pick"-."odd? l l: l / 2]prin" "]
Quoted from: http://www.rebol.com/oneliners.html
(no digits though and words are just separated by double spaces :/ ...)
Python (210 characters)
This is a complete solution based on Alec's one
def e(l):
    i=(' etianmsurwdkgohvf_l_pjbxcyzq__54_3___2%7s16%7s7___8_90%12s?%8s.%29s,'%tuple('_'*5)).find(l.lower());v=''
    while i>0:v='-.'[i%2]+v;i=(i-1)/2
    return v or '/'
def enc(s):return ' '.join(map(e,s))
C, 338 chars
338 with indentation and all removable linebreaks removed:
#define O putchar
#define W while
char*l="x#####ppmmmmm##FBdYcbcbSd[Kcd`\31(\b1g_<qCN:_'|\25D$W[QH0";
int c,b,o;
main(){
W(1){
W(c<32)
c=getchar()&127;
W(c>96)
c^=32;
c-=32;
o=l[c/2]-64;
b=203+(c&1?o>>3:0);
o=c&1?o&7:o>>3;
W(o>6)
O(47),o=0;
c/=2;
W(c--)
b+=(l[c]-64&7)+(l[c]-64>>3);
b=(((l[b/7]<<7)+l[b/7+1])<<(b%7))>>14-o;
W(o--)
O(b&(1<<o)?46:45);
O(32);
}
}
This isn't based on the tree approach other people have been taking. Instead, l first encodes the lengths of the Morse codes for all characters between ASCII 32 and 95 inclusive, two characters to a byte. As an example, D is -.. for a length of 3 and E is . for a length of 1. This is encoded as 011 and 001, giving 011001. To make more characters encodable and avoid escapes, 64 is then added to the total, giving 1011001 - 89, ASCII Y. Non-morse characters are assigned a length of 0. The second half of l (starting with \31) are the bits of the morse code itself, with a dot being 1 and a dash 0. To avoid going into high ASCII, this data is encoded 7 bits/byte.
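A one-line check of that packing in Python (assuming, as described, D's length goes in the high three bits and E's in the low three):
length_d, length_e = 3, 1                    # Morse lengths of D (-..) and E (.)
print(chr((length_d << 3 | length_e) + 64))  # 'Y', the byte seen in l above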
The code first sanitises c, then works out the morse length of c (in o), then adds up the lengths of all the previous characters to produce b, the bit index into the data.
Finally, it loops through the bits, printing dots and dashes.
The length '7' is used as a special flag for printing a / when encountering a space.
There are probably some small gains to be had from removing brackets, but I'm way off from some of the better results and I'm hungry, so...
C# Using Linq (133 chars)
static void Main()
{
Console.WriteLine(String.Join(" ", (from c in Console.ReadLine().ToUpper().ToCharArray()
select m[c]).ToArray()));
}
OK, so I cheated. You also need to define a dictionary as follows (didn't bother counting the chars, since this blows me out of the game):
static Dictionary<char, string> m = new Dictionary<char, string>() {
{'A', ".-"},
{'B', "-..."},
{'C', "-.-."},
{'D', "-.."},
{'E', "."},
{'F', "..-."},
{'G', "--."},
{'H', "...."},
{'I', ".."},
{'J', ".---"},
{'K', "-.-"},
{'L', ".-.."},
{'M', "--"},
{'N', "-."},
{'O', "---"},
{'P', ".--."},
{'Q', "--.-"},
{'R', ".-."},
{'S', "..."},
{'T', "-"},
{'U', "..-"},
{'V', "...-"},
{'W', ".--"},
{'X', "-..-"},
{'Y', "-.--"},
{'Z', "--.."},
{'0', "-----"},
{'1', ".----"},
{'2', "..---"},
{'3', "...--"},
{'4', "....-"},
{'5', "....."},
{'6', "-...."},
{'7', "--..."},
{'8', "---.."},
{'9', "----."},
{' ', "/"},
{'.', ".-.-.-"},
{',', "--..--"},
{'?', "..--.."},
};
Still, can someone provide a more concise C# implementation which is also as easy to understand and maintain as this?
Perl, 206 characters, using dmckee's idea
This is longer than the first one I submitted, but I still think it's interesting. And/or awful. I'm not sure yet. This makes use of dmckee's coding idea, plus a couple other good ideas that I saw around. Initially I thought that the "length/offset in a fixed string" thing couldn't come out to less data than the scheme in my other solution, which uses a fixed two bytes per char (and all printable bytes, at that). I did in fact manage to get the data down to considerably less (one byte per char, plus four bytes to store the 26-bit pattern we're indexing into) but the code to get it out again is longer, despite my best efforts to golf it. (Less complex, IMO, but longer anyway).
Anyway, 206 characters; newlines are removable except the first.
#!perl -lp
($a,@b)=unpack"b32C*",
"\264\202\317\0\31SF1\2I.T\33N/G\27\308XE0=\x002V7HMRfermlkjihgx\207\205";
$a=~y/01/-./;@m{A..Z,0..9,qw(. , ?)}=map{substr$a,$_%23,1+$_/23}@b;
$_=join' ',map$m{uc$_}||"/",/./g
Explanation:
There are two parts to the data. The first four bytes ("\264\202\317\0") represent 32 bits of morse code ("--.-..-.-.-----.....--..--------") although only the first 26 bits are used. This is the "reference string".
The remainder of the data string stores the starting position and length of substrings of the reference string that represent each character -- one byte per character, in the order (A, B, ... Z, 0, 1, ... 9, ".", ",", "?"). The values are coded as 23 * (length - 1) + pos, and the decoder reverses that. The last starting pos is of course 22.
So the unpack does half the work of extracting the data, and the third line (as viewed here) does the rest; after that we have a hash with $m{'a'} = '.-' and so on. All that is left is to match characters of the input, look them up in the hash, and format the output, which the last line does, with some help from the shebang: it tells perl to strip the newline on input, put each line of input in $_, and write $_ back to output with a newline added once the code finishes running.
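To make the coding concrete, here is the same idea paraphrased in Python (my sketch, not part of the Perl answer); pack() is what an encoder would do, and unpack() mirrors what the third line does with each byte:
ref = "--.-..-.-.-----.....--..--------"[:26]  # the 26 bits actually used

def pack(code):                      # e.g. ".-" for A
    return 23 * (len(code) - 1) + ref.find(code)

def unpack(b):
    pos, length = b % 23, 1 + b // 23
    return ref[pos:pos + length]

print(unpack(pack(".-")))            # .-
print(unpack(pack("-----")))         # -----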
Python 2; 171 characters
Basically the same as Andrea's solution, but as a complete program, and using stupid tricks to make it shorter.
for c in raw_input().lower():print"".join(".-"[int(d)]for d in bin(
(' etianmsurwdkgohvf_l_pjbxcyzq__54_3___2%7s16%7s7___8_90%12s?%8s.%29s,'
%(('',)*5)).find(c)+1)[3:])or'/',
(the added newlines can all be removed)
Or, if you prefer not to use the bin() function in 2.6, we can do it in 176:
for c in raw_input():C=lambda q:q>0and C(~-q/2)+'-.'[q%2]or'';print C(
(' etianmsurwdkgohvf_l_pjbxcyzq__54_3___2%7s16%7s7___8_90%12s?%8s.%29s,'%
(('',)*5)).find(c.lower()))or'/',
(again, the added newlines can all be removed)
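For what it's worth, the bin() trick in the first version works because a character's index in that heap-style string, plus one, is its Morse pattern written in binary behind a leading 1, so stripping the '0b1' prefix leaves the dots and dashes (0 = dot, 1 = dash). A small illustration of my own, in Python 2.6 to match the answer:
t = ' etianmsurwdkgohvf'        # start of the heap-style table
print bin(t.find('s') + 1)[3:]  # 000 -> ...
print bin(t.find('a') + 1)[3:]  # 01  -> .-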
C89 (293 characters)
Based off some of the other answers.
EDIT: Shrunk the tree (yay).
#define P putchar
char t['~']="~ETIANMSURWDKGOHVF~L~PJBXCYZQ~~54~3",o,q[9],Q=10;main(c){for(;Q;)t[
"&./7;=>KTr"[--Q]]="2167890?.,"[Q];while((c=getchar())>=0){c-=c<'{'&c>96?32:0;c-
10?c-32?0:P(47):P(10);for(o=1;o<'~';++o)if(t[o]==c){for(;o;o/=2)q[Q++]=45+(o--&1
);for(;Q;P(q[--Q]));break;}P(32);}}
Here's another approach, based on dmckee's work, demonstrating just how readable Python is:
Python
244 characters
def h(l):p=2*ord(l.upper())-88;a,n=map(ord,"AF__GF__]E\\E[EZEYEXEWEVEUETE__________CF__IBPDJDPBGAHDPC[DNBSDJCKDOBJBTCND`DKCQCHAHCZDSCLD??OD"[p:p+2]);return "--..--..-.-.-..--...----.....-----.-"[a-64:a+n-128]
def e(s):return ' '.join(map(h,s))
Limitations:
dmckee's string missed the 'Y' character, and I was too lazy to add it. I think you'd just have to change the "??" part and add a "-" at the end of the second string literal.
It doesn't put '/' between words; again, lazy.
Since the rules called for fewest characters, not fewest bytes, you could make at least one of my lookup tables smaller (by half) if you were willing to go outside the printable ASCII characters.
EDIT: If I use naïvely-chosen Unicode chars but just keep them in escaped ASCII in the source file, it still gets a tad shorter because the decoder is simpler:
Python
240 characters
def h(l):a,n=divmod(ord(u'\x06_7_\xd0\xc9\xc2\xbb\xb4\xad\xa6\x9f\x98\x91_____\x14_AtJr2<s\xc1d\x89IQdH\x8ff\xe4Pz9;\xba\x88X_f'[ord(l.upper())-44]),7);return "--..--..-.-.-..--...----.....-----.-"[a:a+n]
def e(s):return ' '.join(map(h,s))
I think it also makes the intent of the program much clearer.
If you saved this as UTF-8, I believe the program would be down to 185 characters, making it the shortest complete Python solution, and second only to Perl. :-)
Here's a third, completely different way of encoding morse code:
Python
232 characters
def d(c):
o='';b=ord("Y_j_?><80 !#'/_____f_\x06\x11\x15\x05\x02\x15\t\x1c\x06\x1e\r\x12\x07\x05\x0f\x16\x1b\n\x08\x03\r\x18\x0e\x19\x01\x13"[ord(c.upper())-44])
while b!=1:o+='.-'[b&1];b/=2
return o
e=lambda s:' '.join(map(d,s))
If you can figure out a way to map this onto some set of printable characters, you could save quite a few characters. This is probably my most direct solution, though I don't know if it's the most readable.
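In case the byte values look arbitrary: each one is the Morse pattern read least-significant-bit first (0 = dot, 1 = dash) sitting behind a single 1 bit that acts as the end marker, which is exactly what the while loop unwinds. A sketch of my own for the encoding direction:
def pack(code):
    b = 1                              # the end-marker bit
    for sym in reversed(code):
        b = b * 2 + (1 if sym == '-' else 0)
    return b

print(pack('.'))     # 2  -> '\x02', the byte stored for E
print(pack('.-'))    # 6  -> '\x06', the byte stored for A
print(pack('--..'))  # 19 -> '\x13', the byte stored for Z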
OK, now I've wasted way too much time on this.
Haskell
type MorseCode = String
program :: String
program = "__5__4H___3VS__F___2 UI__L__+_ R__P___1JWAE"
++ "__6__=B__/_XD__C__YKN__7_Z__QG__8_ __9__0 OMT "
decode :: MorseCode -> String
decode = interpret program
where
interpret = head . foldl exec []
exec xs '_' = undefined : xs
exec (x:y:xs) c = branch : xs
where
branch (' ':ds) = c : decode ds
branch ('-':ds) = x ds
branch ('.':ds) = y ds
branch [] = [c]
For example, decode "-- --- .-. ... . -.-. --- -.. ." returns "MORSE CODE".
This program is taken from the excellent article Fun with Morse Code.
PHP
I modified the previous PHP entry to be slightly more efficient. :)
$a=array(32=>"/",44=>"--..--",1,".-.-.-",48=>"-----",".----","..---","...--","....-",".....","-....","--...","---..","----.",63=>"..--..",1,".-","-...","-.-.","-..",".","..-.","--.","....","..",".---","-.-",".-..","--","-.","---",".--.","--.-",".-.","...","-","..-","...-",".--","-..-","-.--","--..");
foreach(str_split(strtoupper("hello world?"))as$k=>$v){echo $a[ord($v)]." ";}
Komodo says 380 characters on 2 lines - the extra line is just for readability. ;D
The interspersed 1s in the array are just there to save 2 bytes, by filling an array position with throwaway data instead of writing out the key of the entry after it.
Consider the first form versus the second; the difference is clearly visible. :)
array(20=>"data",22=>"more data");
array(20=>"data",1,"more data");
The end result, however, is exactly the same as long as you read entries by their positions rather than looping through the contents, which we don't do on this golf course.
End result: 578 characters down to 380 (a saving of 198 characters, or ~34.26%).
Bash, a script I wrote a while ago (time-stamp says last year) weighing in at a hefty 1661 characters. Just for fun really :)
#!/bin/bash
txt=''
res=''
if [ "$1" == '' ]; then
read -se txt
else
txt="$1"
fi;
len=$(echo "$txt" | wc -c)
k=1
while [ "$k" -lt "$len" ]; do
case "$(expr substr "$txt" $k 1 | tr '[:upper:]' '[:lower:]')" in
'e') res="$res"'.' ;;
't') res="$res"'-' ;;
'i') res="$res"'..' ;;
'a') res="$res"'.-' ;;
'n') res="$res"'-.' ;;
'm') res="$res"'--' ;;
's') res="$res"'...' ;;
'u') res="$res"'..-' ;;
'r') res="$res"'.-.' ;;
'w') res="$res"'.--' ;;
'd') res="$res"'-..' ;;
'k') res="$res"'-.-' ;;
'g') res="$res"'--.' ;;
'o') res="$res"'---' ;;
'h') res="$res"'....' ;;
'v') res="$res"'...-' ;;
'f') res="$res"'..-.' ;;
'l') res="$res"'.-..' ;;
'p') res="$res"'.--.' ;;
'j') res="$res"'.---' ;;
'b') res="$res"'-...' ;;
'x') res="$res"'-..-' ;;
'c') res="$res"'-.-.' ;;
'y') res="$res"'-.--' ;;
'z') res="$res"'--..' ;;
'q') res="$res"'--.-' ;;
'5') res="$res"'.....' ;;
'4') res="$res"'....-' ;;
'3') res="$res"'...--' ;;
'2') res="$res"'..---' ;;
'1') res="$res"'.----' ;;
'6') res="$res"'-....' ;;
'7') res="$res"'--...' ;;
'8') res="$res"'---..' ;;
'9') res="$res"'----.' ;;
'0') res="$res"'-----' ;;
esac;
[ ! "$(expr substr "$txt" $k 1)" == " " ] && [ ! "$(expr substr "$txt" $(($k+1)) 1)" == ' ' ] && res="$res"' '
k=$(($k+1))
done;
echo "$res"
C89 (388 characters)
This is incomplete as it doesn't handle comma, fullstop, and query yet.
#define P putchar
char q[10],Q,tree[]=
"EISH54V 3UF 2ARL + WP J 1TNDB6=X/ KC Y MGZ7 Q O 8 90";s2;e(x){q[Q++]
=x;}p(){for(;Q--;putchar(q[Q]));Q=0;}T(int x,char*t,int s){s2=s/2;return s?*t-x
?t[s2]-x?T(x,++t+s2,--s/2)?e(45):T(x,t,--s/2)?e(46):0:e(45):e(46):0;}main(c){
while((c=getchar())>=0){c-=c<123&&c>96?32:0;if(c==10)P(10);if(c==32)P(47);else
T(c,tree,sizeof(tree)),p();P(' ');}}
Wrapped for readability. Only two of the linebreaks are required (one for the #define, one after else, which could be a space). I've added a few non-standard characters but didn't add non-7-bit ones.
C, 533 characters
I took advice from some comments and switched to stdin. Killed another 70 characters roughly.
#include <stdio.h>
#include <ctype.h>
char *u[36] = {".-","-...","-.-.","-..",".","..-.","--.","....","..",".---","-.-",".-..","--","-.","---",".--.","--.-",".-.","...","-","..-","...-",".--","-..-","-.--","--..","-----",".----","..---","...--","....-",".....","-....","--...","---..","----."};
main(){
char*v;int x,o;
do{
o = toupper(getc(stdin));v=0;if(o>=65&&o<=90)v=u[o-'A'];if(o>=48&&o<=57)v=u[o-'0'+26];if(o==46)v=".-.-.-";if(o==44)v="--..--";if(o==63)v="..--..";if(o==32)v="/";if(v)printf("%s ", v);} while (o != EOF);
}
C (381 characters)
char*p[36]={".-","-...","-.-.","-..",".","..-.","--.","....","..",".---","-.-",".-..","--","-.","---",".--.","--.-",".-.","...","-","..-","...-",".--","-..-","-.--","--..","-----",".----","..---","...--","....-",".....","-....","--...","---..","----."};
main(){int c;while((c=tolower(getchar()))!=10)printf("%s ",c==46?".-.-.-":c==44?"--..--":c==63?"..--..":c==32?"/":*(p+(c-97)));}
C, 448 bytes using cmdline arguments:
char*a[]={".-.-.-","--..--","..--..","/",".-","-...","-.-.","-..",".","..-.","--.","....","..",".---","-.-",".-..","--","-.","---",".--.","--.-",".-.","...","-","..-","...-",".--","-..-","-.--","--..","-----",".----","..---","...--","....-",".....","-....","--...","---..","----."},*k=".,? ",*s,*p,x;main(int _,char**v){for(;s=*++v;putchar(10))for(;x=*s++;){p=strchr(k,x);printf("%s ",p?a[p-k]:isdigit(x)?a[x-18]:isalpha(x=toupper(x))?a[x-61]:0);}}
C, 416 bytes using stdin:
char*a[]={".-.-.-","--..--","..--..","/",".-","-...","-.-.","-..",".","..-.","--.","....","..",".---","-.-",".-..","--","-.","---",".--.","--.-",".-.","...","-","..-","...-",".--","-..-","-.--","--..","-----",".----","..---","...--","....-",".....","-....","--...","---..","----."},*k=".,? ",*p,x;main(){while((x=toupper(getchar()))-10){p=strchr(k,x);printf("%s ",p?a[p-k]:isdigit(x)?a[x-18]:isalpha(x)?a[x-61]:0);}}