How do I show the bit length of a byte (not an integer) in binary?

#Python 3+
I read a byte from a binary file:
ReadSize = 1
with open(ArgsFile, "rb") as f:
    byte = f.read(ReadSize)
    print(byte)
Example output:
b'h'
To show the bit length of a positive integer I would write
print(len(bin(1)[2:]))
or, for a negative integer,
len(bin(-1)[3:])
How do I print the number of bits needed for the byte read from the file?

You can try something like this:
Test file:
$ hexdump -C ~/test.bin
00000000 b1 23 50 c2 06 |.#P..|
00000005
test.py:
import sys
def main():
    with open("test.bin", "rb") as f:
        while b := f.read(1):
            bits = int.from_bytes(b, sys.byteorder).bit_length()
            print(b.hex(), bits)
if __name__ == "__main__":
    main()
Test:
$ python test.py
b1 8
23 6
50 7
c2 8
06 3
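For a single byte the byte order passed to int.from_bytes makes no difference, and indexing a bytes object in Python 3 already yields an int. A minimal sketch along those lines, using the five bytes from the hexdump above as in-memory sample data instead of a real file (the helper name bit_lengths is made up for illustration):

```python
import io

def bit_lengths(stream):
    """Yield (hex string, bit length) for every byte read from a binary stream."""
    while chunk := stream.read(1):
        # Indexing bytes gives an int in Python 3, so no conversion is needed.
        yield chunk.hex(), chunk[0].bit_length()

# Sample data copied from the hexdump above; a file opened with "rb" behaves the same way.
sample = io.BytesIO(bytes.fromhex("b12350c206"))
for hex_byte, bits in bit_lengths(sample):
    print(hex_byte, bits)
```

Note that a zero byte reports a bit length of 0, because int.bit_length() counts only the bits needed to represent the value.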


How can I merge/join multiple columns from two dataframes, depending on a matching pattern

I would like to merge two dataframes based on matching values in the Chromosome column. I have made various attempts in R and Bash, using data.table, the tidyverse, and merge(). Could someone suggest alternative solutions in R, Bash, Python, Perl, etc.? I would like to merge on the chromosome information and retain both the counts and the RXNs.
NOTE: The two DFs are not in the same row order, and I am also curious what happens if some values are missing.
Thanks and Cheers:
DF1:
Chromosome;RXN;ID
1009250;q9hxn4;NA
1010820;p16256;NA
31783;p16588;"PNTOt4;PNTOt4pp"
203;3-DEHYDROQUINATE-DEHYDRATASE-RXN;"DHQTi;DQDH"
DF2:
Chromosome;Count1;Count2;Count3;Count4;Count5
203;1;31;1;0;0;0
1010820;152;7;0;11;4
1009250;5;0;0;17;0
31783;1;0;0;0;0;0
Expected Result:
Chromosome;RXN;Count1;Count2;Count3;Count4;Count5
1009250;q9hxn4;5;0;0;17;0
1010820;p16256;152;7;0;11;4
31783;p16588;1;0;0;0;0
203;3-DEHYDROQUINATE-DEHYDRATASE-RXN;1;31;1;0;0;0
As bash was mentioned in the text body, I offer you an awk solution. The dataframes are in files df1 and df2:
$ awk '
BEGIN {
    FS=OFS=";"           # input and output field delimiters
}
NR==FNR {                # process df1
    a[$1]=$2             # hash to an array, 1st is the key, 2nd the value
    next                 # process next record
}
{                        # process df2
    $2=(a[$1] OFS $2)    # prepend RXN field to 2nd field of df2
}1' df1 df2              # 1 is output command, mind the file order
The last two lines could perhaps be written more clearly:
...
{
    print $1,a[$1],$2,$3,$4,$5,$6
}' df1 df2
Output:
Chromosome;RXN;Count1;Count2;Count3;Count4;Count5
203;3-DEHYDROQUINATE-DEHYDRATASE-RXN;1;31;1;0;0;0
1010820;p16256;152;7;0;11;4
1009250;q9hxn4;5;0;0;17;0
31783;p16588;1;0;0;0;0;0
Output will be in the order of df2. A chromosome present in df1 but not in df2 will not be included. A chromosome in df2 but not in df1 will be output from df2 with an empty RXN field. Also, if there are duplicate chromosomes in df1, the last one is used. This can be fixed if it is an issue.
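The same hash-join logic can be sketched in Python with only the standard library. This is an illustration of the behaviour described above, not a replacement for the awk command: the function name join_on_first_field, the shortened sample data and the key 42 (present only in df2) are made up for the example.

```python
import csv
import io

def join_on_first_field(df1_text, df2_text, delimiter=";"):
    """Insert the RXN value from df1 after the key of each df2 row."""
    # Map the first field of df1 (Chromosome) to its second field (RXN).
    # A later duplicate key overwrites an earlier one, as in the awk array.
    rxn_by_key = {}
    for row in csv.reader(io.StringIO(df1_text), delimiter=delimiter):
        if row:
            rxn_by_key[row[0]] = row[1]
    joined = []
    for row in csv.reader(io.StringIO(df2_text), delimiter=delimiter):
        if row:
            # Keys that are missing from df1 get an empty RXN field.
            joined.append([row[0], rxn_by_key.get(row[0], "")] + row[1:])
    return joined

# Shortened sample data; key 42 is hypothetical and exists only in df2.
df1 = 'Chromosome;RXN;ID\n1009250;q9hxn4;NA\n203;3-DEHYDROQUINATE-DEHYDRATASE-RXN;"DHQTi;DQDH"\n'
df2 = "Chromosome;Count1;Count2\n203;1;31\n1009250;5;0\n42;7;7\n"
for row in join_on_first_field(df1, df2):
    print(";".join(row))
```

The rows come out in df2 order, and the row for key 42 is printed with an empty RXN field, matching the behaviour listed above.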
If I understand your request correctly, this should do it in Python. I've made the Chromosome column into the index of each DataFrame.
import pandas as pd
from io import StringIO
txt1 = '''Chromosome;RXN;ID
1009250;q9hxn4;NA
1010820;p16256;NA
31783;p16588;"PNTOt4;PNTOt4pp"
203;3-DEHYDROQUINATE-DEHYDRATASE-RXN;"DHQTi;DQDH"'''
txt2 = """Chromosome;Count1;Count2;Count3;Count4;Count5;Count6
203;1;31;1;0;0;0
1010820;152;7;0;11;4
1009250;5;0;0;17;0
31783;1;0;0;0;0;0"""
df1 = pd.read_csv(
    StringIO(txt1),
    sep=';',
    index_col=0,
    header=0
)
df2 = pd.read_csv(
    StringIO(txt2),
    sep=';',
    index_col=0,
    header=0
)
DF1:
                                         RXN               ID
Chromosome
1009250                               q9hxn4              NaN
1010820                               p16256              NaN
31783                                 p16588  PNTOt4;PNTOt4pp
203         3-DEHYDROQUINATE-DEHYDRATASE-RXN       DHQTi;DQDH
DF2:
            Count1  Count2  Count3  Count4  Count5  Count6
Chromosome
203              1      31       1       0       0     0.0
1010820        152       7       0      11       4     NaN
1009250          5       0       0      17       0     NaN
31783            1       0       0       0       0     0.0
result = pd.concat(
    [df1.sort_index(), df2.sort_index()],
    axis=1
)
print(result)
                                         RXN               ID  Count1  Count2  Count3  Count4  Count5  Count6
Chromosome
203         3-DEHYDROQUINATE-DEHYDRATASE-RXN       DHQTi;DQDH       1      31       1       0       0     0.0
31783                                 p16588  PNTOt4;PNTOt4pp       1       0       0       0       0     0.0
1009250                               q9hxn4              NaN       5       0       0      17       0     NaN
1010820                               p16256              NaN     152       7       0      11       4     NaN
The concat command also handles mismatched indices by simply filling in NaN values for the columns of e.g. df1 if df2 has an index value that df1 lacks, and vice versa.

Encoding troubles with python, mysql and utf8mb4

I get the following warnings when trying to save a simple dataframe to MySQL:
C:...\anaconda3\lib\site-packages\pymysql\cursors.py:170: Warning: (1366, "Incorrect string value: '\x92\xE9t\xE9)' for column 'VARIABLE_VALUE' at row 518")
result = self._query(query)
And
C:...anaconda3\lib\site-packages\pymysql\cursors.py:170: Warning:
(3719, "'utf8' is currently an alias for the character set UTF8MB3,
but will be an alias for UTF8MB4 in a future release. Please consider
using UTF8MB4 in order to be unambiguous.") result =
self._query(query)
Environment info: I use MySQL 8, Python 3.6 (pymysql 0.9.2, sqlalchemy 1.2.1)
I visited posts like the one linked below, none of which seems to give a solution for avoiding this warning.
MySQL “incorrect string value” error when save unicode string in Django -> Indication is to use UTF8
N.B.: The collation of the table within MySQL doesn't seem to be set to the one I specified in the create_db function of the Connection class.
The executable code:
import DataEngine.db.Connection as connection
import random
import pandas as pd
if __name__ == "__main__":
    conn = connection.Connection(host="host_name", port="3306", user="username", password="password")
    conn.create_db("raw_data")
    conn.establish("raw_data")
    l1 = []
    for i in range(10):
        l_nested = []
        for j in range(10):
            l_nested.append(random.randint(0, 100))
        l1.append(l_nested)
    df = pd.DataFrame(l1)
    conn.save(df, "random_df")
    df2 = conn.retrieve("random_df")
    print(df2)
So the dataframe that is persisted in the database is :
index 0 1 2 3 4 5 6 7 8 9
0 0 11 57 75 45 81 70 91 66 93 96
1 1 51 43 3 64 2 6 93 5 49 40
2 2 35 80 76 11 23 87 19 32 13 98
3 3 82 10 69 40 34 66 42 24 82 59
4 4 49 74 39 61 14 63 94 92 82 85
5 5 50 47 90 75 48 77 17 43 5 29
6 6 70 40 78 60 29 48 52 48 39 36
7 7 21 87 41 53 95 3 31 67 50 30
8 8 72 79 73 82 20 15 51 14 38 42
9 9 68 71 11 17 48 68 17 42 83 95
My Connection class
import sqlalchemy
import pymysql
import pandas as pd
class Connection:
    def __init__(self: object, host: str, port: str, user: str, password: str):
        self.host = host
        self.port = port
        self.user = user
        self.password = password
        self.conn = None
    def create_db(self: object, db_name: str, charset: str = "utf8mb4", collate:str ="utf8mb4_unicode_ci",drop_if_exists: bool = True):
        c = pymysql.connect(host=self.host, user=self.user, password=self.password)
        if drop_if_exists:
            c.cursor().execute("DROP DATABASE IF EXISTS " + db_name)
        c.cursor().execute("CREATE DATABASE " + db_name + " CHARACTER SET=" + charset + " COLLATE=" + collate)
        c.close()
        print("Database %s created with a %s charset" % (db_name, charset))
    def establish(self: object, db_name: str, charset: str = "utf8mb4"):
        # user:password@host:port/database is the SQLAlchemy URL layout
        self.conn = sqlalchemy.create_engine(
            "mysql+pymysql://" + self.user + ":" + self.password + "@" + self.host + ":" + self.port + "/" + db_name +
            "?charset=" + charset)
        print("Connection with database : %s has been established as %s at %s." % (db_name, self.user, self.host))
        print("Charset : %s" % charset)
    def retrieve(self, table):
        df = pd.read_sql_table(table, self.conn)
        return df
    def save(self: object, df: "Pandas.DataFrame", table: str, if_exists: str = "replace", chunksize: int = 10000):
        df.to_sql(name=table, con=self.conn, if_exists=if_exists, chunksize=chunksize)
Some elements that might help:
Well, hex 92 and e9 is not valid utf8mb4 (UTF-8). Perhaps you were expecting ’été, assuming CHARACTER SETs cp1250, cp1256, cp1257, or latin1.
Find out where that text is coming from, and let's decide whether it is valid latin1. Then we can fix the code to declare that the client is really using latin1, not utf8mb4? Or we can fix the client to use UTF-8, which would probably be better in the long run.
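The claim about the bytes can be checked directly in Python. A minimal sketch, assuming the four bytes quoted in the warning are the whole value and that the source encoding is Windows-1252 (Python's cp1252, which is what MySQL's latin1 corresponds to); where the text really comes from is still the open question:

```python
raw = b"\x92\xe9t\xe9"  # the bytes quoted in warning 1366

# Decoding as UTF-8 fails: 0x92 is a continuation byte and cannot start a character.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Assumption: the text was produced in Windows-1252.
text = raw.decode("cp1252")
print(valid_utf8, text)

# Re-encoding as UTF-8 gives bytes that a utf8mb4 column accepts.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)
```

If the decoded text looks right, the data is most likely being sent in a single-byte encoding while the connection declares utf8mb4, which points to the two options described above.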

Function does not return the list correctly

I have written code for adding the numbers from two different text files. For very big data (2-3 GB), I get a MemoryError. So I am writing new code using some functions to avoid loading the whole data into memory.
This code opens an input file 'd.txt' and reads the numbers after some lines from a bigger data set, as follows:
SCALAR
ND 3
ST 0
TS 1000
1.0
1.0
1.0
SCALAR
ND 3
ST 0
TS 2000
3.3
3.4
3.5
SCALAR
ND 3
ST 0
TS 3000
1.7
1.8
1.9
and adds them to the numbers read from a smaller text file 'e.txt', as follows:
SCALAR
ND 3
ST 0
TS 0
10.0
10.0
10.0
The result is written in a text file 'output.txt' like this:
SCALAR
ND 3
ST 0
TS 1000
11.0
11.0
11.0
SCALAR
ND 3
ST 0
TS 2000
13.3
13.4
13.5
SCALAR
ND 3
ST 0
TS 3000
11.7
11.8
11.9
The code which I prepared:
def add_list_same(list1, list2):
    """
    list2 has the same size as list1
    """
    c = [a+b for a, b in zip(list1, list2)]
    print(c)
    return c
def list_numbers_after_ts(n, f):
    result = []
    for line in f:
        if line.startswith('TS'):
            for node in range(n):
                result.append(float(next(f)))
    return result
def writing_TS(f1):
    TS = []
    ND = []
    for line1 in f1:
        if line1.startswith('ND'):
            ND = float(line1.split()[-1])
        if line1.startswith('TS'):
            x = float(line1.split()[-1])
            TS.append(x)
    return TS, ND
with open('d.txt') as depth_dat_file, \
        open('e.txt') as elev_file, \
        open('output.txt', 'w') as out:
    m = writing_TS(depth_dat_file)
    print('number of TS', m[1])
    for j in range(0,int(m[1])-1):
        i = m[1]*j
        out.write('SCALAR\nND {0:2f}\nST 0\nTS {0:2f}\n'.format(m[1], m[0][j]))
        list1 = list_numbers_after_ts(int(m[1]), depth_dat_file)
        list2 = list_numbers_after_ts(int(m[1]), elev_file)
        Eh = add_list_same(list1, list2)
        out.writelines(["%.2f\n" % item for item in Eh])
the output.txt is like this:
SCALAR
ND 3.000000
ST 0
TS 3.000000
SCALAR
ND 3.000000
ST 0
TS 3.000000
SCALAR
ND 3.000000
ST 0
TS 3.000000
The addition of lists does not work, although I checked the functions separately and they work. I can't find the error. I changed it a lot, but it does not work. Any suggestion? I really appreciate any help you can provide!
You can use grouper to read the files in groups of a fixed number of lines. The following code should work if the order of lines within each group is unchanged.
from itertools import zip_longest
#Split by group iterator
#See http://stackoverflow.com/questions/434287/what-is-the-most-pythonic-way-to-iterate-over-a-list-in-chunks
def grouper(iterable, n, padvalue=None):
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)
add_numbers = []
with open("e.txt") as f:
    # Read data by 7 lines
    for lines in grouper(f, 7):
        # Suppress first SCALAR line
        for line in lines[1:]:
            # add last number in every line to array (6 elements)
            add_numbers.append(float(line.split()[-1].strip()))
#template for every group
template = 'SCALAR\nND {:.2f}\nST {:.2f}\nTS {:.2f}\n{:.2f}\n{:.2f}\n{:.2f}\n'
with open("d.txt") as f, open('output.txt', 'w') as out:
    # As before
    for lines in grouper(f, 7):
        data_numbers = []
        for line in lines[1:]:
            data_numbers.append(float(line.split()[-1].strip()))
        # in result_numbers sum elements of two arrays by pair (6 elements)
        result_numbers = [x + y for x, y in zip(data_numbers, add_numbers)]
        # * unpack result_numbers as 6 arguments of function format
        out.write(template.format(*result_numbers))
I had to change some small things in the code and now it works, but only for small input files, because many variables are loaded into memory. Can you please tell me how I can work with yield?
from itertools import zip_longest
def grouper(iterable, n, padvalue=None):
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)
def writing_ND(f1):
    for line1 in f1:
        if line1.startswith('ND'):
            ND = float(line1.split()[-1])
    return ND
def writing_TS(f):
    for line2 in f:
        if line2.startswith('TS'):
            x = float(line2.split()[-1])
            TS.append(x)
    return TS
TS = []
ND = []
x = 0.0
n = 0
add_numbers = []
with open("e.txt") as f, open("d.txt") as f1,\
        open('output.txt', 'w') as out:
    ND = writing_ND(f)
    TS = writing_TS(f1)
    n = int(ND)+4
    f.seek(0)
    f1.seek(0)  # rewind d.txt as well; writing_TS() read it to the end
    for lines in grouper(f, int(n)):
        for item in lines[4:]:
            add_numbers.append(float(item))
    i = 0
    for l in grouper(f1, n):
        data_numbers = []
        for line in l[4:]:
            data_numbers.append(float(line.split()[-1].strip()))
        result_numbers = [x + y for x, y in zip(data_numbers, add_numbers)]
        del data_numbers
        out.write('SCALAR\nND %d\nST 0\nTS %0.2f\n' % (ND, TS[i]))
        i += 1
        for item in result_numbers:
            out.write('%s\n' % item)
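One possible way to use yield here is a generator that hands back one SCALAR block at a time, so only a single block of the big file is in memory at any moment. This is a sketch under assumptions, not a tested drop-in: the names read_blocks and add_offsets are made up, each block is assumed to have the layout shown in the question (SCALAR, ND, ST, TS, then ND values), and the "%.1f" output format is chosen to match the expected output.

```python
import io
from itertools import islice

def read_blocks(lines):
    """Yield (nd, ts, values) for each SCALAR block, one block at a time."""
    lines = iter(lines)
    nd = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "ND":
            nd = int(fields[1])
        elif fields[0] == "TS":
            ts = fields[1]  # kept as text so it is written back unchanged
            # The nd lines after the TS line hold the values of this block.
            values = [float(v) for v in islice(lines, nd)]
            yield nd, ts, values

def add_offsets(data_lines, offset_lines, out):
    """Add the single block in offset_lines to every block in data_lines."""
    _, _, offsets = next(read_blocks(offset_lines))
    for nd, ts, values in read_blocks(data_lines):
        out.write("SCALAR\nND %d\nST 0\nTS %s\n" % (nd, ts))
        for value, offset in zip(values, offsets):
            out.write("%.1f\n" % (value + offset))

# In-memory stand-ins for d.txt and e.txt so the sketch runs without files.
d_txt = io.StringIO("SCALAR\nND 3\nST 0\nTS 1000\n1.0\n1.0\n1.0\n"
                    "SCALAR\nND 3\nST 0\nTS 2000\n3.3\n3.4\n3.5\n")
e_txt = io.StringIO("SCALAR\nND 3\nST 0\nTS 0\n10.0\n10.0\n10.0\n")
result = io.StringIO()
add_offsets(d_txt, e_txt, result)
print(result.getvalue(), end="")
```

With real files, the three StringIO objects would be replaced by open('d.txt'), open('e.txt') and open('output.txt', 'w'); file objects are iterated line by line, so the large file is never read in full.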

standard unambiguous format [R] MySQL imported data

OK, to set the scene, I have written a function to import multiple tables from MySQL (using RODBC) and run randomForest() on them.
This function is run on multiple databases (as separate instances).
In one particular database, and one particular table, the "error in as.POSIXlt.character(x, tz,.....): character string not in a standard unambiguous format" error is thrown. The function runs on around 150 tables across two databases without any issues except this one table.
Here is a head() print from the table:
MQLTime bar5 bar4 bar3 bar2 bar1 pat1 baXRC
1 2014-11-05 23:35:00 184 24 8 24 67 147 Flat
2 2014-11-05 23:57:00 203 184 204 67 51 147 Flat
3 2014-11-06 00:40:00 179 309 49 189 75 19 Flat
4 2014-11-06 00:46:00 28 192 60 49 152 147 Flat
5 2014-11-06 01:20:00 309 48 9 11 24 19 Flat
6 2014-11-06 01:31:00 24 177 64 152 188 19 Flat
And here is the function:
GenerateRF <- function(db, countstable, RFcutoff) {
'load required libraries'
library(RODBC)
library(randomForest)
library(caret)
library(ff)
library(stringi)
'connection and data preparation'
connection <- odbcConnect ('TTODBC', uid='root', pwd='password', case="nochange")
'import count table and check if RF is allowed to be built'
query.str <- paste0 ('select * from ', db, '.', countstable, ' order by RowCount asc')
row.counts <- sqlQuery (connection, query.str)
'Operate only on tables that have >= RFcutoff'
for (i in 1:nrow (row.counts)) {
table.name <- as.character (row.counts[i,1])
col.count <- as.numeric (row.counts[i,2])
row.count <- as.numeric (row.counts[i,3])
if (row.count >= 20) {
'Delete old RFs and DFs for input pattern'
if (file.exists (paste0 (table.name, '_RF.Rdata'))) {
file.remove (paste0 (table.name, '_RF.Rdata'))
}
if (file.exists (paste0 (table.name, '_DF.Rdata'))) {
file.remove (paste0 (table.name, '_DF.Rdata'))
}
'import and clean data'
query.str2 <- paste0 ('select * from ', db, '.', table.name, ' order by mqltime asc')
raw.data <- sqlQuery(connection, query.str2)
'partition data into training/test sets'
set.seed(489)
index <- createDataPartition(raw.data$baXRC, p=0.66, list=FALSE, times=1)
data.train <- raw.data [index,]
data.test <- raw.data [-index,]
'find optimal trees to grow (without outcome and dates)'
data.mtry <- as.data.frame (tuneRF (data.train [, c(-1,-col.count)], data.train$baXRC, ntreetry=100,
stepFactor=.5, improve=0.01, trace=TRUE, plot=TRUE, dobest=FALSE))
best.mtry <- data.mtry [which (data.mtry[,2] == min (data.mtry[,2])), 1]
'compress df'
data.ff <- as.ffdf (data.train)
'run RF. Originally set to 1000 trees but M1 dataset is to large for laptop. Maybe train at the lab?'
data.rf <- randomForest (baXRC~., data=data.ff[,-1], mtry=best.mtry, ntree=500, keep.forest=TRUE,
importance=TRUE, proximity=FALSE)
'generate and print variable importance plot'
varImpPlot (data.rf, main = table.name)
'predict on test data'
data.test.pred <- as.data.frame( predict (data.rf, data.test, type="prob"))
'get dates and name date column'
data.test.dates <- data.frame (data.test[,1])
colnames (data.test.dates) <- 'MQLTime'
'attach dates to prediction df'
data.test.res <- cbind (data.test.dates, data.test.pred)
'force date coercion to attempt negating unambiguous format error '
data.test.res$MQLTime <- format(data.test.res$MQLTime, format = "%Y-%m-%d %H:%M:%S")
'delete row names, coerce to dataframe, generate row table name and export outcomes to MySQL'
rownames (data.test.res)<-NULL
data.test.res <- as.data.frame (data.test.res)
root.table <- stri_sub(table.name, 0, -5)
sqlUpdate (connection, data.test.res, tablename = paste0(db, '.', root.table, '_outcome'), index = "MQLTime")
'save RF and test df/s for future use; save latest version of row_counts to MQL4 folder'
save (data.rf, file = paste0 ("C:/Users/user/Documents/RF_test2/", table.name, '_RF.Rdata'))
save (data.test, file = paste0 ("C:/Users/user/Documents/RF_test2/", table.name, '_DF.Rdata'))
write.table (row.counts, paste0("C:/Users/user/AppData/Roaming/MetaQuotes/Terminal/71FA4710ABEFC21F77A62A104A956F23/MQL4/Files/", db, "_m1_rowcounts.csv"), sep = ",", col.names = F,
row.names = F, quote = F)
'end of conditional block'
}
'end of for loop'
}
'close all connection to MySQL'
odbcCloseAll()
'clear workspace'
rm(list=ls())
'end of function'
}
At this line:
data.test.res$MQLTime <- format(data.test.res$MQLTime, format = "%Y-%m-%d %H:%M:%S")
I have tried coercing MQLTime using various functions including: as.character(), as.POSIXct(), as.POSIXlt(), as.Date(), format(), as.character(as.Date())
and have also tried:
"%y" vs "%Y" and "%OS" vs "%S"
All variants seem to have no effect on the error and the function is still able to run on all other tables. I have checked the table manually (which contains almost 1500 rows) and also in MySQL looking for NULL dates or dates like "0000-00-00 00:00:00".
Also, if I run the function line by line in the R terminal, the offending table is processed without any problems, which just confuses the hell out of me.
I've exhausted all the functions/solutions I can think of (and also all those I could find through Dr. Google) so I am pleading for help here.
I should probably mention that the MQLTime column is stored as varchar() in MySQL. This was done to try and get around issues with type conversions between R and MySQL
SHOW VARIABLES LIKE "%version%";
innodb_version, 5.6.19
protocol_version, 10
slave_type_conversions,
version, 5.6.19
version_comment, MySQL Community Server (GPL)
version_compile_machine, x86
version_compile_os, Win32
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)
Edit: str() output on the data as imported from MySQL, showing MQLTime is already in POSIXct format:
> str(raw.data)
'data.frame': 1472 obs. of 8 variables:
$ MQLTime: POSIXct, format: "2014-11-05 23:35:00" "2014-11-05 23:57:00" "2014-11-06 00:40:00" "2014-11-06 00:46:00" ...
$ bar5 : int 184 203 179 28 309 24 156 48 309 437 ...
$ bar4 : int 24 184 309 192 48 177 48 68 60 71 ...
$ bar3 : int 8 204 49 60 9 64 68 27 192 147 ...
$ bar2 : int 24 67 189 49 11 152 27 56 437 67 ...
$ bar1 : int 67 51 75 152 24 188 56 147 71 0 ...
$ pat1 : int 147 147 19 147 19 19 147 19 147 19 ...
$ baXRC : Factor w/ 3 levels "Down","Flat",..: 2 2 2 2 2 2 2 2 2 3 ...
So I have tried declaring stringsAsFactors = FALSE in the dataframe operations and this had no effect.
Interestingly, if the offending table is removed from processing through an additional conditional statement in the first 'if' block, the function stops on the table immediately preceding the blocked table.
If both the original and the new offending tables are removed from processing, then the function stops on the table immediately prior to them. I have never seen this sort of behavior before and it really has me stumped.
I watched system resources during the function and they never seem to max out.
Could this be a problem with the 'for' loop and not necessarily date formats?
There appears to be some egg on my face. The table following the table where the function was stopping had a row with value '0000-00-00 00:00:00'. I added another statement in my MySQL function to remove these rows when pre-processing the tables. Thanks to those that had a look at this.
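The same kind of pre-filtering can also be done on the client side before any date conversion. The sketch below is in Python rather than R and is not the MySQL statement mentioned above; the function name split_valid_timestamps and the sample rows are made up, and it only shows that a zero date fails strict parsing while the well-formed rows pass.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

def split_valid_timestamps(values):
    """Separate strings that parse as timestamps from those that do not."""
    valid, invalid = [], []
    for value in values:
        try:
            datetime.strptime(value, TIMESTAMP_FORMAT)
            valid.append(value)
        except ValueError:
            # MySQL zero dates such as '0000-00-00 00:00:00' end up here.
            invalid.append(value)
    return valid, invalid

rows = ["2014-11-05 23:35:00", "0000-00-00 00:00:00", "2014-11-06 00:40:00"]
valid, invalid = split_valid_timestamps(rows)
print(invalid)
```

Reporting the rejected values together with the table they came from would also have shown that the failure belonged to the following table rather than the one being blamed.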

Code Golf: Collatz Conjecture

Inspired by http://xkcd.com/710/ here is a code golf for it.
The Challenge
Given a positive integer greater than 0, print out the hailstone sequence for that number.
The Hailstone Sequence
See Wikipedia for more detail.
If the number is even, divide it by two.
If the number is odd, triple it and add one.
Repeat this with the number produced until it reaches 1. (if it continues after 1, it will go in an infinite loop of 1 -> 4 -> 2 -> 1...)
Sometimes code is the best way to explain, so here is some from Wikipedia
function collatz(n)
  show n
  if n > 1
    if n is odd
      call collatz(3n + 1)
    else
      call collatz(n / 2)
This code works, but I am adding on an extra challenge. The program must not be vulnerable to stack overflows. So it must either use iteration or tail recursion.
Also, bonus points for if it can calculate big numbers and the language does not already have it implemented. (or if you reimplement big number support using fixed-length integers)
Test case
Number: 21
Results: 21 -> 64 -> 32 -> 16 -> 8 -> 4 -> 2 -> 1
Number: 3
Results: 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1
Also, the code golf must include full user input and output.
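For reference, the rules above can be written as a plain loop, which cannot overflow the stack however long the sequence gets. This is an ungolfed illustration, not an entry; the function name hailstone is made up, and Python integers are arbitrary precision, so large inputs need no extra work.

```python
def hailstone(n):
    """Return the hailstone sequence starting at n, computed with a loop."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    sequence = [n]
    while n > 1:
        # Odd: triple and add one. Even: halve.
        n = 3 * n + 1 if n % 2 else n // 2
        sequence.append(n)
    return sequence

print(" -> ".join(map(str, hailstone(21))))
```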
x86 assembly, 1337 characters
;
; To assemble and link this program, just run:
;
; >> $ nasm -f elf collatz.asm && gcc -o collatz collatz.o
;
; You can then enjoy its output by passing a number to it on the command line:
;
; >> $ ./collatz 123
; >> 123 --> 370 --> 185 --> 556 --> 278 --> 139 --> 418 --> 209 --> 628 --> 314
; >> --> 157 --> 472 --> 236 --> 118 --> 59 --> 178 --> 89 --> 268 --> 134 --> 67
; >> --> 202 --> 101 --> 304 --> 152 --> 76 --> 38 --> 19 --> 58 --> 29 --> 88
; >> --> 44 --> 22 --> 11 --> 34 --> 17 --> 52 --> 26 --> 13 --> 40 --> 20 --> 10
; >> --> 5 --> 16 --> 8 --> 4 --> 2 --> 1
;
; There's even some error checking involved:
; >> $ ./collatz
; >> Usage: ./collatz NUMBER
;
section .text
global main
extern printf
extern atoi
main:
cmp dword [esp+0x04], 2
jne .usage
mov ebx, [esp+0x08]
push dword [ebx+0x04]
call atoi
add esp, 4
cmp eax, 0
je .usage
mov ebx, eax
push eax
push msg
.loop:
mov [esp+0x04], ebx
call printf
test ebx, 0x01
jz .even
.odd:
lea ebx, [1+ebx*2+ebx]
jmp .loop
.even:
shr ebx, 1
cmp ebx, 1
jne .loop
push ebx
push end
call printf
add esp, 16
xor eax, eax
ret
.usage:
mov ebx, [esp+0x08]
push dword [ebx+0x00]
push usage
call printf
add esp, 8
mov eax, 1
ret
msg db "%d --> ", 0
end db "%d", 10, 0
usage db "Usage: %s NUMBER", 10, 0
Befunge
&>:.:1-|
>3*^ #
|%2: <
v>2/>+
LOLCODE: 406 CHARAKTERZ
HAI
BTW COLLATZ SOUNDZ JUS LULZ
CAN HAS STDIO?
I HAS A NUMBAR
BTW, I WANTS UR NUMBAR
GIMMEH NUMBAR
VISIBLE NUMBAR
IM IN YR SEQUENZ
MOD OF NUMBAR AN 2
BOTH SAEM IT AN 0, O RLY?
YA RLY, NUMBAR R QUOSHUNT OF NUMBAR AN 2
NO WAI, NUMBAR R SUM OF PRODUKT OF NUMBAR AN 3 AN 1
OIC
VISIBLE NUMBAR
DIFFRINT 2 AN SMALLR OF 2 AN NUMBAR, O RLY?
YA RLY, GTFO
OIC
IM OUTTA YR SEQUENZ
KTHXBYE
TESTD UNDR JUSTIN J. MEZA'S INTERPRETR. KTHXBYE!
Python - 95 64 51 46 char
Obviously does not produce a stack overflow.
n=input()
while n>1:n=(n/2,n*3+1)[n%2];print n
Perl
I decided to be a little anticompetitive, and show how you would normally code such a problem in Perl.
There is also a 46 (total) char code-golf entry at the end.
These first three examples all start out with this header.
#! /usr/bin/env perl
use Modern::Perl;
# which is the same as these three lines:
# use 5.10.0;
# use strict;
# use warnings;
while( <> ){
chomp;
last unless $_;
Collatz( $_ );
}
Simple recursive version
use Sub::Call::Recur;
sub Collatz{
my( $n ) = @_;
$n += 0; # ensure that it is numeric
die 'invalid value' unless $n > 0;
die 'Integer values only' unless $n == int $n;
say $n;
given( $n ){
when( 1 ){}
when( $_ % 2 != 0 ){ # odd
recur( 3 * $n + 1 );
}
default{ # even
recur( $n / 2 );
}
}
}
Simple iterative version
sub Collatz{
my( $n ) = @_;
$n += 0; # ensure that it is numeric
die 'invalid value' unless $n > 0;
die 'Integer values only' unless $n == int $n;
say $n;
while( $n > 1 ){
if( $n % 2 ){ # odd
$n = 3 * $n + 1;
} else { #even
$n = $n / 2;
}
say $n;
}
}
Optimized iterative version
sub Collatz{
my( $n ) = @_;
$n += 0; # ensure that it is numeric
die 'invalid value' unless $n > 0;
die 'Integer values only' unless $n == int $n;
#
state @next;
$next[1] //= 0; # sets $next[1] to 0 if it is undefined
#
# fill out @next until we get to a value we've already worked on
until( defined $next[$n] ){
say $n;
#
if( $n % 2 ){ # odd
$next[$n] = 3 * $n + 1;
} else { # even
$next[$n] = $n / 2;
}
#
$n = $next[$n];
}
say $n;
# finish running until we get to 1
say $n while $n = $next[$n];
}
Now I'm going to show how you would do that last example with a version of Perl prior to v5.10.0
#! /usr/bin/env perl
use strict;
use warnings;
while( <> ){
chomp;
last unless $_;
Collatz( $_ );
}
{
my @next = (0,0); # essentially the same as a state variable
sub Collatz{
my( $n ) = @_;
$n += 0; # ensure that it is numeric
die 'invalid value' unless $n > 0;
# fill out @next until we get to a value we've already worked on
until( $n == 1 or defined $next[$n] ){
print $n, "\n";
if( $n % 2 ){ # odd
$next[$n] = 3 * $n + 1;
} else { # even
$next[$n] = $n / 2;
}
$n = $next[$n];
}
print $n, "\n";
# finish running until we get to 1
print $n, "\n" while $n = $next[$n];
}
}
Benchmark
First off the IO is always going to be the slow part. So if you actually benchmarked them as-is you should get about the same speed out of each one.
To test these then, I opened a file handle to /dev/null ($null), and edited every say $n to instead read say {$null} $n. This is to reduce the dependence on IO.
#! /usr/bin/env perl
use Modern::Perl;
use autodie;
open our $null, '>', '/dev/null';
use Benchmark qw':all';
cmpthese( -10,
{
Recursive => sub{ Collatz_r( 31 ) },
Iterative => sub{ Collatz_i( 31 ) },
Optimized => sub{ Collatz_o( 31 ) },
});
sub Collatz_r{
...
say {$null} $n;
...
}
sub Collatz_i{
...
say {$null} $n;
...
}
sub Collatz_o{
...
say {$null} $n;
...
}
After having run it 10 times, here is a representative sample output:
Rate Recursive Iterative Optimized
Recursive 1715/s -- -27% -46%
Iterative 2336/s 36% -- -27%
Optimized 3187/s 86% 36% --
Finally, a real code-golf entry:
perl -nlE'say;say$_=$_%2?3*$_+1:$_/2while$_>1'
46 chars total
If you don't need to print the starting value, you could remove 5 more characters.
perl -nE'say$_=$_%2?3*$_+1:$_/2while$_>1'
41 chars total
31 chars for the actual code portion, but the code won't work without the -n switch. So I include the entire example in my count.
Haskell, 62 chars 63 76 83, 86, 97, 137
c 1=[1]
c n=n:c(div(n`mod`2*(5*n+2)+n)2)
main=readLn>>=print.c
User input, printed output, uses constant memory and stack, works with arbitrarily big integers.
A sample run of this code, given an 80 digit number of all '1's (!) as input, is pretty fun to look at.
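The golfed step works because n mod 2 acts as a switch: for odd n the expression is (5n + 2 + n) / 2 = 3n + 1, and for even n the first term vanishes and n / 2 remains. A small Python check of that identity (the function names are made up for illustration):

```python
def branchless_step(n):
    """One Collatz step as a single expression, as in the golfed entry."""
    return (n % 2 * (5 * n + 2) + n) // 2

def plain_step(n):
    """The same step written with an explicit conditional."""
    return 3 * n + 1 if n % 2 else n // 2

# The two forms agree on every value tried.
assert all(branchless_step(n) == plain_step(n) for n in range(1, 10_000))
print(branchless_step(21), branchless_step(64))
```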
Original, function only version:
Haskell 51 chars
f n=n:[[],f([n`div`2,3*n+1]!!(n`mod`2))]!!(1`mod`n)
Who the #&^# needs conditionals, anyway?
(edit: I was being "clever" and used fix. Without it, the code dropped to 54 chars.
edit2: dropped to 51 by factoring out f())
Golfscript : 20 chars
~{(}{3*).1&5*)/}/1+`
#
# Usage: echo 21 | ruby golfscript.rb collatz.gs
This is equivalent to
stack<int> s;
s.push(21);
while (s.top() - 1) {
    int x = s.top();
    int numerator = x*3+1;
    int denominator = (numerator&1) * 5 + 1;
    s.push(numerator/denominator);
}
s.push(1);
return s;
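The division trick deserves a second look: 3x+1 is odd exactly when x is even, so the denominator becomes 6 and integer division of 3x+1 by 6 gives x/2; when x is odd the denominator is 1 and 3x+1 is kept. A Python sketch of the same loop (function names are made up; the list stands in for the stack, and the sketch returns as soon as 1 is on top):

```python
def golfscript_step(x):
    """One step using the numerator/denominator trick from the entry."""
    numerator = 3 * x + 1
    denominator = (numerator & 1) * 5 + 1  # 6 when x is even, 1 when x is odd
    return numerator // denominator

def hailstone_stack(start):
    """Push steps onto a list until its top element is 1."""
    stack = [start]
    while stack[-1] - 1:
        stack.append(golfscript_step(stack[-1]))
    return stack

print(hailstone_stack(21))
```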
bc 41 chars
I guess this kind of problem is what bc was invented for:
for(n=read();n>1;){if(n%2)n=n*6+2;n/=2;n}
Test:
bc1 -q collatz.bc
21
64
32
16
8
4
2
1
Proper code:
for(n=read();n>1;){if(n%2)n=n*3+1else n/=2;print n,"\n"}
bc handles numbers with up to INT_MAX digits
Edit: The Wikipedia article mentions this conjecture has been checked for all values up to 20×2^58 (approx. 5.76e18). This program:
c=0;for(n=2^20000+1;n>1;){if(n%2)n=n*6+2;n/=2;c+=1};n;c
tests 2^20,000+1 (approx. 3.98e6,020) in 68 seconds, 144,404 cycles.
Perl : 31 chars
perl -nE 'say$_=$_%2?$_*3+1:$_/2while$_>1'
# 123456789 123456789 123456789 1234567
Edited to remove 2 unnecessary spaces.
Edited to remove 1 unnecessary space.
MS Excel, 35 chars
=IF(A1/2=ROUND(A1/2,0),A1/2,A1*3+1)
Taken straight from Wikipedia:
In cell A1, place the starting number.
In cell A2 enter this formula =IF(A1/2=ROUND(A1/2,0),A1/2,A1*3+1)
Drag and copy the formula down until 4, 2, 1
It only took copy/pasting the formula 111 times to get the result for a starting number of 1000. ;)
C : 64 chars
main(x){for(scanf("%d",&x);x>=printf("%d,",x);x=x&1?3*x+1:x/2);}
With big integer support: 431 (necessary) chars
#include <stdlib.h>
#define B (w>=m?d=realloc(d,m=m+m):0)
#define S(a,b)t=a,a=b,b=t
main(m,w,i,t){char*d=malloc(m=9);for(w=0;(i=getchar()+2)/10==5;)
B,d[w++]=i%10;for(i=0;i<w/2;i++)S(d[i],d[w-i-1]);for(;;w++){
while(w&&!d[w-1])w--;for(i=w+1;i--;)putchar(i?d[i-1]+48:10);if(
w==1&&*d==1)break;if(*d&1){for(i=w;i--;)d[i]*=3;*d+=1;}else{
for(i=w;i-->1;)d[i-1]+=d[i]%2*10,d[i]/=2;*d/=2;}B,d[w]=0;for(i=0
;i<w;i++)d[i+1]+=d[i]/10,d[i]%=10;}}
Note: Do not remove #include <stdlib.h> without at least prototyping malloc/realloc, as doing so will not be safe on 64-bit platforms (64-bit void* will be converted to 32-bit int).
This one hasn't been tested rigorously yet. It could use some shortening as well.
Previous versions:
main(x){for(scanf("%d",&x);printf("%d,",x),x-1;x=x&1?3*x+1:x/2);} // 66
(removed 12 chars because no one follows the output format... :| )
Another assembler version. This one is not limited to 32 bit numbers, it can handle numbers up to 10^65534 although the ".com" format MS-DOS uses is limited to 80 digit numbers. Written for A86 assembler and requires a Win-XP DOS box to run. Assembles to 180 bytes:
mov ax,cs
mov si,82h
add ah,10h
mov es,ax
mov bh,0
mov bl,byte ptr [80h]
cmp bl,1
jbe ret
dec bl
mov cx,bx
dec bl
xor di,di
p1:lodsb
sub al,'0'
cmp al,10
jae ret
stosb
loop p1
xor bp,bp
push es
pop ds
p2:cmp byte ptr ds:[bp],0
jne p3
inc bp
jmp p2
ret
p3:lea si,[bp-1]
cld
p4:inc si
mov dl,[si]
add dl,'0'
mov ah,2
int 21h
cmp si,bx
jne p4
cmp bx,bp
jne p5
cmp byte ptr [bx],1
je ret
p5:mov dl,'-'
mov ah,2
int 21h
mov dl,'>'
int 21h
test byte ptr [bx],1
jz p10
;odd
mov si,bx
mov di,si
mov dx,3
dec bp
std
p6:lodsb
mul dl
add al,dh
aam
mov dh,ah
stosb
cmp si,bp
jnz p6
or dh,dh
jz p7
mov al,dh
stosb
dec bp
p7:mov si,bx
mov di,si
p8:lodsb
inc al
xor ah,ah
aaa
stosb
or ah,ah
jz p9
cmp si,bp
jne p8
mov al,1
stosb
jmp p2
p9:inc bp
jmp p2
p10:mov si,bp
mov di,bp
xor ax,ax
p11:lodsb
test ah,1
jz p12
add al,10
p12:mov ah,al
shr al,1
cmp di,bx
stosb
jne p11
jmp p2
dc - 24 chars 25 28
dc is a good tool for this sequence:
?[d5*2+d2%*+2/pd1<L]dsLx
dc -f collatz.dc
21
64
32
16
8
4
2
1
Also 24 chars using the formula from the Golfscript entry:
?[3*1+d2%5*1+/pd1<L]dsLx
57 chars to meet the specs:
[Number: ]n?[Results: ]ndn[d5*2+d2%*+2/[ -> ]ndnd1<L]dsLx
dc -f collatz-spec.dc
Number: 3
Results: 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1
Scheme: 72
(define(c n)(if(= n 1)`(1)(cons n(if(odd? n)(c(+(* n 3)1))(c(/ n 2))))))
This uses recursion, but the calls are tail-recursive so I think they'll be optimized to iteration. In some quick testing, I haven't been able to find a number for which the stack overflows anyway. Just for example:
(c 9876543219999999999000011234567898888777766665555444433332222
7777777777777777777777777777777798797657657651234143375987342987
5398709812374982529830983743297432985230985739287023987532098579
058095873098753098370938753987)
...runs just fine. [that's all one number -- I've just broken it to fit on screen.]
Mathematica, 45 50 chars
c=NestWhileList[If[OddQ@#,3#+1,#/2]&,#,#>1&]&
Ruby, 50 chars, no stack overflow
Basically a direct rip of makapuf's Python solution:
def c(n)while n>1;n=n.odd?? n*3+1: n/2;p n end end
Ruby, 45 chars, will overflow
Basically a direct rip of the code provided in the question:
def c(n)p n;n.odd?? c(3*n+1):c(n/2)if n>1 end
import java.math.BigInteger;
public class SortaJava {
static final BigInteger THREE = new BigInteger("3");
static final BigInteger TWO = new BigInteger("2");
interface BiFunc<R, A, B> {
R call(A a, B b);
}
interface Cons<A, B> {
<R> R apply(BiFunc<R, A, B> func);
}
static class Collatz implements Cons<BigInteger, Collatz> {
BigInteger value;
public Collatz(BigInteger value) { this.value = value; }
public <R> R apply(BiFunc<R, BigInteger, Collatz> func) {
if(BigInteger.ONE.equals(value))
return func.call(value, null);
if(value.testBit(0))
return func.call(value, new Collatz((value.multiply(THREE)).add(BigInteger.ONE)));
return func.call(value, new Collatz(value.divide(TWO)));
}
}
static class PrintAReturnB<A, B> implements BiFunc<B, A, B> {
boolean first = true;
public B call(A a, B b) {
if(first)
first = false;
else
System.out.print(" -> ");
System.out.print(a);
return b;
}
}
public static void main(String[] args) {
BiFunc<Collatz, BigInteger, Collatz> printer = new PrintAReturnB<BigInteger, Collatz>();
Collatz collatz = new Collatz(new BigInteger(args[0]));
while(collatz != null)
collatz = collatz.apply(printer);
}
}
Python 45 Char
Shaved a char off of makapuf's answer.
n=input()
while~-n:n=(n/2,n*3+1)[n%2];print n
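For anyone puzzling over the two tricks in that line, here is an ungolfed Python 3 sketch (the function name is my own, and it collects the values instead of printing them). It assumes n is a positive integer; the golfed original is Python 2, where / on ints already floors.

```python
def collatz_golf_style(n):
    """Python 3 rendering of the golfed loop; returns the values it would print."""
    out = []
    while ~-n:                          # ~-n equals n - 1, so the loop ends at n == 1
        n = (n // 2, n * 3 + 1)[n % 2]  # index 0 for even n, index 1 for odd n
        out.append(n)
    return out


print(collatz_golf_style(6))  # [3, 10, 5, 16, 8, 4, 2, 1]
```

Note that the tuple evaluates both branches every time, which is harmless here but would matter if either expression had side effects.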
TI-BASIC
Not the shortest, but a novel approach. Certain to slow down considerably with large sequences, but it shouldn't overflow.
PROGRAM:COLLATZ
:ClrHome
:Input X
:Lbl 1
:While X≠1
:If X/2=int(X/2)
:Then
:Disp X/2→X
:Else
:Disp X*3+1→X
:End
:Goto 1
:End
Haskell : 50
c 1=[1];c n=n:(c$if odd n then 3*n+1 else n`div`2)
Not the shortest, but an elegant Clojure solution:
(defn collatz [n]
(print n "")
(if (> n 1)
(recur
(if (odd? n)
(inc (* 3 n))
(/ n 2)))))
C#: 216 Characters
using C=System.Console;class P{static void Main(){var p="start:";System.Action<object> o=C.Write;o(p);ulong i;while(ulong.TryParse(C.ReadLine(),out i)){o(i);while(i > 1){i=i%2==0?i/2:i*3+1;o(" -> "+i);}o("\n"+p);}}}
in long form:
using C = System.Console;
class P
{
static void Main()
{
var p = "start:";
System.Action<object> o = C.Write;
o(p);
ulong i;
while (ulong.TryParse(C.ReadLine(), out i))
{
o(i);
while (i > 1)
{
i = i % 2 == 0 ? i / 2 : i * 3 + 1;
o(" -> " + i);
}
o("\n" + p);
}
}
}
New version: accepts one number as input through the command line, with no input validation. 154 characters (previously 173).
using System;class P{static void Main(string[]a){Action<object>o=Console.Write;var i=ulong.Parse(a[0]);o(i);while(i>1){i=i%2==0?i/2:i*3+1;o(" -> "+i);}}}
in long form:
using System;
class P
{
static void Main(string[]a)
{
Action<object>o=Console.Write;
var i=ulong.Parse(a[0]);
o(i);
while(i>1)
{
i=i%2==0?i/2:i*3+1;
o(" -> "+i);
}
}
}
I am able to shave a few characters by ripping off the idea in this answer to use a for loop rather than a while. 150 characters.
using System;class P{static void Main(string[]a){Action<object>o=Console.Write;for(var i=ulong.Parse(a[0]);i>1;i=i%2==0?i/2:i*3+1)o(i+" -> ");o(1);}}
Ruby, 43 characters
bignum supported, with stack overflow susceptibility:
def c(n)p n;n%2>0?c(3*n+1):c(n/2)if n>1 end
...and 50 characters, bignum supported, without stack overflow:
def d(n)while n>1 do p n;n=n%2>0?3*n+1:n/2 end end
Kudos to Jordan. I didn't know about 'p' as a replacement for puts.
nroff [1]
Run with nroff -U hail.g
.warn
.pl 1
.pso (printf "Enter a number: " 1>&2); read x; echo .nr x $x
.while \nx>1 \{\
. ie \nx%2 .nr x \nx*3+1
. el .nr x \nx/2
\nx
.\}
1. groff version
Scala + Scalaz
import scalaz._
import Scalaz._
val collatz =
(_:Int).iterate[Stream](a=>Seq(a/2,3*a+1)(a%2)).takeWhile(1<) // This line: 61 chars
And in action:
scala> collatz(7).toList
res15: List[Int] = List(7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2)
Scala 2.8
val collatz =
Stream.iterate(_:Int)(a=>Seq(a/2,3*a+1)(a%2)).takeWhile(1<) :+ 1
This also includes the trailing 1.
scala> collatz(7)
res12: scala.collection.immutable.Stream[Int] = Stream(7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1)
With the following implicit
implicit def intToEven(i:Int) = new {
def ~(even: Int=>Int, odd: Int=>Int) = {
if (i%2==0) { even(i) } else { odd(i) }
}
}
this can be shortened to
val collatz = Stream.iterate(_:Int)(_~(_/2,3*_+1)).takeWhile(1<) :+ 1
Edit - 58 characters (including input and output, but not including initial number)
var n=readInt;while(n>1){n=Seq(n/2,n*3+1)(n%2);println(n)}
Could be reduced by 2 if you don't need newlines...
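The Stream.iterate / takeWhile / append-1 pipeline translates fairly directly to lazy generators. A minimal Python sketch, where iterate is a hypothetical helper standing in for Stream.iterate (Python's itertools has no direct equivalent):

```python
from itertools import chain, takewhile


def iterate(f, x):
    """Yield x, f(x), f(f(x)), ... forever (stand-in for Stream.iterate)."""
    while True:
        yield x
        x = f(x)


def collatz(n):
    # Same indexing trick as Seq(a/2, 3*a+1)(a%2): pick by parity.
    step = lambda a: (a // 2, 3 * a + 1)[a % 2]
    # takewhile stops before 1, so the trailing 1 is chained on afterwards.
    return chain(takewhile(lambda a: a > 1, iterate(step, n)), [1])


print(list(collatz(7)))
# [7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
```

Because everything is lazy, nothing is computed until the result is consumed, which is the same property the Scala Stream gives you.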
F#, 90 characters
let c=Seq.unfold(function|n when n<=1->None|n when n%2=0->Some(n,n/2)|n->Some(n,(3*n)+1))
> c 21;;
val it : seq<int> = seq [21; 64; 32; 16; ...]
Or if you're not using F# interactive to display the result, 102 characters:
let c=Seq.unfold(function|n when n<=1->None|n when n%2=0->Some(n,n/2)|n->Some(n,(3*n)+1))>>printf"%A"
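To make the unfold idea concrete, here is a rough Python sketch; unfold is a hypothetical helper written to imitate Seq.unfold (the generator function returns an (item, next_state) pair, or None to stop), not something from the standard library.

```python
def unfold(gen, state):
    """Imitation of Seq.unfold: gen(state) gives (item, next_state) or None to stop."""
    while (pair := gen(state)) is not None:
        item, state = pair
        yield item


def collatz_gen(n):
    # Mirrors the three F# match cases in order.
    if n <= 1:
        return None
    if n % 2 == 0:
        return (n, n // 2)
    return (n, 3 * n + 1)


print(list(unfold(collatz_gen, 21)))  # [21, 64, 32, 16, 8, 4, 2]
```

One thing this makes visible: since the generator stops as soon as the state is 1, the trailing 1 is never emitted, which as far as I can tell is also true of the F# version.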
Common Lisp, 141 characters:
(defun c ()
(format t"Number: ")
(loop for n = (read) then (if(oddp n)(+ 1 n n n)(/ n 2))
until (= n 1)
do (format t"~d -> "n))
(format t"1~%"))
Test run:
Number: 171
171 -> 514 -> 257 -> 772 -> 386 -> 193 -> 580 -> 290 -> 145 -> 436 ->
218 -> 109 -> 328 -> 164 -> 82 -> 41 -> 124 -> 62 -> 31 -> 94 -> 47 ->
142 -> 71 -> 214 -> 107 -> 322 -> 161 -> 484 -> 242 -> 121 -> 364 ->
182 -> 91 -> 274 -> 137 -> 412 -> 206 -> 103 -> 310 -> 155 -> 466 ->
233 -> 700 -> 350 -> 175 -> 526 -> 263 -> 790 -> 395 -> 1186 -> 593 ->
1780 -> 890 -> 445 -> 1336 -> 668 -> 334 -> 167 -> 502 -> 251 -> 754 ->
377 -> 1132 -> 566 -> 283 -> 850 -> 425 -> 1276 -> 638 -> 319 ->
958 -> 479 -> 1438 -> 719 -> 2158 -> 1079 -> 3238 -> 1619 -> 4858 ->
2429 -> 7288 -> 3644 -> 1822 -> 911 -> 2734 -> 1367 -> 4102 -> 2051 ->
6154 -> 3077 -> 9232 -> 4616 -> 2308 -> 1154 -> 577 -> 1732 -> 866 ->
433 -> 1300 -> 650 -> 325 -> 976 -> 488 -> 244 -> 122 -> 61 -> 184 ->
92 -> 46 -> 23 -> 70 -> 35 -> 106 -> 53 -> 160 -> 80 -> 40 -> 20 ->
10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1
The program from Jerry Coffin has an integer overflow; try this one:
#include <iostream>
int main()
{
unsigned long long i; // at least 64 bits, so values past 2^32 do not wrap
int j = 0;
for( std::cin>>i; i>1; i = i&1? i*3+1:i/2, ++j)
std::cout<<i<<" -> ";
std::cout<<"\n"<<j << " iterations\n";
}
Tested with:
The number less than 100 million with the longest total stopping time is 63,728,127, with 949 steps.
The number less than 1 billion with the longest total stopping time is 670,617,279, with 986 steps.
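Those two starting values are also good overflow tests, because their sequences climb well above the start. A small Python sketch (my own helper, assuming the overflow in question is about a 32-bit integer type) that counts the steps and records the peak, using Python's arbitrary-precision ints as the reference:

```python
def collatz_stats(n):
    """Return (number of steps to reach 1, largest value seen along the way)."""
    steps, peak = 0, n
    while n > 1:
        n = 3 * n + 1 if n & 1 else n // 2
        steps += 1
        peak = max(peak, n)
    return steps, peak


for start in (63_728_127, 670_617_279):
    steps, peak = collatz_stats(start)
    # The last column shows whether the peak would fit in an unsigned 32-bit int.
    print(start, steps, peak <= 2**32 - 1)
```

The step counts should match the 949 and 986 quoted above, and in both cases the peak does not fit in 32 bits, which is why a 64-bit type is needed.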
Ruby, 43 chars, possibly meeting the I/O requirement
Run with ruby -n hail
n=$_.to_i
(n=n%2>0?n*3+1: n/2
p n)while n>1
C# : 659 chars with BigInteger support
using System.Linq;using C=System.Console;class Program{static void Main(){var v=C.ReadLine();C.Write(v);while(v!="1"){C.Write("->");if(v[v.Length-1]%2==0){v=v.Aggregate(new{s="",o=0},(r,c)=>new{s=r.s+(char)((c-48)/2+r.o+48),o=(c%2)*5}).s.TrimStart('0');}else{var q=v.Reverse().Aggregate(new{s="",o=0},(r, c)=>new{s=(char)((c-48)*3+r.o+(c*3+r.o>153?c*3+r.o>163?28:38:48))+r.s,o=c*3+r.o>153?c*3+r.o>163?2:1:0});var t=(q.o+q.s).TrimStart('0').Reverse();var x=t.First();q=t.Skip(1).Aggregate(new{s=x>56?(x-57).ToString():(x-47).ToString(),o=x>56?1:0},(r,c)=>new{s=(char)(c-48+r.o+(c+r.o>57?38:48))+r.s,o=c+r.o>57?1:0});v=(q.o+q.s).TrimStart('0');}C.Write(v);}}}
Ungolfed
using System.Linq;
using C = System.Console;
class Program
{
static void Main()
{
var v = C.ReadLine();
C.Write(v);
while (v != "1")
{
C.Write("->");
if (v[v.Length - 1] % 2 == 0)
{
v = v
.Aggregate(
new { s = "", o = 0 },
(r, c) => new { s = r.s + (char)((c - 48) / 2 + r.o + 48), o = (c % 2) * 5 })
.s.TrimStart('0');
}
else
{
var q = v
.Reverse()
.Aggregate(
new { s = "", o = 0 },
(r, c) => new { s = (char)((c - 48) * 3 + r.o + (c * 3 + r.o > 153 ? c * 3 + r.o > 163 ? 28 : 38 : 48)) + r.s, o = c * 3 + r.o > 153 ? c * 3 + r.o > 163 ? 2 : 1 : 0 });
var t = (q.o + q.s)
.TrimStart('0')
.Reverse();
var x = t.First();
q = t
.Skip(1)
.Aggregate(
new { s = x > 56 ? (x - 57).ToString() : (x - 47).ToString(), o = x > 56 ? 1 : 0 },
(r, c) => new { s = (char)(c - 48 + r.o + (c + r.o > 57 ? 38 : 48)) + r.s, o = c + r.o > 57 ? 1 : 0 });
v = (q.o + q.s)
.TrimStart('0');
}
C.Write(v);
}
}
}
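The idea behind the C# version is schoolbook arithmetic on the decimal string itself: halve left to right, carrying 5 into the next digit after an odd one, and multiply right to left with carries. A simplified Python sketch of that technique follows; it is not a line-by-line translation (the function names are mine, and it folds the +1 into the initial carry instead of doing a second pass), and it assumes the input is a positive decimal string.

```python
def halve(digits):
    """Divide an even decimal string by 2, left to right."""
    out, carry = [], 0
    for ch in digits:
        d = int(ch)
        out.append(str(d // 2 + carry))  # at most 4 + 5, so always one digit
        carry = 5 if d % 2 else 0        # an odd digit pushes 5 into the next place
    return "".join(out).lstrip("0") or "0"


def triple_plus_one(digits):
    """Compute 3n + 1 on a decimal string, right to left, starting with carry 1."""
    out, carry = [], 1
    for ch in reversed(digits):
        carry, d = divmod(int(ch) * 3 + carry, 10)
        out.append(str(d))
    if carry:
        out.append(str(carry))
    return "".join(reversed(out))


def collatz(start):
    """Return the sequence as a list of decimal strings, ending at "1"."""
    v, seq = start, [start]
    while v != "1":
        v = halve(v) if int(v[-1]) % 2 == 0 else triple_plus_one(v)
        seq.append(v)
    return seq


print("->".join(collatz("27")[:5]))  # 27->82->41->124->62
```

Only the last digit is needed for the parity test, which is what the v[v.Length - 1] % 2 check in the C# code relies on.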