I have a very large .tsv file (80 GB) that I need to edit. It is made up of 5 columns. The last column represents a score. Some positions have multiple "Score" entries and I need to keep only the row with the highest value for each position.
For example, these positions have multiple entries for each combination:
1 861265 C A 0.071
1 861265 C A 0.148
1 861265 C G 0.001
1 861265 C G 0.108
1 861265 C T 0
1 861265 C T 0.216
2 193456 G A 0.006
2 193456 G A 0.094
2 193456 G C 0.011
2 193456 G C 0.152
2 193456 G T 0.003
2 193456 G T 0.056
The desired output would look like this:
1 861265 C A 0.148
1 861265 C G 0.108
1 861265 C T 0.216
2 193456 G A 0.094
2 193456 G C 0.152
2 193456 G T 0.056
Doing it in python/pandas is not possible as the file is too large or takes too long. Therefore, I am looking for a solution using bash; in particular awk.
This input file has been sorted with the following command:
sort -t$'\t' -k1 -n -o sorted_file original_file
The command would basically need to:
compare the data from the first 4 columns in the sorted_file
if all of those are the same, then only the row with the highest value in column 5 should be printed to the output file.
I am not very familiar with awk syntax. I have seen relatively similar questions in other forums, but I was unable to adapt it to my particular case. I have tried to adapt one of those solutions to my case like this:
awk -F, 'NR==1 {print; next} NR==2 {key=$2; next}$2 != key {print lastval; key = $2} {lastval = $0} END {print lastval}' sorted_files.tsv > filtered_file.tsv
However, the output file does not look like it should, at all.
Any help would be very much appreciated.
A more robust way is to sort on the last field numerically (in descending order) and let awk pick the first row for each key. If your fields don't contain spaces, there is no need to specify the delimiter.
$ sort -k1,1n -k5,5nr original_file | awk '!a[$1,$2,$3,$4]++' > max_value_file
As @Fravadona commented, since this stores the keys, it will have a large memory footprint if there are many unique records. One alternative is delegating to uniq to pick the first record among repeated entries. For uniq to work, rows with the same key must be adjacent, so the sort has to cover all four key fields before the value:
$ sort -k1,1n -k2,2n -k3,3 -k4,4 -k5,5nr original_file |
awk '{print $5,$1,$2,$3,$4}' |
uniq -f1 |
awk '{print $2,$3,$4,$5,$1}'
We change the order of the fields so that uniq can skip the value during comparison, and then change it back afterwards. This won't have any memory footprint (aside from sort, which manages its own memory).
If you're not a purist, this should work the same as the previous one:
$ sort -k1,1n -k2,2n -k3,3 -k4,4 -k5,5nr original_file | rev | uniq -f1 | rev
It's not awk, but with Miller this is very easy:
mlr --tsv -N sort -f 1,2,3,4 -n 5 then top -f 5 -g 1,2,3,4 -a input.tsv >output.tsv
You will have
1 861265 C A 1 0.148
1 861265 C G 1 0.108
1 861265 C T 1 0.216
2 193456 G A 1 0.094
2 193456 G C 1 0.152
2 193456 G T 1 0.056
You can try this approach. This also works on a non-sorted last column, only the first 4 columns have to be sorted.
% awk 'NR>1&&str!=$1" "$2" "$3" "$4{print line; m=0}
$5>=m{m=$5; line=$0}
{str=$1" "$2" "$3" "$4} END{print line}' file
1 861265 C A 0.148
1 861265 C G 0.108
1 861265 C T 0.216
2 193456 G A 0.094
2 193456 G C 0.152
2 193456 G T 0.056
Data
% cat file
1 861265 C A 0.071
1 861265 C A 0.148
1 861265 C G 0.001
1 861265 C G 0.108
1 861265 C T 0
1 861265 C T 0.216
2 193456 G A 0.006
2 193456 G A 0.094
2 193456 G C 0.011
2 193456 G C 0.152
2 193456 G T 0.003
2 193456 G T 0.056
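For readers who find the awk one-liner dense, the same single-pass idea can be sketched in Python. This is only an illustration under stated assumptions: the function name `max_per_key` and its parameters are hypothetical, rows with the same first four columns are assumed to be adjacent, and the last column is assumed to parse as a number. Unlike the awk version (which uses `>=` and so keeps the last row on ties), this sketch keeps the first row on ties.

```python
import io


def max_per_key(lines, key_fields=4, sep=None):
    """Yield, for each run of consecutive rows sharing the first
    `key_fields` columns, the row whose last column is largest.
    Only one row is held in memory at a time."""
    prev_key = None
    best_row = None
    best_score = None
    for raw in lines:
        row = raw.rstrip("\n")
        if not row:
            continue  # skip empty lines
        fields = row.split(sep)  # sep=None splits on any whitespace
        key = tuple(fields[:key_fields])
        score = float(fields[-1])
        if key != prev_key:
            # Key changed: emit the best row of the previous run.
            if best_row is not None:
                yield best_row
            prev_key, best_row, best_score = key, row, score
        elif score > best_score:
            best_row, best_score = row, score
    if best_row is not None:
        yield best_row  # emit the final run


SAMPLE = """\
1 861265 C A 0.071
1 861265 C A 0.148
1 861265 C G 0.108
1 861265 C G 0.001
2 193456 G T 0.003
2 193456 G T 0.056
"""

for line in max_per_key(io.StringIO(SAMPLE)):
    print(line)
```

Passing an open file object instead of `io.StringIO` keeps the processing streaming, so memory use does not depend on the file size.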
Assumptions/Understandings:
file is sorted by the first field
no guarantee on the ordering of fields #2, #3 and #4
must maintain the current row ordering (this would seem to rule out (re)sorting the file as we could lose the current row ordering)
the complete set of output rows for a given group will fit into memory (aka the awk arrays)
General plan:
we'll call field #1 the group field; all rows with the same value in field #1 are considered part of the same group
for a given group we keep track of all output rows via the awk array arr[] (index will be a combo of fields #2, #3, #4)
we also keep track of the incoming row order via the awk array order[]
update arr[] if we see a value in field #5 that's higher than the previous value
when group changes flush the current contents of the arr[] index to stdout
One awk idea:
awk '
function flush() { # function to flush current group to stdout
for (i=1; i<=seq; i++)
print group,order[i],arr[order[i]]
delete arr # reset arrays
delete order
seq=0 # reset index for order[] array
}
BEGIN { FS=OFS="\t" }
$1!=group { flush()
group=$1
}
{ key=$2 OFS $3 OFS $4
if ( key in arr && $5 <= arr[key] )
next
if ( ! (key in arr) )
order[++seq]=key
arr[key]=$5
}
END { flush() } # flush last group to stdout
' input.dat
This generates:
1 861265 C A 0.148
1 861265 C G 0.108
1 861265 C T 0.216
2 193456 G A 0.094
2 193456 G C 0.152
2 193456 G T 0.056
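The general plan above (keep the best score per key inside a group, remember first-seen order, flush when the group changes) can also be sketched in Python. The name `best_per_group` and the tuple layout are hypothetical; the sketch assumes rows arrive grouped by the first field and relies on Python dicts preserving insertion order, which plays the role of the separate order[] array.

```python
def best_per_group(rows):
    """rows: iterable of (group, pos, ref, alt, score) tuples, grouped by
    `group`. Yields one tuple per (pos, ref, alt) key with the highest
    score, in the order each key was first seen within its group."""
    group = None
    best = {}  # (pos, ref, alt) -> best score; dict keeps first-seen order

    for g, pos, ref, alt, score in rows:
        if g != group:
            # Group changed: flush everything collected for the old group.
            for key, s in best.items():
                yield (group, *key, s)
            best = {}
            group = g
        key = (pos, ref, alt)
        if key not in best or score > best[key]:
            best[key] = score  # updating keeps the original position

    for key, s in best.items():  # flush the last group
        yield (group, *key, s)


rows = [
    ("1", "861265", "C", "A", 0.071),
    ("1", "861265", "C", "G", 0.108),
    ("1", "861265", "C", "A", 0.148),  # same key again, not adjacent
    ("2", "193456", "G", "T", 0.056),
    ("2", "193456", "G", "T", 0.003),
]
for row in best_per_group(rows):
    print(*row, sep="\t")
```

As with the awk version, memory use is bounded by the number of distinct keys in the largest group, not by the size of the file.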
Updated
Extract from the sort manual:
-k, --key=KEYDEF
KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is a field number and C a character position in the field; both are origin 1, and the stop position defaults to the line's end.
It means that by using sort -t$'\t' -k1 -n like you did, all the fields of the file have contributed to the numerical sorting.
Here's probably the fastest awk solution that makes use of the numerical sorting in ascending order:
awk '
BEGIN {
FS = "\t"
if ((getline line) > 0) {
split(line, arr)
prev_key = arr[1] FS arr[2] FS arr[4]
prev_line = line # "getline line" fills the variable, not $0
}
}
{
curr_key = $1 FS $2 FS $4
if (curr_key != prev_key) {
print prev_line
prev_key = curr_key
}
prev_line = $0
}
END {
if (prev_key) print prev_line
}
' file.tsv
Note: As you're handling a file that has around 4 billion lines, I tried to keep the number of operations to a minimum. For example:
Saving 80 billion operations just by setting FS to "\t". Indeed, why would you allow awk to compare each character of the file with " " when you're dealing with a TSV?
Saving 4 billion comparisons by processing the first line with getline in the BEGIN block. Some people might say that it's safer/better/cleaner to use (NR == 1) and/or (NR > 1), but that would mean doing 2 comparisons per input line instead of 0.
It may be worth comparing the execution time of this code with @EdMorton's code, which uses the same algorithm without those optimisations. The disk speed will probably flatten the difference though ^^
Assuming your real input is sorted by key then ascending value the same way as your example is:
$ cat tst.awk
{ key = $1 FS $2 FS $3 FS $4 }
key != prevKey {
if ( NR > 1 ) {
print prevRec
}
prevKey = key
}
{ prevRec = $0 }
END {
print prevRec
}
$ awk -f tst.awk file
1 861265 C A 0.148
1 861265 C G 0.108
1 861265 C T 0.216
2 193456 G A 0.094
2 193456 G C 0.152
2 193456 G T 0.056
If your data isn't already sorted then just sort it with:
sort file | awk ..
That way only sort has to handle the whole file at once, and it's designed to do so by using demand paging, etc., so it is far less likely to run out of memory than if you read the whole file into awk or python or any other tool.
With sort and awk:
sort -t$'\t' -k1,1n -k4,4 -k5,5rn file | awk 'BEGIN{FS=OFS="\t"} !seen[$1,$4]++'
Prints:
1 861265 C A 0.148
1 861265 C G 0.108
1 861265 C T 0.216
2 193456 G A 0.094
2 193456 G C 0.152
2 193456 G T 0.056
This assumes the 'group' is defined as column 1.
Works by grouping first by column 1, then by column 4 (each letter) then reverse numeric sort on column 5.
The awk then prints the first row seen for each (group, letter) pair, which will be the max because of the sort.
I have a text column where the data is automatically populated by a machine. Below is the format of the data as it is stored in the database. The format is almost the same for all records in this table.
==========================
=== S U P E R S O N Y ===
========================
START AT 12:16:29A
ON 02-18-19
MACHINE COUNT 1051
OPERATOR ______________
SERIAL # 0777218-15
V=inHg
- TIME T=F P=psig
------------------------
D 12:16:31A 104.6 0.0P
D 12:16:41A 104.1 0.0P
D 12:26:41A 167.2 28.7V
D 12:31:41A 108.1 28.5V
MACHINE VALUE IS:
1.5 mg/min
L 12:41:41A 95.1 28.4V
L 12:43:54A 97.2 1.9V
Z 12:45:23A 97.5 0.0P
========================
= CHECK COMPLETE =
========================
I need to find the exact value after the "MACHINE VALUE IS:" and before "mg/min" word. In the above case, the query must return "1.5". The query I have written is failing because of some spaces after "MACHINE VALUE IS:" word.
SELECT REPLACE(REPLACE(SUBSTRING(contents,
LOCATE('IS:', contents), 10),'IS:', ''),'mg','') as value from machine_content
This will get you the number after IS:. Of course, in your case you must change '\n' to whatever breaks the line, such as CR LF.
SET @sql := '==========================
=== S U P E R S O N Y ===
========================
START AT 12:16:29A
ON 02-18-19
MACHINE COUNT 1051
OPERATOR ______________
SERIAL # 0777218-15
V=inHg
- TIME T=F P=psig
------------------------
D 12:16:31A 104.6 0.0P
D 12:16:41A 104.1 0.0P
D 12:26:41A 167.2 28.7V
D 12:31:41A 108.1 28.5V
MACHINE VALUE IS:
1.5 mg/min
L 12:41:41A 95.1 28.4V
L 12:43:54A 97.2 1.9V
Z 12:45:23A 97.5 0.0P
========================
= CHECK COMPLETE =
========================
';
SELECT REPLACE(SUBSTRING_INDEX(SUBSTRING_INDEX(@sql, 'IS:', -1), 'mg', 1),'\n','');
| REPLACE(SUBSTRING_INDEX(SUBSTRING_INDEX(@sql, 'IS:', -1), 'mg', 1),'\n','') |
| :-------------------------------------------------------------------------- |
| 1.5 |
db<>fiddle here
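To make the nested SUBSTRING_INDEX logic easier to follow, here is a rough Python analogue. It is only a sketch: `substring_index` is a hypothetical helper that imitates the MySQL function for this use, the report text is shortened, and the final `strip()` is an addition of mine to drop the spaces the question mentions (in SQL, wrapping the expression in TRIM() would play that role).

```python
def substring_index(text, delim, count):
    """Imitate MySQL SUBSTRING_INDEX: for count > 0 return everything
    before the count-th delimiter from the left, for count < 0 everything
    after the count-th delimiter from the right."""
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""


report = (
    "D 12:31:41A 108.1 28.5V\n"
    "MACHINE VALUE IS:   \n"
    "  1.5 mg/min\n"
    "L 12:41:41A 95.1 28.4V\n"
)

after_is = substring_index(report, "IS:", -1)   # text after the last 'IS:'
before_mg = substring_index(after_is, "mg", 1)  # text before the first 'mg'
value = before_mg.replace("\n", "").strip()     # drop line break and spaces
print(value)
```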
I have JSON files that are annotated with comments that I strip out before doing operations using jq. I just hit an interesting problem in which I received a JSON file with comment annotations that included some rich-text quote characters (hex 93 and hex 94). My existing sed dot . character did not match these characters. Here is a demonstration:
First, the input:
% echo -e '# \x93text\x94\n{"a":1}' | od -c
0000000 # 223 t e x t 224 \n { " a " : 1 }
0000020 \n
0000021
%
And here is the transform:
% echo -e '# \x93text\x94\n{"a":1}' | sed 's/^\s*#.*//' | od -c
0000000 223 t e x t 224 \n { " a " : 1 } \n
0000017
%
Note that the dot character in the sed expression is not matching the hex 93 character. However, if I include LC_ALL=C:
% echo -e '# \x93text\x94\n{"a":1}' | LC_ALL=C sed 's/^\s*#.*//' | od -c
0000000 \n { " a " : 1 } \n
0000011
%
then the dot character in the sed expression does match the hex 93 and hex 94 characters. The sed documentation section Locale Considerations speaks of bracket expressions, but the behavior above seems to prove that this problem happens elsewhere.
It is interesting to note that deletion instead of substitution didn't show this problem:
% echo -e '# \x93text\x94\n{"a":1}' | sed '/^\s*#.*/d' | od -c
0000000 { " a " : 1 } \n
0000010
Given that I'm operating on annotated JSON files, I think the solution of adding LC_ALL=C to sed statements is reasonable.
So, my question: Is using LC_ALL=C something that I always want to use when doing non-locale-specific sed transformations (as would be applicable in annotated JSON files)? If not, what alternatives exist to avoid the problem I've shown above?
My environment:
CentOS 7.3 [kernel-3.10.0-514.6.1.el7.x86_64]
sed (GNU sed) 4.2.2 [sed-4.2.2-5.el7.x86_64]
Bash 4.2.46(1) [bash-4.2.46-21.el7_3.x86_64]
The C locale is a special locale that is meant to be the simplest locale. You could also say that while the other locales are for humans, the C locale is for computers. In the C locale, characters are single bytes and the charset is ASCII.
On some systems, there's a difference with the POSIX locale where, for instance, the sort order for non-ASCII characters is not defined.
So LC_ALL=C is the safe way to make sed take those 8-bit (non-ASCII) bytes into account.
See the comparison below.
With LC_ALL=C, sed treats each of those bytes as a character:
echo -e '# \x93text\x94\n{"a":1}' | LC_ALL=C sed 's/[^[:alnum:]]/[HERE:&] /g' | od -c
0000000 [ H E R E : # ] [ H E R E :
0000020 ] [ H E R E : 223 ] t e x t [
0000040 H E R E : 224 ] \n [ H E R E : {
0000060 ] [ H E R E : " ] a [ H E R
0000100 E : " ] [ H E R E : : ] 1 [
0000120 H E R E : } ] \n
Without LC_ALL=C, sed does not treat those bytes as characters (neither [[:alnum:]] nor [^[:alnum:]] sees the 8-bit bytes):
echo -e '# \x93text\x94\n{"a":1}' | sed 's/[[:alnum:]]/[HERE:&] /g' | od -c
0000000 # 223 [ H E R E : t ] [ H E R
0000020 E : e ] [ H E R E : x ] [ H
0000040 E R E : t ] 224 \n { " [ H E R E
0000060 : a ] " : [ H E R E : 1 ] }
0000100 \n
echo -e '# \x93text\x94\n{"a":1}' | sed 's/[^[:alnum:]]/[HERE:&] /g' | od -c
0000000 [ H E R E : # ] [ H E R E :
0000020 ] 223 t e x t 224 \n [ H E R E : {
0000040 ] [ H E R E : " ] a [ H E R
0000060 E : " ] [ H E R E : : ] 1 [
0000100 H E R E : } ] \n
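A small Python sketch may help illustrate why the locale matters. It assumes the failing run used a UTF-8 locale, which the question does not state: bytes 0x93 and 0x94 on their own are not valid UTF-8, so a tool that works on decoded characters cannot treat them as characters, while byte-wise matching (the analogue of LC_ALL=C) handles them without trouble. The names `is_valid_utf8` and `comment_re` are illustrative only.

```python
import re

# Same bytes as in the question: a comment line with 0x93/0x94 quotes.
data = b'# \x93text\x94\n{"a":1}\n'


def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Byte-oriented regex: '.' matches any single byte except a newline,
# which mirrors how sed behaves under LC_ALL=C.
comment_re = re.compile(rb"^\s*#.*", re.MULTILINE)
stripped = comment_re.sub(b"", data)

print(is_valid_utf8(data))      # the stray quote bytes are not valid UTF-8
print(is_valid_utf8(stripped))  # the remaining JSON line is plain ASCII
print(stripped)
```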
I made a long post, but I'll keep it simpler.
Can someone show me a step by step for -10 + -10 in hexadecimal for signed 16-bit numbers?
The hexadecimal numbers would look like 0xFFF6 + 0xFFF6.
I've heard it should equal 0xFFEC which should be -20. Anyone? Pretty please?
Addition
When adding the two numbers, use the usual method of adding the digits by place value.
    0xFFF6 (-10)            0xFFF 6  ( 6)
  + 0xFFF6 (-10)    >>    + 0xFFF 6  ( 6)
  --------------          ---------------
                                  C  (12)
Carry when needed.
                                   1  <-- Carried
    0x F F F 6  (15)          0x F F F 6
  + 0x F F F 6  (15)   >>   + 0x F F F 6
  ------------------        ------------
          1E C  (30)                 E C
          ^
          |
    Carry this 1 (worth 16) to the next place value
Continue until all digits are accounted for. Discard the overflow carry. Overflow is checked using the sign.
         1                   1 1
    0x F F F 6            0x F F F 6            0xFFF6 (-10)
  + 0x F F F 6    >>    + 0x F F F 6    >>    + 0xFFF6 (-10)
  ------------          ------------          --------------
        1F E C              1F F E C            0xFFEC (-20)
                            ^
                            |
                         Discard
Subtraction
Adding a negative is the same as subtracting the positive. Turn -10 + -10 into -10 - 10 by taking the 2's complement of the subtrahend.
    0xFFF6 (-10)                            0xFFF6 (-10)
  + 0xFFF6 (-10)   >> 2's complement >>   - 0x000A (+10)
  --------------                          --------------
Next, use binary subtraction and borrow as needed.
      Borrow                      Borrowed
        |                           |
        v                           v
    0xFFF 6  (  6)              0xFFE 16  ( 22)
  - 0x000 A  (-10)     >>     - 0x000  A  (-10)
  ----------------            -----------------
Continue until all digits are accounted for.
    0xFFE 16  ( 22)          0xFFE 16          0xFFF6
  - 0x000  A  (-10)   >>   - 0x000  A   >>   - 0x000A
  -----------------        ----------        --------
           C  ( 12)          0xFFE  C          0xFFEC
Overflow
Once finished, check for overflow. The sign will be incorrect if overflow occurred (subtracting from a negative has to be negative).
0xFFEC -> negative (no overflow)
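The whole calculation can be checked with a short Python sketch. The helper names `to_signed16` and `add16` are hypothetical; the sketch masks to 16 bits, reports the discarded carry, and flags signed overflow when both operands share a sign that the result does not.

```python
MASK = 0xFFFF  # keep only the low 16 bits


def to_signed16(value):
    """Interpret the low 16 bits of value as a two's complement number."""
    value &= MASK
    return value - 0x10000 if value & 0x8000 else value


def add16(a, b):
    """Add two 16-bit patterns; return (result, carry_out, overflow)."""
    raw = (a & MASK) + (b & MASK)
    result = raw & MASK      # the carry out of bit 15 is discarded
    carry_out = raw >> 16
    sa, sb, sr = to_signed16(a), to_signed16(b), to_signed16(result)
    # Signed overflow: operands have the same sign, result has the other.
    overflow = (sa < 0) == (sb < 0) and (sr < 0) != (sa < 0)
    return result, carry_out, overflow


result, carry, overflow = add16(0xFFF6, 0xFFF6)
print(hex(result), to_signed16(result), carry, overflow)
```

For contrast, `add16(0x7FFF, 0x0001)` yields 0x8000 with the overflow flag set, because two positive operands produce a negative result.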
The challenge
The shortest code by character count, that will output musical notation based on user input.
Input will be composed of a series of letters and numbers - letters will represent the name of the note and the number will represent the length of the note. A note is made of 4 vertical columns. The note's head will be a capital O, stem, if present will be 3 lines tall, made from the pipe character |, and the flag(s) will be made from backward slash \.
Valid note lengths are none, 1/4 of a note, 1/8 of a note, 1/16 of a note and 1/32 of a note.
       |     |\    |\    |\
       |     |     |\    |\
       |     |     |     |\
 O    O     O     O     O
 1   1/4   1/8   1/16  1/32
Notes are placed on the staff according to their note name:
   ----
D  ----
C
B  ----
A
G  ----
F
E  ----
All input can be assumed to be valid and without errors - Each note separated with a white space on a single line, with at least one valid note.
Test cases
Input:
B B/4 B/8 B/16 B/32 G/4 D/8 C/16 D B/16
Output:
                          |\
--------------------------|---|\--------
      |   |\  |\  |\      |   |\      |\
------|---|---|\--|\-----O----|--O----|\
      |   |   |   |\  |      O        |
-O---O---O---O---O----|--------------O--
                      |
---------------------O------------------

----------------------------------------
Input:
E/4 F/8 G/16 A/32 E/4 F/8 G/16 A/32
Output:

--------------------------------

--------------|\--------------|\
          |\  |\          |\  |\
------|\--|\--|\------|\--|\--|\
  |   |   |  O    |   |   |  O
--|---|--O--------|---|--O------
  |  O            |  O
-O---------------O--------------
Input:
C E/32 B/8 A/4 B F/32 B C/16
Output:

------------------------------|\
          |\                  |\
----------|---|---------------|-
 O        |   |              O
---------O----|--O----|\-O------
      |\     O        |\
------|\--------------|\--------
      |\             O
-----O--------------------------
Code count includes input/output (i.e full program).
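Before reading the golfed answers, an ungolfed reference sketch in Python 3 may help clarify the layout rules. It is not one of the submitted solutions. The names `NOTE_ROW`, `FLAGS` and `render` are hypothetical, and the 10-row canvas with staff lines on the odd rows is inferred from the staff diagram and the test cases above.

```python
# Row of the note head on a 10-row canvas (rows 1, 3, 5, 7, 9 are staff lines).
NOTE_ROW = {"D": 3, "C": 4, "B": 5, "A": 6, "G": 7, "F": 8, "E": 9}
# Number of flags for each note length; "" means a whole note.
FLAGS = {"": 0, "4": 0, "8": 1, "16": 2, "32": 3}


def render(line):
    """Return the 10 output rows for a line such as 'B B/4 G/16'."""
    notes = line.split()
    width = 4 * len(notes)  # every note occupies 4 columns
    grid = [["-" if y % 2 else " "] * width for y in range(10)]
    for i, note in enumerate(notes):
        name, _, length = note.partition("/")
        head = NOTE_ROW[name]
        x = 4 * i
        grid[head][x + 1] = "O"            # note head in the second column
        if length:                         # whole notes have no stem
            for k in range(3):             # stem is 3 rows tall, above the head
                y = head - 3 + k
                grid[y][x + 2] = "|"
                if k < FLAGS[length]:      # flags hang from the top of the stem
                    grid[y][x + 3] = "\\"
    return ["".join(row) for row in grid]


for row in render("B B/4 B/8 B/16 B/32 G/4 D/8 C/16 D B/16"):
    print(row)
```

Running it on the first test input reproduces the ten rows of the first test case, including the blank eighth row.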
Golfscript (112 characters)
' '%:A;10,{):y;A{2/.0~|1=~:r;0=0=5\- 7%
4y@--:q' '' O'if-4q&!q*r*{16q/r<'|\\'
'| 'if}' 'if+{.32=y~&{;45}*}%}%n}%
Perl, 126 characters (115/122 with switches)
Perl in 239 226 218 216 183 180 178 172 157 142 136 133 129 128 126 chars
This 126 character solution in Perl is the result of a lengthy collaboration between myself and A. Rex.
@o=($/)x10;$/=$";map{m[/];$p=4+(5-ord)%7;
$_.=--$p?!($p&~3)*$'?16<$p*$'?"  |\\":"  | ":$/x4:" O  ",
$|--&&y# #-#for@o}<>;print@o
A. Rex also proposes a solution to run with the perl -ap switch. With 111(!)
characters in this solution plus 4 strokes for the extra command-line switch,
this solution has a total score of 115.
$\="$:
"x5;$p=4+(5-ord)%7,s#..##,$\=~s#(.)\K$#--$p?
$_*!($p&~3)?"$1|".(16<$p*$_?"\\":$1).$1:$1x4:O.$1x3#gemfor@F
The first newline in this solution is significant.
Or 122 characters embedding the switches in the shebang line:
#!perl -ap
$\="$:
"x5;$p=4+(5-ord)%7,s#..##,$\=~s#(.)\K$#--$p?$_*!($p&~3)?"$1|".(16<$p*$_?
"\\":$1).$1:$1x4:O.$1x3#gemfor@F
(first two newlines are significant).
Half-notes can be supported with an additional 12 chars:
@o=($/)x10;$/=$";map{m[/];$p=4+(5-ord)%7;
$_.=--$p?!($p&~3)*$'?16<$p*$'?"  |\\":"  | ":$/x4:$'>2?" @  ":" O  ",
$|--&&y# #-#for@o}<>;print@o
LilyPond - 244 bytes
Technically speaking, this doesn't adhere to the output specification, as the output is a nicely engraved PDF rather than a poor ASCII text substitute, but I figured the problem was just crying out for a LilyPond solution. In fact, you can remove the "\autoBeamOff\cadenzaOn\stemUp" to make it look even more nicely formatted. You can also add "\midi{}" after the "\layout{}" to get a MIDI file to listen to.
o=#(open-file"o""w")p=#ly:string-substitute
#(format o"~(~a"(p"2'1""2"(p"4'1""4"(p"6'1""6"(p"8'1""8"(p"/""'"(p"C""c'"(p"D""d'"(p" ""/1"(p"
"" "(ly:gulp-file"M")))))))))))#(close-port o)\score{{\autoBeamOff\cadenzaOn\stemUp\include"o"}\layout{}}
Usage: lilypond thisfile.ly
Notes:
The input must be in a file named "M" in the same directory as the program.
The input file must end in a newline. (Or save 9 bytes by having it end in a space.)
The output is a PDF named "thisfile.pdf", where "thisfile.ly" is the name of the program.
I tested this with LilyPond 2.12.2; other versions might not work.
I haven't done much in LilyPond, so I'm not sure this is the best way to do this, since it has to convert the input to LilyPond format, write it to an auxiliary file, and then read it in. I currently can't get the built-in LilyPond parser/evaluator to work. :(
Now working on an ASCII-output solution.... :)
C89 (186 characters)
#define P,putchar(
N[99];*n=N;y;e=45;main(q){for(;scanf(" %c/%d",n,n+1)>0;n
+=2);for(;y<11;q=y-(75-*n++)%7 P+q-4?e:79)P*n&&q<4&q>0?
124:e)P*n++/4>>q&&q?92:e))*n||(e^=13,n=N,y++P+10))P+e);}
Half-note support (+7 characters)
#define P,putchar(
N[99];*n=N;y;e=45;main(q){for(;scanf(" %c/%d",n,n+1)>0;n
+=2);for(;y<11;q=y-(75-*n++)%7 P+q-4?e:v<4?79:64)P*n&&q<4&q>0?
124:e)P*n++/4>>q&&q?92:e))*n||(e^=13,n=N,y++P+10))P+e);}
Python 178 characters
The 167 was a false alarm, I forgot to suppress the stems on the whole notes.
R=raw_input().split()
for y in range(10):
 r=""
 for x in R:o=y-(5-ord(x[0]))%7;b=" -"[y&1]+"O\|";r+=b[0]+b[o==3]+b[-(-1<o<3and''<x[1:])]+b[2*(-1<o<":862".find(x[-1]))]
 print r
Python 167 characters (Broken)
No room for the evil eye in this one, although there are 2 filler characters in there, so I added a smiley. This technique takes advantage of the uniqueness of the last character of the note lengths, so lucky for me that there are no 1/2 notes or 1/64 notes
R=raw_input().split()
for y in range(10):
 r=""
 for x in R:o=y-(5-ord(x[0]))%7;b=" -"[y&1]+"O\|";r+=b[0]+b[o==3]+b[-(-1<o<3)]+b[2*(-1<o<":862".find(x[-1]))]
 print r
Python 186 characters <<o>>
Python uses the <<o>> evil eye operator to great effect here. The find() method returns -1 if the item is not found, so that is why D doesn't need to appear in the notes.
R=raw_input().split()
for y in range(10):
 r=""
 for x in R:o='CBAGFE'.find(x[0])+4;B=" -"[y%2];r+=B+(B,'O')[o==y]+(x[2:]and
 y+4>o>y and"|"+(B,'\\')[int(x[2:])<<o>>6+y>0]or B*2)
 print r
11 extra bytes gives a version with half notes
R=raw_input().split()
for y in range(10):
 r=""
 for x in R:t='CBAGFE'.find(x[0])+4;l=x[2:];B=" -"[y%2];r+=B+(B,'@O'[l
 in'2'])[t==y]+(l and y+4>t>y and"|"+(B,'\\')[int(l)>>(6+y-t)>0]or B*2)
 print r
$ echo B B/2 B/4 B/8 B/16 B/32 G/4 D/8 C/16 D B/16| python notes.py
                              |\
------------------------------|---|\--------
      |   |   |\  |\  |\      |   |\      |\
------|---|---|---|\--|\-----@----|--O----|\
      |   |   |   |   |\  |      @        |
-O---O---@---@---@---@----|--------------@--
                          |
-------------------------@------------------

--------------------------------------------
159 Ruby chars
n=gets.split;9.downto(0){|p|m='- '[p%2,1];n.each{|t|r=(t[0]-62)%7;g=t[2..-1]
print m+(r==p ?'O'+m*2:p>=r&&g&&p<r+4?m+'|'+(g.to_i>1<<-p+r+5?'\\':m):m*3)}
puts}
Ruby 136
n=gets;10.times{|y|puts (b=' -'[y&1,1])+n.split.map{|t|r=y-(5-t[0])%7
(r==3?'O':b)+(t[1]&&0<=r&&r<3?'|'<<(r<t[2,2].to_i/8?92:b):b+b)}*b}
Ruby 139 (Tweet)
n=gets;10.times{|y|puts (b=' -'[y&1,1])+n.split.map{|t|r=y-(5-t[0])%7
(r==3?'O':b)+(t[1]&&0<=r&&r<3?'|'<<(r<141>>(t[-1]&7)&3?92:b):b+b)}*b}
Ruby 143
n=gets.split;10.times{|y|puts (b=' -'[y&1,1])+n.map{|t|r=y-(5-t[0])%7;m=t[-1]
(r==3?'O':b)+(m<65&&0<=r&&r<3?'|'<<(r<141>>(m&7)&3?92:b):b+b)}*b}
Ruby 148
Here is another way to calculate the flags,
where m=ord(last character), #flags=1+m&3-(1&m/4)
and another way #flags=141>>(m&7)&3, that saves one more byte
n=gets.split;10.times{|y|b=' -'[y&1,1];n.each{|t|r=y-(5-t[0])%7;m=t[-1]
print b+(r==3?'O':b)+(m<65&&0<=r&&r<3?'|'<<(r<141>>(m&7)&3?92:b):b+b)}
puts}
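The two flag formulas are easy to check outside Ruby. The sketch below restates them in Python with explicit parentheses that follow Ruby's operator precedence (`+` and `-` bind tighter than `>>`, which binds tighter than `&`). The function names are hypothetical, and only the four flagged or stemmed lengths are checked, since whole notes are excluded separately in the Ruby code by the `m<65` test.

```python
def flags_shift(last_char):
    """#flags = 141>>(m&7)&3, where m is the code of the last character."""
    m = ord(last_char)
    return (141 >> (m & 7)) & 3


def flags_arith(last_char):
    """#flags = 1+m&3-(1&m/4), with Ruby's precedence made explicit."""
    m = ord(last_char)
    return (1 + m) & (3 - (1 & (m // 4)))


# Last character of each note length: /4, /8, /16, /32.
for length in ("4", "8", "16", "32"):
    last = length[-1]
    print(length, flags_shift(last), flags_arith(last))
```

Both formulas map the last characters '4', '8', '6' and '2' to 0, 1, 2 and 3 flags respectively, which is why only the final character of the length needs to be inspected.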
Ruby 181
First try is a transliteration of my Python solution
n=gets.split;10.times{|y|r="";n.each{|x|o=y-(5-x[0])%7
r+=(b=" -"[y&1,1]+"O\\|")[0,1]+b[o==3?1:0,1]+b[-1<o&&o<3&&x[-1]<64?3:0,1]+b[-1<o&&o<(":862".index(x[-1]).to_i)?2:0,1]}
puts r}
F#, 458 chars
Reasonably short, and still mostly readable:
let s=Array.init 10(fun _->new System.Text.StringBuilder())
System.Console.ReadLine().Split([|' '|])
|>Array.iter(fun n->
    for i in 0..9 do s.[i].Append(if i%2=1 then"----"else"    ")
    let l=s.[0].Length
    let i=68-int n.[0]+if n.[0]>'D'then 7 else 0
    s.[i+3].[l-3]<-'O'
    if n.Length>1 then
        for j in i..i+2 do s.[j].[l-2]<-'|'
        for j in i..i-1+(match n.[2]with|'4'->0|'8'->1|'1'->2|_->3)do s.[j].[l-1]<-'\\')
for x in s do printfn"%s"(x.ToString())
With brief commentary:
// create 10 stringbuilders that represent each line of output
let s=Array.init 10(fun _->new System.Text.StringBuilder())
System.Console.ReadLine().Split([|' '|])
// for each note on the input line
|>Array.iter(fun n->
    // write the staff
    for i in 0..9 do s.[i].Append(if i%2=1 then"----"else"    ")
    // write note (math so that 'i+3' is which stringbuilder should hold the 'O')
    let l=s.[0].Length
    let i=68-int n.[0]+if n.[0]>'D'then 7 else 0
    s.[i+3].[l-3]<-'O'
    // if partial note
    if n.Length>1 then
        // write the bar
        for j in i..i+2 do s.[j].[l-2]<-'|'
        // write the tails if necessary
        for j in i..i-1+(match n.[2]with|'4'->0|'8'->1|'1'->2|_->3)do s.[j].[l-1]<-'\\')
// print output
for x in s do printfn"%s"(x.ToString())
C 196 characters <<o>>
Borrowing a few ideas off strager. Interesting features include the n+++1 "triple +" operator and the <<o>> "evil eye" operator
#define P,putchar
N[99];*n=N;y;b;main(o){for(;scanf(" %c/%d",n,n+1)>0;n+=2);for(;y<11;)
n=*n?n:(y++P(10),N)P(b=y&1?32:45)P((o=10-(*n+++1)%7-y)?b:79)P(0<o&o<4&&*n?'|':b)
P(*n++<<o>>6&&0<o&o<4?92:b);}
168 characters in Perl 5.10
My original solution was 276 characters, but lots and lots of tweaking reduced it by more than 100 characters!
$_=<>;
y#481E-GA-D62 #0-9#d;
s#.(/(.))?#$"x(7+$&).O.$"x($k=10).($1?"|":$")x3 .$"x(10-$2)."\\"x$2.$"x(9-$&)#ge;
s#(..)*?\K (.)#-$2#g;
print$/while--$k,s#.{$k}\K.#!print$&#ge
If you have a minor suggestion that improves this, please feel free to just edit my code.
Lua, 307 Characters
b,s,o="\\",io.read("*l"),io.write for i=1,10 do for n,l in
s:gmatch("(%a)/?(%d*)")do x=n:byte() w=(x<69 and 72 or 79)-x
l=tonumber(l)or 1 d=i%2>0 and" "or"-"o(d..(i==w and"O"or
d)..(l>3 and i<w and i+4>w and"|"or d)..(l>7 and i==w-3
and b or l>15 and i==w-2 and b or l>31 and i==w-1 and b or
d))end o"\n"end
C -- 293 characters
Still needs more compression, and it takes the args on the command line instead of reading them...
i,j,k,l;main(c,v)char **v;{char*t;l=4*(c-1)+2;t=malloc(10*l)+1;for(i=0;i<10;i
++){t[i*l-1]='\n';for(j=0;j<l;j++)t[i*l+j]=i&1?'-':' ';}t[10*l-1]=0;i=1;while
(--c){j='G'-**++v;if(j<3)j+=7;t[j*l+i++]='O';if(*++*v){t[--j*l+i]='|';t[--j*l
+i]='|';t[--j*l+i]='|';if(*++*v!='4'){t[j++*l+i+1]='\\';if(**v!='8'){t[j++*l+
i+1]='\\';if(**v!='1'){t[j++*l+i+1]='\\';}}}}i+=3;}puts(t);}
edit: fixed the E
edit: down to 293 characters, including the newlines...
#define X t[--j*l+i]='|'
#define Y t[j++*l+i+1]=92
i,j,k,l;main(c,v)char**v;{char*t;l=4*(c-1)+2;t=malloc(10*l)+1;for(i=10;i;)t[--i*
l-1]=10,memset(t+i*l,i&1?45:32,l-1);t[10*l-1]=0;for(i=1;--c;i+=3)j=71-**++v,j<3?
j+=7:0,t[j*l+i++]=79,*++*v?X,X,X,*++*v-52?Y,**v-56?Y,**v-49?Y:0:0:0:0;puts(t);}