I'm trying to read input from the user and print it back.
First I print a prompt, the user enters a value, and I want to print that value.
.data
params_sys5: .space 8
params_sys3: .space 8
prompt_msg_LBound: .asciiz "Enter lower bound for x,y\n"
prompt_msg_LBound_val: .asciiz "Lower bound for x,y = %d\n"
xyL: .word64 0
prompt_msg_UBound: .asciiz "Enter upper bound for x,y\n"
prompt_msg_UBound_val: .asciiz "Upper bound for x,y = %d\n"
xyU: .word64 0
prompt_msg_UBoundZ: .asciiz "Enter upper bound for z\n"
prompt_msg_UBoundZ_val: .asciiz "Lower bound for z = %d\n"
zU: .word64 0
prompt_msgAns: .asciiz "x = %d, y = %d, z = %d\n"
.word64 0
.word64 0
.word64 0
xyL_Len: .word64 0
xyU_Len: .word64 0
zU_Len: .word64 0
xyL_text: .space 32
xyU_text: .space 32
zU_text: .space 32
ZeroCode: .word64 0x30 ;Ascii '0'
.text
main: daddi r4, r0, prompt_msg_LBound
jal print_string
daddi r8, r0, xyL_text ;r8 = xyL_text
daddi r14, r0, params_sys3
daddi r9, r0, 32
jal read_keyboard_input
sd r1, xyL_Len(r0) ;save first number length
ld r10, xyL_Len(r0) ;n = r10 = length of xyL_text
daddi r17, r0, xyL_text
jal convert_string_to_integer ;r17 = &source string,r10 = string length,returns computed number in r11
sd r11, xyL(r0)
daddi r4, r0, prompt_msg_LBound_val
jal print_string
end: syscall 0
print_string: sw $a0, params_sys5(r0)
daddi r14, r0, params_sys5
syscall 5
jr r31
read_keyboard_input: sd r0, 0(r14) ;read from keyboard
sd r8, 8(r14) ;destination address
sd r9, 16(r14) ;destination size
syscall 3
jr r31
convert_string_to_integer: daddi r13, r0, 1 ;r13 = constant 1
daddi r20, r0, 10 ;r20 = constant 10
movz r11, r0, r0 ;x1 = r11 = 0
ld r19, ZeroCode(r0)
For1: beq r10, r0, EndFor1
dmultu r11, r20 ;lo = x * 10
mflo r11 ;x = r11 = lo = r11 * 10
movz r16, r0, r0 ;r16 = 0
lbu r16, 0(r17) ;r16 = text[i]
dsub r16, r16, r19 ;r16 = text[i] - '0'
dadd r11, r11, r16 ;x = x + text[i] - '0'
dsub r10, r10, r13 ;n--
dadd r17, r17, r13 ;i++
b For1
EndFor1: jr r31
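For reference, the For1 loop above is the usual digit-by-digit conversion; in Python the same algorithm (function name mine) looks like:

```python
def convert_string_to_integer(text, n):
    """x = x*10 + (text[i] - '0') for each of the n digit characters,
    mirroring the loop registers: x = r11, n = r10, i walks r17."""
    x = 0
    for i in range(n):
        x = x * 10 + (ord(text[i]) - ord('0'))
    return x
```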
I'm trying to get the first number, the lower bound of x,y.
For example, I type the number 5; xyL does end up holding 5, but the printed output is:
Enter lower bound for x,y
Lower bound for x,y = 0
How do I print the entered value, and then do the same with the next string?
Thanks.
Edit:=======================================================================
I changed the .data section by adding another .space 8 entry to hold the format string's address, and now, instead of jumping to print_string to print the value, I call syscall 5 directly. For example:
prompt_msg_LBound: .asciiz "Enter lower bound for x,y\n"
prompt_msg_LBound_val: .asciiz "Lower bound for x,y = %d\n"
LBound_val_addr: .space 8
xyL: .space 8
and in the .code section:
sd r11, xyL(r0)
daddi r5, r0, prompt_msg_LBound_val
sd r5, LBound_val_addr(r0)
daddi r14 ,r0, LBound_val_addr
syscall 5
But I still want to use print_string to print the string prompt_msg_LBound_val with the user-entered value.
How can I do that?
The print_string sample function in the manual is not meant to be used with placeholders, only with plain strings.
If you add placeholders to the format string, SYSCALL 5 will keep reading the placeholder values from the memory that follows. In your case it reads and displays the value 0, which by accident is what's in memory.
See the printf() example from the manual (slightly updated and annotated) to check how to use placeholders:
.data
format_str: .asciiz "%dth of %s:\n%s version %i.%i.%i is being tested!"
s1: .asciiz "February"
s2: .asciiz "EduMIPS64"
fs_addr: .space 4 ; Will store the address of the format string
.word 10 ; The literal value 10.
s1_addr: .space 4 ; Will store the address of the string "February"
s2_addr: .space 4 ; Will store the address of the string "EduMIPS64"
.word 1 ; The literal value 1.
.word 2 ; The literal value 2.
.word 6 ; The literal value 6.
test:
.code
daddi r5, r0, format_str
sw r5, fs_addr(r0)
daddi r2, r0, s1
daddi r3, r0, s2
sd r2, s1_addr(r0)
sd r3, s2_addr(r0)
daddi r14, r0, fs_addr
syscall 5
syscall 0
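In other words, SYSCALL 5 treats the word at the address in r14 as the format-string address and then consumes one following memory word per placeholder. A rough Python model of that walk (the names and details here are my assumptions, not the exact EduMIPS64 semantics):

```python
def syscall5(memory, base):
    """memory[base] is the format string; each %d/%i consumes the next
    word as an integer, each %s as a (pre-resolved) string."""
    fmt = memory[base]
    args = iter(memory[base + 1:])   # the words that follow the format string
    out = []
    i = 0
    while i < len(fmt):
        if fmt[i] == '%' and i + 1 < len(fmt) and fmt[i + 1] in 'dis':
            spec = fmt[i + 1]
            val = next(args)         # placeholder -> consume one memory word
            out.append(val if spec == 's' else str(val))
            i += 2
        else:
            out.append(fmt[i])
            i += 1
    return ''.join(out)
```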
I am writing an 8-bit integer Newton-Raphson square-root function in MASM x86 assembly and I think I am stuck in an infinite loop. The editor I have to use for class does not report infinite loops.
Anyway, I am not sure where my problem is. I just started MASM a few weeks ago and am kind of lost; any help with the infinite loop would be appreciated. My initial x value is defined as 1.
The function is y = (x + n/x)/2 = x/2 + n/(2x), where n is the number in question and x is the initialized value, and afterwards the previous iteration's y value.
mov ah, 09h
lea dx, prompt ;Prompt User
int 21h
mov ah, 01h
int 21h ;User input
sub al, 30h
mov bh, al
mov bl, x ;Loading for loop
mov al, x
iteration:
mul bl; al*bl = al
mov dl, al ; storing x^2
add al, 01h ; (x+1)^2
mul al
cmp bh, dl
jge doneCheck ; bh - dl ====> n- x^2 => 0
doneCheck:
cmp bh, al; bh-al = ? ====>n - (x+1)^2 == -int
jl done
mov al, 02h; loading 2 in ah
mul bl ; bl*al = 2(bl) = 2x = al
shr bl, 1 ; x/2 = bl
mov cl, al ; storing 2x in cl
mov ah, 0 ; clearing ah
mov ch, 0; clearing ch
mov al, bh ; moving n into ax for division prep
div cx ; ax/cl ====> n/2x ===> p =ah and q = al
add bl, ah ;so this is finally 1/2(x+(n/x)) === (x/2+n/2x) y the new value y is now stored in bl for next loop
mov al, bl ; for next loop
jmp iteration
done:
mov dl, bl; print square root
mov ah, 02h
int 21h
This:
shl bl, 1 ; x/2 = bl
shouldn't it be:
shr bl, 1
-- Updated:
And about your question:
BH = Number to find sqrt. When x^2 == BH then x is the sqrt(n)
AL and BL = y value of the last iteration
and you do:
mul bl ; AL*BL => AX
cmp bh, al ; BH == AL? or with names: n == x^2 ?
Why the infinite loop?
As you take the input with AH=01h + int 21h, you read only one char and get its ASCII code in AL.
Let's assume the user's input is "A", which translates to the number 65. No integer satisfies x^2 = 65, so that loop runs forever.
I suggest using this condition as the loop break. The result will be an approximation (rounded down):
(n >= x^2) && (n < (x+1)^2)
Bear in mind that you are working all with 8 bits, so the highest solution would be: y = 15. Look at this:
1^2 = 1
2^2 = 4
3^2 = 9
4^2 = 16
5^2 = 25
6^2 = 36
7^2 = 49
8^2 = 64
...
15^2 = 225
Those are the only numbers whose sqrt your code can calculate (without my proposal).
So you can only press the following keys as input:
$ = number 36
1 = number 49
@ = number 64
Q = number 81
d = number 100
y = number 121
Any other keypress will make your code get into an infinite loop.
And a tip for output: add 48 to BL before printing it so it goes to an ASCII number :)
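Putting the Newton step and that break condition together, the whole routine is, in effect, the following (a Python sketch with my own function name; integer division rounds down, like the 8086 DIV):

```python
def isqrt_newton(n):
    """Newton-Raphson y = (x + n//x) // 2, starting from x = 1,
    iterating until x*x <= n < (x+1)*(x+1), i.e. x == floor(sqrt(n))."""
    x = 1
    while not (x * x <= n < (x + 1) * (x + 1)):
        x = (x + n // x) // 2
    return x
```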
-- Update 2 :
From your code I found these errors:
add al, 01h ; (x+1)^2 ; AL = x^2, therefore you are doing (x^2)+1
mul al
and here the execution flow always runs through all of these lines, because the jge target is the very next line:
cmp bh, dl
jge doneCheck ; bh >= dl? ====> n >= x^2 ?
doneCheck:
cmp bh, al; bh-al = ? ====>n - (x+1)^2 == -int
jl done
I guess it should be something like:
cmp bh, dl ; n vs x^2
jb notSolution ; BH < DL? ====> if n < x^2 continue with next NR step
cmp bh, al ; here, n >= x^2
jb done ; BH < AL ? ====> if n < (x+1)^2 we found a solution
notSolution: ; n is not in [ x^2 , (x+1)^2 )
I used jb instead of jl because I assume only positive numbers. jl would treat 129 as a negative number and might get us in trouble.
-- Update 3:
From Peter Cordes' answer, a detail I didn't notice (I misread it as div cl):
div cx ; ax/cl ====> n/2x ===> p =ah and q = al. That would be correct if you'd used div cl
I'm not sure you've correctly understood that MUL and DIV have one operand each that's double the width of the other two.
Your comments on those lines are wrong:
mul bl; al*bl = al: no, AX = AL*BL.
div cx ; ax/cl ====> n/2x ===> p =ah and q = al. That would be correct if you'd used div cl, but DIV r/m16 takes DX:AX as a 32-bit dividend, and produces results in AX=quotient, DX=remainder.
Look up MUL and DIV in the manual.
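To make those width rules concrete, here is a rough Python model of the two DIV forms (a sketch of the register plumbing only; flags and the exact #DE fault behavior are simplified):

```python
def div_r8(ax, src):
    """DIV r/m8: 16-bit AX divided by 8-bit src -> AL = quotient, AH = remainder."""
    q, r = divmod(ax, src)
    if q > 0xFF:
        raise OverflowError("#DE: quotient doesn't fit in AL")
    return q, r          # (AL, AH)

def div_r16(dx, ax, src):
    """DIV r/m16: 32-bit DX:AX divided by 16-bit src -> AX = quotient, DX = remainder."""
    q, r = divmod((dx << 16) | ax, src)
    if q > 0xFFFF:
        raise OverflowError("#DE: quotient doesn't fit in AX")
    return q, r          # (AX, DX)
```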
I highly recommend single-stepping through your code in a debugger. And/or stopping it in a debugger after it gets into an infinite loop, and single step from there while watching registers.
The bottom of the x86 tag wiki has some tips on using GDB to debug asm. (e.g. use layout reg). Since you're using MASM, you might be using Visual Studio, which has a debugger built in.
It doesn't matter what debugger you use, but it's an essential tool for developing asm.
I have been writing this program in assembly language that encrypts or decrypts a string of text. At the end it should simply output the encoded message, but instead I am just getting a massive number of random characters. Anyone have any idea what's going on here?
.ORIG x3000
;CLEAR REGISTERS
AGAIN AND R0, R0, 0 ;CLEAR R0
AND R1, R1, 0 ;CLEAR R1
AND R2, R2, 0 ;CLEAR R2
AND R3, R3, 0 ;CLEAR R3
AND R4, R4, 0 ;CLEAR R4
AND R5, R5, 0 ;CLEAR R5
AND R6, R6, 0 ;CLEAR R6
;ENCRYPT/DECRYPT PROMPT
LEA R0, PROMPT_E ;LOADS PROMPT_E INTO R0
PUTS ;PRINTS R0
GETC ;GETS INPUT
OUT ;ECHO TO SCREEN
STI R0, MEMX3100 ;X3100 <- R0
;KEY PROMPT
LEA R0, PROMPT_K ;LOADS PROMPT_E INTO R0
PUTS ;PRINTS R0
GETC ;GETS INPUT
OUT ;ECHO TO SCREEN
STI R0, CYPHERKEY ;X3101 <- R0
;MESSAGE PROMPT
LD R6, MEMX3102 ;R6 <- MEMX3102
LEA R0, PROMPT_M ;LOADS PROMPT_E INTO R0
PUTS ;PRINTS R0
LOOP1 GETC ;GETS INPUT
OUT ;ECHO TO SCREEN
ADD R1, R0, #-10 ;R1 <- R0-10
BRZ NEXT ;BRANCH NEXT IF ENTER
STR R0, R6, #0 ;X3102 <- R0
ADD R6, R6, #1 ;INCRIMENT COUT
LD R2, NUM21 ;R2 <- -12546
ADD R5, R6, R2 ;R5 - R2
STI R5, MEMX4000 ;MEMX4000 <- R5
LD R1, NUM20 ;R1 <- NUM20
ADD R1, R6, R1 ;CHECK FOR 20
BRN LOOP1 ;CREATES WHILE LOOP
;Function choose
NEXT LDI R6, MEMX3100 ;R6 <- X3100
LD R1, NUM68 ;R1 <- -68
ADD R1, R6, R1 ;CHECKS FOR D INPUT
BRZ DECRYPT
;ENCRYPT FUNCTION(DEFAULT)
LD R4, MEMX3102 ;R6 <- X3102
LOOP2 LDR R1, R4, #0 ;R1 <- MEM[R4+0]
LDI R5, ASCII ;R5 <- ASCII
ADD R1, R1, R5 ;STRIPS ASCII
AND R6, R1, #1 ;R6 <- R1 AND #1
BRZ LSBOI ;BRANCH IF LSB = 0
ADD R1, R1, #-1 ;R1 <- R1-1
BRNZP KEYLOAD ;BRANCH TO KEYLOAD
LSBOI ADD R1, R1, #1 ;R1 <- R1+1
KEYLOAD LDI R2, CYPHERKEY ;R2 <- CYPHERKEY
ADD R1, R1, R2 ;R1 <- R1+R2
STR R1, R4, #21 ;MEM[R4+21] <- R1
ADD R4, R4, #1 ;R4 <- R4 + 1
LD R5, MEMX4000 ;R5 <- COUNT
NOT R5, R5 ;NOT R5
ADD R5, R5, R4 ;CHECK FOR NEGATIVE
BRN LOOP2 ;LOOP
BRNZP NEXT2 ;BRANCH WHEN DONE
;DECRYPT FUNCTION
DECRYPT LD R4, MEMX3102 ;R4 <- X3102
LOOP3 LDR R1, R4, #0 ;R1 <- MEM[R4+0]
LDI R5, ASCII ;R5 <- ASCII
ADD R1, R1, R5 ;STRIPS ASCII
LDI R2, CYPHERKEY ;R2 <- CYPHERKEY
NOT R2, R2 ;R2 <- NOT R2
ADD R1, R1, R2 ;R1 <- R1 - CYPHERKEY
AND R6, R1, #1 ;R6 <- R1 AND #1
BRZ LSBOI2 ;BRANCH IF LSB = 0
ADD R1, R1, #-1 ;R1 <- R1-1
BRNZP NEXTTASK1 ;BRANCH TO KEYLOAD
LSBOI2 ADD R1, R1, #1 ;R1 <- R1+1
NEXTTASK1 STR R1, R4, #21 ;MEM[R4+21] <- R1
ADD R4, R4, #1 ;R4 <- R4 + 1
LD R5, MEMX4000 ;R5 <- COUNT
NOT R5, R5 ;NOT R5
ADD R5, R5, R4 ;CHECK FOR NEGATIVE
BRN LOOP3 ;LOOP
;OUTPUT
NEXT2 LD R4, MEMX3102 ;R4 <- X3102
LOOP4 LDR R0, R4, #21 ;R0 <- [R4+21]
OUT ;PRINT R0
ADD R4, R4, #1 ;R4 <- R4+1
LD R5, MEMX4000 ;R5 <- COUNT
NOT R5, R5 ;NOT R5
ADD R5, R5, R4 ;CHECK FOR NEGATIVE
BRN LOOP4
HALT
MEMX4000 .FILL X4000
ASCII .FILL #-30
NUM21 .FILL #-12546
NUM20 .FILL #-12566
MEMX3102 .FILL X3102
CYPHERKEY .FILL X3101
MEMX3100 .FILL X3100
NUM68 .FILL #-68
NUM32 .FILL #-32
PROMPT_E .STRINGZ "\nTYPE E TO ENCRYPT OR TYPE D TO DECRYPT (UPPER CASE): "
PROMPT_K .STRINGZ "\nENTER THE ENCRYPTION KEY (A SINGLE DIGIT FROM 1 TO 9) "
PROMPT_M .STRINGZ "\nINPUT A MESSAGE OF NO MORE THAN 20 CHARACTERS THEN PRESS <ENTER> "
.END
There are a number of different things going on in your program; here are some of the things I've found:
The encoding loop runs more times than the number of characters entered
The encryption key is stored and used in its ASCII form
Characters from the user are stored in the middle of the PROMPT_M text
The encoding loop cycles thousands of times
The encoding loop didn't change any of the stored characters at location x3102
The output routine doesn't loop, so it only outputs one char
From what I've seen, your program takes the user's char, adds it to the ASCII form of the encryption key, and then stores that hundreds of times, at every memory location offset 21 from x3102. When your output routine runs, it pulls the value stored at x3117, outputs that one char, then halts the program.
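For reference, the transform the program seems to be aiming for (toggle the low bit of the char code, then add the key) and its inverse can be sketched like this in Python (function names are mine; the ASCII-offset stripping is left out for clarity):

```python
def encrypt_char(code, key):
    """Toggle the low bit of the character code, then add the key."""
    return (code ^ 1) + key

def decrypt_char(code, key):
    """Inverse: subtract the key, then toggle the low bit back."""
    return (code - key) ^ 1
```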
Hopefully this is a simple question, but I cannot for the life of me figure out how to do a bit shift in binary. This is being done in the LC-3 environment. I just need to know how to arithmetically divide by two, i.e. shift to the right. I know going left is simple, by just adding the binary value to itself, but I have tried the opposite for a right shift (subtracting from itself, NOTing and then subtracting, etc.). Any help would be much appreciated.
Or if you have a better way to move x00A0 to x000A, that would also be fantastic. Thanks!
This is an older post, but I ran into the same issue so I figured I would post what I've found.
When you do a bit shift to the right you're normally halving the binary number (dividing by 2), but that can be a challenge in the LC-3. This is the code I wrote to perform a bit shift to the right.
; Bit shift to the right
.ORIG x3000
MAIN
LD R3, VALUE
AND R5, R5, #0 ; Reseting our bit counter
B_RIGHT_LOOP
ADD R3, R3, #-2 ; Subtract 2 from the value stored in R3
BRn BR_END ; Exit the loop as soon as the number in R3 has gone negative
ADD R5, R5, #1 ; Add 1 to the bit counter
BR B_RIGHT_LOOP ; Start the loop over again
BR_END
ST R5, ANSWER ; Store the shifted value into the ANSWER variable
HALT ; Stop the program
; Variables
VALUE .FILL x3BBC ; Value is the number we want to do a bit-shift to the right
ANSWER .FILL x0000
.END
Keep in mind that with this code the least-significant bit (bit [0]) is lost. Also, this code doesn't work if the number we are trying to shift is negative, so if bit [15] is set this code won't work.
Example:
VALUE .FILL x8000 ; binary value = 1000 0000 0000 0000
; and values higher than x8000
; won't work because their 15th
; bit is set
This should at least get you going on the right track.
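The subtract-and-count loop above amounts to this (a Python sketch; as noted, bit [0] is discarded and the value must be non-negative):

```python
def shift_right_once(value):
    """Halve a non-negative value by counting how many times 2
    can be subtracted; the count equals value >> 1."""
    count = 0
    while value - 2 >= 0:   # exit as soon as subtracting 2 would go negative
        value -= 2
        count += 1
    return count
```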
.ORIG x3000
BR main
;»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»
; UL7AAjr
; shift right register R0
; used registers R1, R2, R3, R4, R5
;»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»
shift_right
AND R4, R4, #0 ; R4 - counter = 15 times
ADD R4, R4, #15
AND R1, R1, #0 ; R1 - temp result
LEA R2, _sr_masks ; R2 - masks pointer
_sr_loop
LDR R3, R2, #0 ; load mask into R3
AND R5, R0, R3 ; check bit in R0
BRZ _sr_zero ; go sr_zero if bit is zero
LDR R3, R2, #1 ; R3 next mask index
ADD R1, R1, R3 ; add mask to temp result
_sr_zero
ADD R2, R2, #1 ; next mask address
ADD R4, R4, #-1 ; all bits done?
BRNP _sr_loop
AND R0, R0, #0 ; R0 = R1
ADD R0, R0, R1
RET
_sr_masks
.FILL x8000
.FILL x4000
.FILL x2000
.FILL x1000
.FILL x0800
.FILL x0400
.FILL x0200
.FILL x0100
.FILL x0080
.FILL x0040
.FILL x0020
.FILL x0010
.FILL x0008
.FILL x0004
.FILL x0002
.FILL x0001
;»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»»
main
LD R0, data
JSR shift_right
HALT
data .FILL xFFFF
.END
; right shift R0 1 bit with sign extension
; Algorithm: build the result bit by bit from the MSB
.ORIG x3000
AND R1, R1, #0 ; r1 = 0
ADD R2, R1, #14 ; r2 = 14
ADD R0, R0, #0 ; r0 = r0
BRzp LOOP
ADD R1, R1, #-1 ; r1 = xffff
LOOP ADD R1, R1, R1 ; r1 << 1
ADD R0, R0, R0 ; r0 << 1
BRzp MSB0
ADD R1, R1, #1 ; r1++
MSB0 ADD R2, R2, #-1 ; cnt--
BRp LOOP
ADD R0, R1, #0 ; r0 = r1
HALT
.END
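The build-from-MSB idea in the program above can be sketched in Python using only doubling (ADD a register to itself) and sign tests, which are the operations LC-3 has; the 16-bit masking stands in for register wrap-around:

```python
def asr1_16(r0):
    """Arithmetic shift right by 1 of a 16-bit word: seed the result
    with the sign bit, then rebuild bits 14..1 of the original word
    by shifting both registers left 14 times."""
    r1 = 0xFFFF if r0 & 0x8000 else 0    # sign-extension seed
    for _ in range(14):
        r1 = (r1 + r1) & 0xFFFF          # r1 <<= 1
        r0 = (r0 + r0) & 0xFFFF          # r0 <<= 1; next bit reaches bit 15
        if r0 & 0x8000:                  # the BRzp test: was that bit set?
            r1 += 1
    return r1
```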
; right shift R0 1 bit with sign extension
; Algorithm: left-rotate 14 times with the proper sign
.ORIG x3000
LD R1, CNT
ADD R2, R0, #0
LOOP ADD R0, R0, R0 ; r0 << 1
BRzp NEXTBIT
ADD R0, R0, #1
NEXTBIT ADD R1, R1, #-1
BRp LOOP
LD R3, MASK
AND R0, R0, R3
ADD R2, R2, #0
BRzp DONE
NOT R3, R3
ADD R0, R0, R3
DONE HALT
MASK .FILL x3FFF
CNT .FILL 14
.END
; right shift R0 1 bit with sign extension
; Algorithm: look-up table and auto-stop
.ORIG x3000
AND R1, R1, #0 ; r1 = 0
LEA R2, TABLE ; r2 = table[]
AND R0, R0, #-2
LOOP BRzp MSB0
LDR R3, R2, #0 ; r3 = table[r2]
ADD R1, R1, R3 ; r1 += r3
MSB0 ADD R2, R2, #1 ; r2++
ADD R0, R0, R0 ; r0 << 1
BRnp LOOP
ADD R0, R1, #0 ; r0 = r1
HALT
TABLE
.FILL xC000
.FILL x2000
.FILL x1000
.FILL x0800
.FILL x0400
.FILL x0200
.FILL x0100
.FILL x0080
.FILL x0040
.FILL x0020
.FILL x0010
.FILL x0008
.FILL x0004
.FILL x0002
.FILL x0001
.END
I have a shader that looks like this:
void main( in float2 pos : TEXCOORD0,
in uniform sampler2D data : TEXUNIT0,
in uniform sampler2D palette : TEXUNIT1,
in uniform float c,
in uniform float th0,
in uniform float th1,
in uniform float th2,
in uniform float4 BackGroundColor,
out float4 color : COLOR
)
{
const float4 dataValue = tex2D( data, pos );
const float vValue = dataValue.x;
const float tValue = dataValue.y;
color = BackGroundColor;
if ( tValue <= th2 )
{
if ( tValue < th1 )
{
const float vRealValue = abs( vValue - 0.5 );
if ( vRealValue > th0 )
{
// determine value and color
const float power = ( c > 0.0 ) ? vValue : ( 1.0 - vValue );
color = tex2D( palette, float2( power, 0.0 ) );
}
}
else
{
color = float4( 0.0, tValue, 0.0, 1.0 );
}
}
}
and I am compiling it like this:
cgc -profile arbfp1 -strict -O3 -q sh.cg -o sh.asm
Now, different versions of the Cg compiler produce different output.
cgc version 2.2.0006 compiles the shader into assembly code using 18 instructions:
!!ARBfp1.0
PARAM c[6] = { program.local[0..4],{ 0, 1, 0.5 } };
TEMP R0;
TEMP R1;
TEMP R2;
TEX R0.xy, fragment.texcoord[0], texture[0], 2D;
ADD R0.z, -R0.x, c[5].y;
CMP R0.z, -c[0].x, R0.x, R0;
MOV R0.w, c[5].x;
TEX R1, R0.zwzw, texture[1], 2D;
SLT R0.z, R0.y, c[2].x;
ADD R0.x, R0, -c[5].z;
ABS R0.w, R0.x;
SGE R0.x, c[3], R0.y;
MUL R2.x, R0, R0.z;
SLT R0.w, c[1].x, R0;
ABS R2.y, R0.z;
MUL R0.z, R2.x, R0.w;
CMP R0.w, -R2.y, c[5].x, c[5].y;
CMP R1, -R0.z, R1, c[4];
MUL R2.x, R0, R0.w;
MOV R0.xzw, c[5].xyxy;
CMP result.color, -R2.x, R0, R1;
END
# 18 instructions, 3 R-regs
cgc version 3.0.0016 compiles the shader into assembly code using 23 instructions:
!!ARBfp1.0
PARAM c[6] = { program.local[0..4], { 0, 1, 0.5 } };
TEMP R0;
TEMP R1;
TEMP R2;
TEX R0.xy, fragment.texcoord[0], texture[0], 2D;
ADD R1.y, R0.x, -c[5].z;
MOV R1.z, c[0].x;
ABS R1.y, R1;
SLT R1.z, c[5].x, R1;
SLT R1.x, R0.y, c[2];
SGE R0.z, c[3].x, R0.y;
MUL R0.w, R0.z, R1.x;
SLT R1.y, c[1].x, R1;
MUL R0.w, R0, R1.y;
ABS R1.z, R1;
CMP R1.y, -R1.z, c[5].x, c[5];
MUL R1.y, R0.w, R1;
ADD R1.z, -R0.x, c[5].y;
CMP R1.z, -R1.y, R1, R0.x;
ABS R0.x, R1;
CMP R0.x, -R0, c[5], c[5].y;
MOV R1.w, c[5].x;
TEX R1, R1.zwzw, texture[1], 2D;
CMP R1, -R0.w, R1, c[4];
MUL R2.x, R0.z, R0;
MOV R0.xzw, c[5].xyxy;
CMP result.color, -R2.x, R0, R1;
END
# 23 instructions, 3 R-regs
The strange thing is that the optimization level for Cg 3.0 doesn't seem to influence anything.
Can someone explain what is going on? Why is the optimization not working, and why is the shader longer when compiled with Cg 3.0?
Note that I removed the comments from the compiled shaders.
This might not be a real answer to the problem, but maybe it gives some more insight. I inspected the generated assembly code a bit and converted it back to high-level code. I tried to compress it as much as possible and to remove all copies and temporaries that follow implicitly from the high-level operations. I used b variables as temporary bools and f variables as temporary floats. The first one (from the 2.2 version) is:
power = ( c > 0.0 ) ? vValue : ( 1.0 - vValue );
R1 = tex2D( palette, float2( power, 0.0 ) );
vRealValue = abs( vValue - 0.5 );
b1 = ( tValue < th1 );
b2 = ( tValue <= th2 );
b3 = b1;
b1 = b1 && b2 && ( vRealValue > th0 );
R1 = b1 ? R1 : BackGroundColor;
color = ( b2 && !b3 ) ? float4( 0.0, tValue, 0.0, 1.0 ) : R1;
and the second (with 3.0) is:
vRealValue = abs( vValue - 0.5 );
f0 = c;
b0 = ( 0 < f0 );
b1 = ( tValue < th1 );
b2 = ( tValue <= th2 );
b4 = b1 && b2 && ( vRealValue > th0 );
b0 = b0;
b3 = b1;
power = ( b4 && !b0 ) ? ( 1.0 - vValue ) : vValue;
R1 = tex2D( palette, float2( power, 0.0 ) );
R1 = b4 ? R1 : BackGroundColor;
color = ( b2 && !b3 ) ? float4( 0.0, tValue, 0.0, 1.0 ) : R1;
Most parts are essentially the same. The second program does some unnecessary operations. It copies the c variable into a temporary instead of using it directly. Moreover, it switches vValue and 1-vValue in the power computation, so it needs to negate b0 (resulting in one more CMP), whereas the first one does not use a temporary at all (it uses CMP directly instead of SLT and CMP). It also uses b4 in this computation, which is completely unnecessary, because when b4 is false the result of the texture access is irrelevant anyway; this results in one more && (implemented with MUL). There is also the unnecessary copy from b1 to b3 (in the first program it is necessary, but not in the second), and the extremely useless copy from b0 into itself (disguised as an ABS, but since the value comes from an SLT it can only be 0.0 or 1.0, so the ABS degenerates to a MOV).
So the second program is quite similar to the first one, just with some additional, IMHO completely useless, instructions. The optimizer seems to have done a worse job compared to the previous(!) version. As the Cg compiler is an NVIDIA product (and not from some other graphics company that shall remain nameless), this behaviour is really strange.
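For readers unfamiliar with arbfp1: it has no real booleans or branches, so the compiler encodes conditions as 0.0/1.0 floats, implements && as MUL, and implements selection with CMP. In Python terms, a sketch of the scalar semantics of the three instructions discussed above:

```python
def SLT(a, b):
    """SLT: 'set on less than' -> 1.0 if a < b, else 0.0."""
    return 1.0 if a < b else 0.0

def CMP(x, src1, src2):
    """CMP: select src1 where x < 0, else src2 (a branchless select)."""
    return src1 if x < 0.0 else src2

def AND(a, b):
    """Logical AND of 0.0/1.0 'booleans', implemented as MUL."""
    return a * b
```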