I am curious to understand divide-by-zero exception handling in Linux. When a divide-by-zero operation is performed, a trap is generated, i.e. INT 0 is raised in the processor, and ultimately the SIGFPE signal is sent to the process that performed the operation.
As I see it, the divide-by-zero exception is registered in the trap_init() function as
set_trap_gate(0, &divide_error);
I want to know in detail what happens between INT 0 being generated and SIGFPE being sent to the process.
The trap handler is registered in the trap_init() function in arch/x86/kernel/traps.c:
void __init trap_init(void)
..
set_intr_gate(X86_TRAP_DE, &divide_error);
set_intr_gate writes the address of the handler function into idt_table (see x86/include/asm/desc.h).
How is the divide_error function defined? As a macro in traps.c
DO_ERROR_INFO(X86_TRAP_DE, SIGFPE, "divide error", divide_error, FPE_INTDIV, regs->ip)
And the macro DO_ERROR_INFO is defined a bit above in the same traps.c:
#define DO_ERROR_INFO(trapnr, signr, str, name, sicode, siaddr) \
dotraplinkage void do_##name(struct pt_regs *regs, long error_code) \
{ \
        siginfo_t info; \
        enum ctx_state prev_state; \
 \
        info.si_signo = signr; \
        info.si_errno = 0; \
        info.si_code = sicode; \
        info.si_addr = (void __user *)siaddr; \
        prev_state = exception_enter(); \
        if (notify_die(DIE_TRAP, str, regs, error_code, \
                        trapnr, signr) == NOTIFY_STOP) { \
                exception_exit(prev_state); \
                return; \
        } \
        conditional_sti(regs); \
        do_trap(trapnr, signr, str, regs, error_code, &info); \
        exception_exit(prev_state); \
}
(Strictly speaking, the macro defines the do_divide_error function, which is called by a small asm-coded "entry point" stub with prepared arguments. The stub is declared as ENTRY(divide_error) in entry_32.S and via the zeroentry macro in entry_64.S: zeroentry divide_error do_divide_error.)
So, when a user divides by zero (and this operation reaches the retirement stage of the out-of-order engine), the hardware generates a trap and sets %eip to the divide_error stub, which sets up a frame and calls the C function do_divide_error. do_divide_error creates a siginfo_t struct describing the error (si_signo=SIGFPE, si_addr = address of the failed instruction, etc.), then tries to inform all notifiers registered with register_die_notifier (this is a hook, sometimes used by the in-kernel debugger kgdb; by kprobes' kprobe_exceptions_notify - only for int3 or gpf; by uprobes' arch_uprobe_exception_notify - again only int3; etc.).
Because DIE_TRAP is usually not blocked by a notifier, the do_trap function will be called. Its code is short:
static void __kprobes
do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
        long error_code, siginfo_t *info)
{
        struct task_struct *tsk = current;
        ...
        tsk->thread.error_code = error_code;
        tsk->thread.trap_nr = trapnr;
        ...
        if (info)
                force_sig_info(signr, info, tsk);
        ...
}
do_trap sends a signal to the current process with force_sig_info, which will "force a signal that the process can't ignore". If there is an active debugger attached to the process (i.e. our process is ptrace-d by gdb or strace), the signal is first reported to the debugger, which may inspect or suppress it before delivery. With no debugger, SIGFPE kills our process and saves a core file, because that is the default action for SIGFPE (check man 7 signal, section "Standard signals", and search for SIGFPE in the table).
The process can't ignore a SIGFPE forced this way, but it can define its own signal handler to handle the signal. Such a handler may just print %eip from siginfo, run backtrace(), and die; or it may even try to recover the situation and return to the failed instruction. This can be useful, for example, in JITs like qemu or valgrind, or in high-level language runtimes like Java or GHC, which can turn SIGFPE into a language exception that programs in those languages can handle (for example, the relevant spaghetti in OpenJDK's HotSpot is in hotspot/src/os/linux/vm/os_linux.cpp).
There are lists of SIGFPE handlers in Debian via code search for sigaction SIGFPE or for signal SIGFPE.
I am faced with a CUDA invalid resource handle error when allocating buffers on the GPU.
1. I downloaded the code: git clone https://github.com/Funatiq/gossip.git.
2. I built the project in the gossip directory: git submodule update --init && make. This produced the compiled binary execute.
3. Then I generated scatter and gather plans for my main GPU (here, GPU 0):
$python3 scripts/plan_from_topology_asynch.py gather 0
$python3 scripts/plan_from_topology_asynch.py scatter 0
This generates scatter_plan.json and gather_plan.json.
4. Finally, I executed the plan:
./execute scatter_gather scatter_plan.json gather_plan.json
The error was pointing to these lines of code:
std::vector<size_t> bufs_lens_scatter = scatter.calcBufferLengths(table[main_gpu]);
print_buffer_sizes(bufs_lens_scatter);
std::vector<data_t *> bufs(num_gpus);
std::vector<size_t> bufs_lens(bufs_lens_scatter);
TIMERSTART(malloc_buffers)
for (gpu_id_t gpu = 0; gpu < num_gpus; ++gpu) {
cudaSetDevice(context.get_device_id(gpu)); CUERR
cudaMalloc(&bufs[gpu], sizeof(data_t)*bufs_lens[gpu]); CUERR
}
TIMERSTOP(malloc_buffers)
The detailed error is shown as:
RUN: scatter_gather
INFO: 32768 bytes (scatter_gather)
TIMING: 0.463872 ms (malloc_devices)
TIMING: 0.232448 ms (zero_gpu_buffers)
TIMING: 0.082944 ms (init_data)
TIMING: 0.637952 ms (multisplit)
Partition Table:
470 489 534 553 514 515 538 483
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Required buffer sizes:
0 538 717 604 0 344 0 687
TIMING: 3.94455e-31 ms (malloc_buffers)
CUDA error: invalid resource handle : executor.cuh, line 405
For reference, I attached the complete error report here. The curious part is that the author cannot reproduce this error on his server, but when I ran it on a DGX workstation with 8 GPUs, the error occurred. I am not sure whether it is a CUDA programming error or an environment-specific issue.
The code has a defect in it, in the handling of cudaEventRecord() as used in the TIMERSTART and TIMERSTOP macros defined here and used here (with the malloc_buffers label).
CUDA events have a device association, implicitly defined when they are created: they are associated with the device selected by the most recent cudaSetDevice() call. As stated in the programming guide:
cudaEventRecord() will fail if the input event and input stream are associated to different devices.
(note that each device has its own null stream - these events are being recorded into the null stream)
And if we run the code with cuda-memcheck, we observe that the invalid resource handle error is indeed being returned by a call to cudaEventRecord().
Specifically referring to the code here:
...
std::vector<size_t> bufs_lens(bufs_lens_scatter);
TIMERSTART(malloc_buffers)
for (gpu_id_t gpu = 0; gpu < num_gpus; ++gpu) {
cudaSetDevice(context.get_device_id(gpu)); CUERR
cudaMalloc(&bufs[gpu], sizeof(data_t)*bufs_lens[gpu]); CUERR
}
TIMERSTOP(malloc_buffers)
The TIMERSTART macro defines and creates two CUDA events, one of which it immediately records (the start event). The TIMERSTOP macro uses the stop event that was created in the TIMERSTART macro. However, the intervening code has likely changed the current device from the one that was in effect when these two events were created (due to the cudaSetDevice call in the for-loop). Therefore, the cudaEventRecord (and cudaEventElapsedTime) calls fail due to this invalid usage.
As a proof point, when I add cudaSetDevice calls to the macro definitions as follows:
#ifndef __CUDACC__
#define TIMERSTART(label) \
    std::chrono::time_point<std::chrono::system_clock> \
        timerstart##label, timerstop##label; \
    timerstart##label = std::chrono::system_clock::now();
#else
#define TIMERSTART(label) \
    cudaEvent_t timerstart##label, timerstop##label; \
    float timerdelta##label; \
    cudaSetDevice(0); \
    cudaEventCreate(&timerstart##label); \
    cudaEventCreate(&timerstop##label); \
    cudaEventRecord(timerstart##label, 0);
#endif

#ifndef __CUDACC__
#define TIMERSTOP(label) \
    timerstop##label = std::chrono::system_clock::now(); \
    std::chrono::duration<double> \
        timerdelta##label = timerstop##label - timerstart##label; \
    std::cout << "# elapsed time (" << #label << "): " \
              << timerdelta##label.count() << "s" << std::endl;
#else
#define TIMERSTOP(label) \
    cudaSetDevice(0); \
    cudaEventRecord(timerstop##label, 0); \
    cudaEventSynchronize(timerstop##label); \
    cudaEventElapsedTime( \
        &timerdelta##label, \
        timerstart##label, \
        timerstop##label); \
    std::cout << "TIMING: " << timerdelta##label << " ms (" \
              << #label << ")" << std::endl;
#endif
The code runs without error for me. I'm not suggesting this is the correct fix; the correct fix may be to properly set the device before invoking the macro. It seems evident that either the macro writer did not anticipate this kind of usage, or was unaware of the hazard.
The only situation I can imagine where the error would not occur is a single-device system. When the code maintainer responded to your issue that they could not reproduce it, my guess is that they have not tested the code on a multi-device system. As near as I can tell, the error is unavoidable in a multi-device setup.
I'm trying to decompile the firmware of a Logitech Freedom 2.4 Cordless Joystick. I've managed to get something of the EEPROM. (here)
The EEPROM that is used is the Microchip 25AA320, which is a 32Kbit SPI-EEPROM. The MCU is a nRF24E1G , that contains a 8051 MCU.
The ROM should be 4096 bytes, so I think my reading program looped over itself four times.
I managed to extract a 4 kB ROM (here), but the start of the file doesn't look clean.
I loaded both files into IDA Pro and Ghidra and selected the 8051 processor. They don't generate anything useful.
Could anyone help me decompile this ROM?
I used this Arduino sketch to dump the ROM, together with this Python script:
## Author: Arpan Das
## Date: Fri Jan 11 12:16:59 2019 +0530
## URL: https://github.com/Cyberster/SPI-Based-EEPROM-Reader-Writer
## Listens to the serial port and writes the contents into a file.
## Requires pySerial to be installed.
import sys
import serial
import time

start = time.time()

MEMORY_SIZE = 4096          # in bytes
serial_port = 'COM5'
baud_rate = 115200          # in the Arduino sketch: Serial.begin(baud_rate)
write_to_file_path = "dump.rom"

output_file = open(write_to_file_path, "wb")
ser = serial.Serial(serial_port, baud_rate)

print("Press d to dump the ROM, else CTRL+C to exit.")
ch = sys.stdin.read(1)
if ch == 'd':
    ser.write(b'd')
    for i in range(MEMORY_SIZE // 32):
        # wait until the Arduino responds with 'W', i.e. a write request
        while ser.read() != b'W':
            continue
        ser.write(b'G')  # send back the write-request-granted signal
        for j in range(32):
            byte = ser.read(1)
            output_file.write(byte)
        print(str(MEMORY_SIZE - (i * 32)) + " bytes remaining.")

print('\nIt took', time.time() - start, 'seconds.')
This is what I did; the next part is left for you. My machine is a Win10 notebook, but I used Unix tools because they are so capable.
First of all, I divided the 16 kB dump into four 4 kB parts. The first one differs from the other three, and the provided 4 kB dump differs from all four parts. I did not investigate this further and simply took one of the other three parts, which are all equal.
$ split -b 4K LogitechFreedom2.4CordlessJoystick.rom part
$ cmp partaa partab
partaa partab differ: byte 1, line 1
$ cmp partab partac
$ cmp partac partad
$ cmp dump.rom partaa
dump.rom partaa differ: byte 9, line 1
$ cmp dump.rom partab
dump.rom partab differ: byte 1, line 1
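The same check can be scripted; here is a hedged Python sketch (not from the original post) of what split/cmp established above, demonstrated on synthetic data standing in for the real dump:

```python
def split_parts(data: bytes, part_size: int = 4096):
    """Split a raw dump into consecutive equal-sized parts."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]

def equal_parts(parts):
    """Return the set of index pairs (i, j), i < j, whose parts match."""
    return {(i, j)
            for i in range(len(parts))
            for j in range(i + 1, len(parts))
            if parts[i] == parts[j]}

# Synthetic 16 KiB dump: part 0 differs, parts 1-3 are identical,
# mirroring what the cmp session above showed.
image = bytes(range(256)) * 16            # one 4096-byte ROM image
dump = b"\xff" * 4096 + image * 3
print(sorted(equal_parts(split_parts(dump))))   # → [(1, 2), (1, 3), (2, 3)]
```

With the real file you would read `dump` from disk instead; any part whose index appears only on the right-hand side of every pair (here, part 0) is the corrupted read.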
From the microcontroller's data sheet I learned that the EEPROM content has a header of at least 3 bytes (chapter 10.2 on page 61).
These bytes are:
0b Version = 00, Reserved = 00, SPEED = 0.5MHz, XO_FREQ = 16MHz
03 Offset to start of user program = 3
0f Number of 256 bytes block = 15
The last entry seems to be off by one, because there seems to be code in the 16th block, too.
Anyway, these bytes look decent, so I cut the first 3 bytes.
$ dd if=partad of=rom.bin bs=1 skip=3
4093+0 records in
4093+0 records out
4093 bytes (4,1 kB, 4,0 KiB) copied, 0,0270132 s, 152 kB/s
$ dd if=partad of=head.bin bs=1 count=3
3+0 records in
3+0 records out
3 bytes copied, 0,0043809 s, 0,7 kB/s
$ od -Ax -t x1 rom.bin > rom.hex
$ od -Ax -t x1 head.bin > head.hex
The hex files are nice for loading into an editor and looking around.
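The 3-byte header interpreted above can also be decoded programmatically. A hedged sketch (function name is mine; only bytes 1 and 2 are decoded, since their meaning — start offset and 256-byte block count — is what the data-sheet reading above states explicitly):

```python
def parse_eeprom_header(header: bytes):
    """Decode the nRF24E1 EEPROM header fields discussed above."""
    if len(header) < 3:
        raise ValueError("header must be at least 3 bytes")
    return {
        "config": header[0],               # version/SPEED/XO_FREQ bit fields
        "user_program_offset": header[1],  # offset to start of user program
        "num_256b_blocks": header[2],      # number of 256-byte blocks
    }

hdr = parse_eeprom_header(bytes([0x0b, 0x03, 0x0f]))
print(hdr["user_program_offset"], hdr["num_256b_blocks"])  # → 3 15
```

Slicing the user program out of a part is then `part[hdr["user_program_offset"]:]`, the Python equivalent of the dd command above.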
I loaded the remaining 4093 bytes into a disassembler I once wrote and peeked around a bit. It looks promising, so I think you can go on without me now:
C0000: ljmp C0F54
C0003: setb 021H.2
reti
C000B: xch a,r5
inc r6
xrl a,r6
mov a,#0B2H
movc a,@a+pc
movx @r1,a
mov r7,a
setb 021H.2
reti
C0F54: mov psw,#000H
mov sp,#07BH
mov r0,#0FFH
mov @r0,#000H
djnz r0,C0F5C
ljmp C0C09
I'm working on bare-metal. No Linux, libraries, etc. I'm writing processor boot code in ASM and jumping to my compiled C code.
My command line is:
% qemu-system-aarch64 \
-s -S \
-machine virt,secure=on,virtualization=on \
-cpu cortex-a53 \
-d int \
-m 512M \
-smp 4 \
-display none \
-nographic \
-semihosting \
-serial mon:stdio \
-kernel my_file.elf \
-device loader,addr=0x40004000,cpu-num=0 \
-device loader,addr=0x40004000,cpu-num=1 \
-device loader,addr=0x40004000,cpu-num=2 \
-device loader,addr=0x40004000,cpu-num=3 \
;
When I connect gdb at the beginning, I can see:
(gdb) info threads
Id Target Id Frame
* 1 Thread 1.1 (CPU#0 [running]) _start () at .../start.S:20
2 Thread 1.2 (CPU#1 [halted ]) _start () at .../start.S:20
3 Thread 1.3 (CPU#2 [halted ]) _start () at .../start.S:20
4 Thread 1.4 (CPU#3 [halted ]) _start () at .../start.S:20
I want those other three processors to start in the "running" state, not "halted". How?
Note that my DTS contains this section:
psci {
migrate = < 0xc4000005 >;
cpu_on = < 0xc4000003 >;
cpu_off = < 0x84000002 >;
cpu_suspend = < 0xc4000001 >;
method = "smc";
compatible = "arm,psci-0.2\0arm,psci";
};
However, I'm not sure what to do with that. Adding various lines of this form doesn't seem to help:
-device loader,addr=0xc4000003,data=0x80000000,data-len=4
I'm not sure if I'm on the right track with this ARM PSCI thing. ARM's specification seems to define the "interface", not the system "implementation". However, I don't see PSCI as "real" registers mentioned in the "virt" documentation/source, and there is no "SMC" device mentioned in the DTS.
How does QEMU decide whether an SMP processor is "running" or "halted" on start and how can I influence that?
Based on Peter Maydell's answer below, I need to do one of two things:
Switch "-kernel" to "-bios". I did this, but my code doesn't load as I expect. My *.elf file has several sections, some in FLASH and some in DDR (above 0x40000000); maybe that's the problem.
Change my boot code to set up and issue the SMC instruction to make the ARM PSCI "CPU_ON" call that QEMU will recognize and power up the other processors. Code like this runs but doesn't seem to "do" anything:
ldr w0, =0xc4000003 // CPU_ON code from the DTS file
mov x1, 1 // CPU #1 in cluster zero (format of MPIDR register?)
ldr x2, _boot // Jump address 0x40006000 (FYI)
mov x3, 1 // context ID (meaningful only to caller)
smc #0 // GO!
// result is in x0 -> PSCI_RET_INVALID_PARAMS
Using the response provided by Peter Maydell, I am providing here a Minimal, Reproducible Example for people who may be interested.
Downloading/installing aarch64-elf toolchain:
wget "https://developer.arm.com/-/media/Files/downloads/gnu-a/8.3-2019.03/binrel/gcc-arm-8.3-2019.03-x86_64-aarch64-elf.tar.xz?revision=d678fd94-0ac4-485a-8054-1fbc60622a89&la=en"
mkdir -p /opt/arm
tar Jxf gcc-arm-8.3-2019.03-x86_64-aarch64-elf.tar.xz -C /opt/arm
Example files:
loop.s:
.title "loop.s"
.arch armv8-a
.text
.global Reset_Handler
Reset_Handler: mrs x0, mpidr_el1
and x0,x0, 0b11
cmp x0, #0
b.eq Core0
cmp x0, #1
b.eq Core1
cmp x0, #2
b.eq Core2
cmp x0, #3
b.eq Core3
Error: b .
Core0: b .
Core1: b .
Core2: b .
Core3: b .
.end
build.sh:
#!/bin/bash
set -e
CROSS_COMPILE=/opt/arm/gcc-arm-8.3-2019.03-x86_64-aarch64-elf/bin/aarch64-elf-
AS=${CROSS_COMPILE}as
LD=${CROSS_COMPILE}ld
OBJCOPY=${CROSS_COMPILE}objcopy
OBJDUMP=${CROSS_COMPILE}objdump
${AS} -g -o loop.o loop.s
${LD} -g -gc-sections -g -e Reset_Handler -Ttext-segment=0x40004000 -Map=loop.map -o loop.elf loop.o
${OBJDUMP} -d loop.elf
qemu.sh:
#!/bin/bash
set -e
QEMU_SYSTEM_AARCH64=qemu-system-aarch64
${QEMU_SYSTEM_AARCH64} \
-s -S \
-machine virt,secure=on,virtualization=on \
-cpu cortex-a53 \
-d int \
-m 512M \
-smp 4 \
-display none \
-nographic \
-semihosting \
-serial mon:stdio \
-bios loop.elf \
-device loader,addr=0x40004000,cpu-num=0 \
-device loader,addr=0x40004000,cpu-num=1 \
-device loader,addr=0x40004000,cpu-num=2 \
-device loader,addr=0x40004000,cpu-num=3 \
;
loop.gdb:
target remote localhost:1234
file loop.elf
load loop.elf
disassemble Reset_Handler
info threads
continue
debug.sh:
#!/bin/bash
CROSS_COMPILE=/opt/arm/gcc-arm-8.3-2019.03-x86_64-aarch64-elf/bin/aarch64-elf-
GDB=${CROSS_COMPILE}gdb
${GDB} --command=loop.gdb
Executing the program - two consoles will be needed.
First console:
./build.sh
Output should look like:
/opt/arm/gcc-arm-8.3-2019.03-x86_64-aarch64-elf/bin/aarch64-elf-ld: warning: address of `text-segment' isn't multiple of maximum page size
loop.elf: file format elf64-littleaarch64
Disassembly of section .text:
0000000040004000 <Reset_Handler>:
40004000: d53800a0 mrs x0, mpidr_el1
40004004: 92400400 and x0, x0, #0x3
40004008: f100001f cmp x0, #0x0
4000400c: 54000100 b.eq 4000402c <Core0> // b.none
40004010: f100041f cmp x0, #0x1
40004014: 540000e0 b.eq 40004030 <Core1> // b.none
40004018: f100081f cmp x0, #0x2
4000401c: 540000c0 b.eq 40004034 <Core2> // b.none
40004020: f1000c1f cmp x0, #0x3
40004024: 540000a0 b.eq 40004038 <Core3> // b.none
0000000040004028 <Error>:
40004028: 14000000 b 40004028 <Error>
000000004000402c <Core0>:
4000402c: 14000000 b 4000402c <Core0>
0000000040004030 <Core1>:
40004030: 14000000 b 40004030 <Core1>
0000000040004034 <Core2>:
40004034: 14000000 b 40004034 <Core2>
0000000040004038 <Core3>:
40004038: 14000000 b 40004038 <Core3>
Then:
./qemu.sh
Second console:
./debug.sh
Output should look like:
GNU gdb (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36)) 8.2.1.20190227-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-pc-linux-gnu --target=aarch64-elf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.linaro.org/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
warning: No executable has been specified and target does not support
determining executable automatically. Try using the "file" command.
0x0000000040004000 in ?? ()
Loading section .text, size 0x3c lma 0x40004000
Start address 0x40004000, load size 60
Transfer rate: 480 bits in <1 sec, 60 bytes/write.
Dump of assembler code for function Reset_Handler:
=> 0x0000000040004000 <+0>: mrs x0, mpidr_el1
0x0000000040004004 <+4>: and x0, x0, #0x3
0x0000000040004008 <+8>: cmp x0, #0x0
0x000000004000400c <+12>: b.eq 0x4000402c <Core0> // b.none
0x0000000040004010 <+16>: cmp x0, #0x1
0x0000000040004014 <+20>: b.eq 0x40004030 <Core1> // b.none
0x0000000040004018 <+24>: cmp x0, #0x2
0x000000004000401c <+28>: b.eq 0x40004034 <Core2> // b.none
0x0000000040004020 <+32>: cmp x0, #0x3
0x0000000040004024 <+36>: b.eq 0x40004038 <Core3> // b.none
End of assembler dump.
Id Target Id Frame
* 1 Thread 1.1 (CPU#0 [running]) Reset_Handler () at loop.s:5
2 Thread 1.2 (CPU#1 [running]) Reset_Handler () at loop.s:5
3 Thread 1.3 (CPU#2 [running]) Reset_Handler () at loop.s:5
4 Thread 1.4 (CPU#3 [running]) Reset_Handler () at loop.s:5
All four cores are stopped at address 0x40004000/Reset_Handler, and were started by the continue command in loop.gdb.
Press CTRL+C in the second console:
^C
Thread 1 received signal SIGINT, Interrupt.
Core0 () at loop.s:16
16 Core0: b .
(gdb)
Core #0 was executing code at Core0 label.
Enter the following command (still in the second console):
(gdb) info threads
Id Target Id Frame
* 1 Thread 1.1 (CPU#0 [running]) Core0 () at loop.s:16
2 Thread 1.2 (CPU#1 [running]) Core1 () at loop.s:17
3 Thread 1.3 (CPU#2 [running]) Core2 () at loop.s:18
4 Thread 1.4 (CPU#3 [running]) Core3 () at loop.s:19
(gdb)
Cores #1, #2 and #3 were executing the code at the respective Core1, Core2, Core3 labels prior to being stopped by the CTRL+C.
A description of the MPIDR_EL1 register is available here: the last two bits of MPIDR_EL1.Aff0 were used by all four cores to determine their respective core numbers.
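The core-number derivation used by loop.s can be stated as a one-line sketch (MPIDR values below are illustrative; only the low two Aff0 bits matter here):

```python
def core_number(mpidr_el1: int) -> int:
    """Keep the low two bits of MPIDR_EL1.Aff0, as `and x0, x0, 0b11` does."""
    return mpidr_el1 & 0b11

# Each core reads its own MPIDR_EL1; e.g. four cores in one cluster:
print([core_number(m) for m in (0x80000000, 0x80000001, 0x80000002, 0x80000003)])
# → [0, 1, 2, 3]
```

This is exactly the `mrs x0, mpidr_el1` / `and x0, x0, 0b11` pair at the top of Reset_Handler, which then branches to the per-core loop.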
This depends on the board model -- generally we follow what the hardware does, and some boards start all CPUs from power-on, and some don't. For the 'virt' board (which is specific to QEMU) what we generally do is use PSCI, which is the Arm standard firmware interface for powering SMP CPUs up and down (among other things; you can also use it for 'power down entire machine', for instance). On startup only the primary CPU is running, and it's the job of guest code to use the PSCI API to start the secondaries. That's what that psci node in the DTS is telling the guest -- it tells the guest what specific form of the PSCI ABI QEMU implements, and in particular whether the guest should use the 'hvc' or 'smc' instruction to call PSCI functions. What QEMU is doing here is emulating a "hardware + firmware" combination -- the guest executes an 'smc' instruction and QEMU performs the actions that on real hardware would be performed by a bit of firmware code running at EL3.
The virt board does also have another mode of operation which is intended for when you want to run a guest which is itself EL3 firmware (for instance if you want to run OVMF/UEFI at EL3). If you start QEMU with -machine secure=true to enable EL3 emulation and you also provide a guest firmware blob via either -bios or -drive if=pflash,..., then QEMU will assume your firmware wants to run at EL3 and provide PSCI services itself, so it will start with all CPUs powered on and let the firmware deal with sorting them out.
A simple example of making a PSCI call to turn on another CPU (in this case cpu #4 of 8):
.equ PSCI_0_2_FN64_CPU_ON, 0xc4000003
ldr x0, =PSCI_0_2_FN64_CPU_ON
ldr x1, =4 /* target CPU's MPIDR affinity */
ldr x2, =0x10000 /* entry point */
ldr x3, =0 /* context ID: put into target CPU's x0 */
smc 0
I have a doubt about the following code in the Tcl 8.6.8 source, tclInt.h:
#define TclInvalidateStringRep(objPtr) \
    if (objPtr->bytes != NULL) { \
        if (objPtr->bytes != tclEmptyStringRep) { \
            ckfree((char *) objPtr->bytes); \
        } \
        objPtr->bytes = NULL; \
    }
This macro is called by Tcl_InvalidateStringRep() in tclObj.c.
My doubt is: why doesn't the tclObj's length get reset to zero?
Here is the definition of Tcl_Obj:
typedef struct Tcl_Obj {
    int refCount;   /* When 0 the object will be freed. */
    char *bytes;    /* This points to the first byte of the
                     * object's string representation. The array
                     * must be followed by a null byte (i.e., at
                     * offset length) but may also contain
                     * embedded null characters. The array's
                     * storage is allocated by ckalloc. NULL means
                     * the string rep is invalid and must be
                     * regenerated from the internal rep. Clients
                     * should use Tcl_GetStringFromObj or
                     * Tcl_GetString to get a pointer to the byte
                     * array as a readonly value. */
    int length;     /* The number of bytes at *bytes, not
                     * including the terminating null. */
So you can see that length is tightly coupled with bytes; when bytes is cleared, shouldn't we reset length as well?
My doubt comes from the following code, TclCreateLiteral() in tclLiteral.c:
200     for (globalPtr=globalTablePtr->buckets[globalHash] ; globalPtr!=NULL;
201             globalPtr = globalPtr->nextPtr) {
202         objPtr = globalPtr->objPtr;
203         if ((globalPtr->nsPtr == nsPtr)
204                 && (objPtr->length == length) && ((length == 0)
205                 || ((objPtr->bytes[0] == bytes[0])
206                 && (memcmp(objPtr->bytes, bytes, (unsigned) length) == 0)))) {
So when length is not zero while bytes is NULL, the length check at line 204 passes and the dereference objPtr->bytes[0] at line 205 crashes the program.
My product includes the Tcl source, and I found the above problem while tracing a program crash. I have put a workaround in our code, but I would like to confirm with the community whether it indeed is a vulnerability.
Your approach seems to be wrong somewhere.
Calling TclInvalidateStringRep is basically only allowed for objects with no references (refCount == 0) or with exactly one reference (refCount <= 1), and then only if you are sure that this one reference is your own.
Tcl's shared objects may switch their internal representation, but the string representation must remain immutable. Otherwise you break basic principles of Tcl (like EIAS, etc.).
Simplest example that can explain this:
set k 0x7f
dict set d $k test
expr {$k}; # ==> 127 (obj is integer now, but...)
puts $k; # ==> 0x7f (... still remains the string-representation)
puts [dict get $d $k]; # ==> test
# some code that fouls it up (despite the two references: var `k` and the key in dict `d`):
magic_happens_here $k; # string representation gets lost.
# and hereafter:
puts $k; # ==> 127 (representation is now 127, so...)
puts [dict get $d $k]; # ==> ERROR: key "127" not known in dictionary
As you can see, resetting or altering the string representation of a shared object is wrong by design.
Please avoid this in Tcl.
I've had a think about this, and while I believe that the code that is purging the representation is wrong to do so (since the object should in principle be shared and so shouldn't be observed to change) I certainly think that it is extremely difficult to actually prove that that can't happen. For sure, TclCreateLiteral in tclLiteral.c shouldn't blow up if it happens!
The fix I'm using is to make TclCreateLiteral use TclGetStringFromObj (the Tcl-internal macro-ized version of Tcl_GetStringFromObj) to get the bytes and length fields instead of using them directly, so that the correct constraints are preserved. This should make the string representation exist once more if it is removed. If the code continues to crash, the problem is your code that is calling TclInvalidateStringRep on a literal (and setting a type that can't have a string generated for it; Tcl has some of those, but that's because it never purges the original string from them).
Remember, a Tcl_Obj should only have its string rep purged when it becomes wrong, not just when it gains a non-string representation. The fact a value has been interpreted as an integer doesn't mean that it shouldn't be interpretable as a list (quite the reverse!) and if the internal representation is never updated to a different value (in-place modifications should only ever happen to unshared objects) it should never need to lose that string representation at all.
Platform is Windows 7 SP1.
I recently spent some time debugging an issue that was caused by code passing an invalid parameter to one of the "safe" CRT functions. As a result, my application was aborted right away with no warning or anything -- not even a crash dialog.
At first, I tried to figure this out by attaching Windbg to my application. However, when the crash happened, by the time the code broke into Windbg, pretty much every thread had been killed save for the ONE thread that Windbg broke in on. There was no clue as to what was wrong. So I attached Visual Studio as a debugger instead, and when my application terminated, I saw every thread exiting with error code 0xc0000417. That is what gave me the clue that there was an invalid parameter issue somewhere.
Next, I went about debugging this by once again attaching Windbg to my application, but this time randomly (by trial and error) placing breakpoints in various places, like kernel32!TerminateThread, kernel32!UnhandledExceptionFilter and kernel32!SetUnhandledExceptionFilter.
Of the lot, a breakpoint at SetUnhandledExceptionFilter immediately showed the call stack of the offending thread when the crash occurred, along with the CRT function we were calling incorrectly.
Question: Is there anything intuitive that should have told me to place a bp on SUEF right away? I would like to understand this a bit better and not do it by trial and error. My second question is about the error code I determined via Visual Studio: without resorting to VS, how do I determine thread exit codes in Windbg?
I was going to just comment, but this became bigger, so here's an answer.
Setting windbg as the postmortem debugger using windbg -I will also route all unhandled exceptions to windbg.
By default, Auto is set to 1 in the AeDebug registry key. If you don't want to debug every program, you can edit this to 0, which gives you an additional "Do you want to debug" option in the WER dialog:
reg query "hklm\software\microsoft\windows nt\currentversion\aedebug"
HKEY_LOCAL_MACHINE\software\microsoft\windows nt\currentversion\aedebug
Debugger REG_SZ "xxxxxxxxxx\windbg.exe" -p %ld -e %ld -g
Auto REG_SZ 0
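For reference, the value can be flipped from an elevated prompt (a sketch of the same registry edit; this is a config change, so double-check the key on your system first):

```shell
reg add "hklm\software\microsoft\windows nt\currentversion\aedebug" /v Auto /t REG_SZ /d 0 /f
```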
Assuming you have registered a postmortem debugger, run this code:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    unsigned long input[] = {1, 45, 0xf001, 0xffffffff};
    int i = 0;
    char buf[5] = {0};
    for (i = 0; i < _countof(input); i++) {
        _ultoa_s(input[i], buf, sizeof(buf), 16);
        printf("%s\n", buf);
    }
    return 1;
}
On the exception you will see a WER dialog like this, and you can now choose to debug the program.
Windows also writes the exit code of an unhandled exception to the event log.
You can use PowerShell to retrieve the most recent such event:
PS C:\> Get-EventLog -LogName Application -Source "Application Error" -newest 1| format-list
Index : 577102
EntryType : Error
InstanceId : 1000
Message : Faulting application name:
ultos.exe, version: 0.0.0.0, time stamp: 0x577680f1
Faulting module name: ultos.exe, version:
0.0.0.0, time stamp: 0x577680f1
Exception code: 0xc0000417
Fault offset: 0x000211c2
Faulting process id: 0x4a8
Faulting application start time: 0x01d1d3aaf61c8aaa
Faulting application path: E:\test\ulto\ultos.exe
Faulting module path: E:\test\ulto\ultos.exe
Report Id: 348d86fc-3f9e-11e6-ade2-005056c00008
Category : Application Crashing Events
CategoryNumber : 100
ReplacementStrings : {ultos.exe, 0.0.0.0, 577680f1, ultos.exe...}
Source : Application Error
TimeGenerated : 7/1/2016 8:42:21 PM
TimeWritten : 7/1/2016 8:42:21 PM
UserName :
And if you choose to debug, you can view the call stack:
0:000> kPL
# ChildEBP RetAddr
00 001ffdc8 77cf68d4 ntdll!KiFastSystemCallRet
01 001ffdcc 75e91fdb ntdll!NtTerminateProcess+0xc
02 001ffddc 012911d3 KERNELBASE!TerminateProcess+0x2c
03 001ffdec 01291174 ultos!_invoke_watson(
wchar_t * expression = 0x00000000 "",
wchar_t * function_name = 0x00000000 "",
wchar_t * file_name = 0x00000000 "",
unsigned int line_number = 0,
unsigned int reserved = 0)+0x31
04 001ffe10 01291181 ultos!_invalid_parameter(
wchar_t * expression = <Value unavailable error>,
wchar_t * function_name = <Value unavailable error>,
wchar_t * file_name = <Value unavailable error>,
unsigned int line_number = <Value unavailable error>,
unsigned int reserved = <Value unavailable error>)+0x7a
05 001ffe28 0128ad96 ultos!_invalid_parameter_noinfo(void)+0xc
06 001ffe3c 0128affa ultos!common_xtox<unsigned long,char>(
unsigned long original_value = 0xffffffff,
char * buffer = 0x001ffea4 "",
unsigned int buffer_count = 5,
unsigned int radix = 0x10,
bool is_negative = false)+0x58
07 001ffe5c 0128b496 ultos!common_xtox_s<unsigned long,char>(
unsigned long value = 0xffffffff,
char * buffer = 0x001ffea4 "",
unsigned int buffer_count = 5,
unsigned int radix = 0x10,
bool is_negative = false)+0x59
08 001ffe78 012712b2 ultos!_ultoa_s(
unsigned long value = 0xffffffff,
char * buffer = 0x001ffea4 "",
unsigned int buffer_count = 5,
int radix = 0n16)+0x18
09 001ffeac 0127151b ultos!main(void)+0x52
0a (Inline) -------- ultos!invoke_main+0x1d
0b 001ffef8 76403c45 ultos!__scrt_common_main_seh(void)+0xff
0c 001fff04 77d137f5 kernel32!BaseThreadInitThunk+0xe
0d 001fff44 77d137c8 ntdll!__RtlUserThreadStart+0x70
0e 001fff5c 00000000 ntdll!_RtlUserThreadStart+0x1b