How do I resolve "avrdude: stk500v2_command(): command failed avrdude: initialization failed, rc=-1"? - arduino-ide

I use an ATtiny85, and when I try to flash it with an AVRusb500-smd4 from the Arduino IDE I get this error message:
avrdude: Version 6.3-20201216
Copyright (c) 2000-2005 Brian Dean, http://www.bdmicro.com/
Copyright (c) 2007-2014 Joerg Wunsch
System wide configuration file is "C:\Users\CPRUVOST\Documents\ArduinoData\packages\ATTinyCore\hardware\avr\1.5.2/avrdude.conf"
Using Port : COM5
Using Programmer : stk500
Setting bit clk period : 5.0
AVR Part : ATtiny85
Chip Erase delay : 400000 us
PAGEL : P00
BS2 : P00
RESET disposition : possible i/o
RETRY pulse : SCK
serial program mode : yes
parallel program mode : yes
Timeout : 200
StabDelay : 100
CmdexeDelay : 25
SyncLoops : 32
ByteDelay : 0
PollIndex : 3
PollValue : 0x53
Memory Detail :
                       Block Poll               Page                        Polled
Memory Type Mode Delay Size  Indx Paged  Size   Size #Pages MinW  MaxW    ReadBack
----------- ---- ----- ----- ---- ------ ------ ---- ------ ----- ----- ---------
eeprom        65    12     4    0 no        512    4      0  4000  4500 0xff 0xff
flash         65    12    32    0 yes      8192   64    128 30000 30000 0xff 0xff
signature      0     0     0    0 no          3    0      0     0     0 0x00 0x00
lock           0     0     0    0 no          1    0      0  9000  9000 0x00 0x00
lfuse          0     0     0    0 no          1    0      0  9000  9000 0x00 0x00
hfuse          0     0     0    0 no          1    0      0  9000  9000 0x00 0x00
efuse          0     0     0    0 no          1    0      0  9000  9000 0x00 0x00
calibration    0     0     0    0 no          2    0      0     0     0 0x00 0x00
Programmer Type : STK500V2
Description : Atmel STK500
Programmer Model: STK500
Hardware Version: 2
Firmware Version Master : 2.10
Topcard : Unknown
Vtarget : 5.0 V
SCK period : 8.7 us
Varef : 2.5 V
Oscillator : Off
avrdude: stk500v2_command(): command failed
avrdude: initialization failed, rc=-1
Double check connections and try again, or use -F to override
this check.
avrdude done. Thank you.
My first few flashing attempts worked without any problems, and the board still runs the last program I flashed. I wanted to make some modifications to improve the behaviour of my prototype, but this message is blocking me.
Do you have any ideas?

Related

CUDA invalid resource handle error when allocating gpu memory buffer

I am faced with a CUDA invalid resource handle error when allocating a buffer on the GPU.
1. I downloaded the code: git clone https://github.com/Funatiq/gossip.git.
2. I built the project in the gossip directory: git submodule update --init && make. This produced the compiled binary execute.
3. Then I generated scatter and gather plans for my main GPU (here, GPU 0):
$python3 scripts/plan_from_topology_asynch.py gather 0
$python3 scripts/plan_from_topology_asynch.py scatter 0
This generates scatter_plan.json and gather_plan.json.
4. Finally, I executed the plans:
./execute scatter_gather scatter_plan.json gather_plan.json
The error was pointing to these lines of code:
std::vector<size_t> bufs_lens_scatter = scatter.calcBufferLengths(table[main_gpu]);
print_buffer_sizes(bufs_lens_scatter);
std::vector<data_t *> bufs(num_gpus);
std::vector<size_t> bufs_lens(bufs_lens_scatter);
TIMERSTART(malloc_buffers)
for (gpu_id_t gpu = 0; gpu < num_gpus; ++gpu) {
cudaSetDevice(context.get_device_id(gpu)); CUERR
cudaMalloc(&bufs[gpu], sizeof(data_t)*bufs_lens[gpu]); CUERR
}
TIMERSTOP(malloc_buffers)
The detailed error is shown as:
RUN: scatter_gather
INFO: 32768 bytes (scatter_gather)
TIMING: 0.463872 ms (malloc_devices)
TIMING: 0.232448 ms (zero_gpu_buffers)
TIMING: 0.082944 ms (init_data)
TIMING: 0.637952 ms (multisplit)
Partition Table:
470 489 534 553 514 515 538 483
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Required buffer sizes:
0 538 717 604 0 344 0 687
TIMING: 3.94455e-31 ms (malloc_buffers)
CUDA error: invalid resource handle : executor.cuh, line 405
For reference, I attached the complete error report here. The curious part is that the author cannot reproduce this error on his server, but when I run it on a DGX workstation with 8 GPUs, the error occurs. I am not sure whether this is a CUDA programming error or an environment-specific issue.
The code has a defect in it, in the handling of cudaEventRecord() as used in the TIMERSTART and TIMERSTOP macros defined here and used here (with the malloc_buffers label).
CUDA events have a device association that is implicitly defined when they are created: they are associated with the device that was selected by the most recent cudaSetDevice() call. As stated in the programming guide:
cudaEventRecord() will fail if the input event and input stream are associated to different devices.
(note that each device has its own null stream - these events are being recorded into the null stream)
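To make that rule concrete, here is a minimal standalone sketch (not part of the gossip code; it assumes a machine with at least two GPUs) that reproduces the same failure by recording an event created on device 0 into device 1's null stream:
#include <cstdio>
#include <cuda_runtime.h>
int main() {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    if (ndev < 2) { printf("need at least 2 GPUs to reproduce\n"); return 0; }
    cudaSetDevice(0);
    cudaEvent_t ev;
    cudaEventCreate(&ev);                        // event is implicitly associated with device 0
    cudaSetDevice(1);                            // switch devices, as the cudaMalloc loop does
    cudaError_t err = cudaEventRecord(ev, 0);    // record into device 1's null stream
    printf("cudaEventRecord: %s\n", cudaGetErrorString(err)); // expect: invalid resource handle
    return 0;
}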
And if we run the code with cuda-memcheck, we observe that the invalid resource handle error is indeed being returned by a call to cudaEventRecord().
Specifically referring to the code here:
...
std::vector<size_t> bufs_lens(bufs_lens_scatter);
TIMERSTART(malloc_buffers)
for (gpu_id_t gpu = 0; gpu < num_gpus; ++gpu) {
cudaSetDevice(context.get_device_id(gpu)); CUERR
cudaMalloc(&bufs[gpu], sizeof(data_t)*bufs_lens[gpu]); CUERR
}
TIMERSTOP(malloc_buffers)
The TIMERSTART macro defines and creates 2 cuda events, one of which it immediately records (the start event). The TIMERSTOP macro uses the timer stop event that was created in the TIMERSTART macro. However, we can see that the intervening code has likely changed the device from the one that was in effect when these two events were created (due to the cudaSetDevice call in the for-loop). Therefore, the cudaEventRecord (and cudaEventElapsedTime) calls are failing due to this invalid usage.
As a proof point, when I add cudaSetDevice calls to the macro definitions as follows:
#define TIMERSTART(label) \
cudaEvent_t timerstart##label, timerstop##label; \
float timerdelta##label; \
cudaSetDevice(0); \
cudaEventCreate(&timerstart##label); \
cudaEventCreate(&timerstop##label); \
cudaEventRecord(timerstart##label, 0);
#endif
#ifndef __CUDACC__
#define TIMERSTOP(label) \
timerstop##label = std::chrono::system_clock::now(); \
std::chrono::duration<double> \
timerdelta##label = timerstop##label-timerstart##label; \
std::cout << "# elapsed time ("<< #label <<"): " \
<< timerdelta##label.count() << "s" << std::endl;
#else
#define TIMERSTOP(label) \
cudaSetDevice(0); \
cudaEventRecord(timerstop##label, 0); \
cudaEventSynchronize(timerstop##label); \
cudaEventElapsedTime( \
&timerdelta##label, \
timerstart##label, \
timerstop##label); \
std::cout << \
"TIMING: " << \
timerdelta##label << " ms (" << \
#label << \
")" << std::endl;
#endif
The code runs without error for me. I'm not suggesting this is the correct fix. The correct fix may be to properly set the device before calling the macro. It seems evident that either the macro writer did not expect this kind of usage, or else was unaware of the hazard.
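As a sketch of that alternative (hedged, not a tested patch to the project), the calling code could remember which device was current when TIMERSTART ran and restore it before TIMERSTOP, so the stop event is recorded on the device that owns it:
int timer_dev = 0;
cudaGetDevice(&timer_dev);                     // device that will own the timer events
TIMERSTART(malloc_buffers)
for (gpu_id_t gpu = 0; gpu < num_gpus; ++gpu) {
    cudaSetDevice(context.get_device_id(gpu)); CUERR
    cudaMalloc(&bufs[gpu], sizeof(data_t)*bufs_lens[gpu]); CUERR
}
cudaSetDevice(timer_dev); CUERR                // restore it before recording the stop event
TIMERSTOP(malloc_buffers)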
The only situation I could imagine where the error would not occur would be in a single-device system. When the code maintainer responded to your issue that they could not reproduce the issue, my guess is they have not tested the code on a multi-device system. As near as I can tell, the error would be unavoidable in a multi-device setup.

OS dev and where the boot sector is in qemu

I found this article about writing a simple OS:
https://www.cs.bham.ac.uk/~exr/lectures/opsys/10_11/lectures/os-dev.pdf
In chapter 3, part 4, it says that an "X" will be written to the boot sector at 0x7c00, but in QEMU, if you dump the memory, you can see it is not written at 0x7c00; it is written at 0x7fc8.
Why is this?
What bytes does qemu use for the boot sector?
https://gyazo.com/6dcfaeffea19dfbc6edffa22a1bf0c83
This is the code in the os-dev pdf:
mov ah, 0x0e
mov bx, the_secret
add bx, 0x7c00
mov al, [bx]
int 0x10
jmp $ ; Jump forever.
the_secret:
db "X"
; Padding and magic BIOS number.
times 510-($-$$) db 0
dw 0xaa55

What is the result of (54.125) - (184)₁₀?

I am practicing for the midterm and apparently there's no answer key for it.
However, I practiced and got a result, but I'm not sure if it is correct since the solution is really long.
Perform the following arithmetic operations using the 10-bit floating point standard (given on the equation sheet):
- I need to convert to normalized form
- standardize (align) properly
- convert the final answer to normalized 10-bit
- convert the final answer to decimal
- indicate if there is any loss of precision
My solution is really long, so I'm just going to post only the answers here.
- normalized:
54.125 = 0 1100 10110
184 = 0 1110 01110
- standardized:
54.125 = 0 1110 0.01101
184 = 0 1110 1.01110
- result:
0 0010 0000
- denormalized: 0.03125
- precision lost: -129.90625
please help thanks!
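For reference when checking the last two steps, the exact values before any 10-bit rounding are (this does not assume a particular exponent bias or field split from the equation sheet):
54.125 = 110110.001₂ = 1.10110001₂ × 2^5
184 = 10111000₂ = 1.0111₂ × 2^7
54.125 - 184 = -129.875 = -10000001.111₂ = -1.0000001111₂ × 2^7
So whatever 10-bit pattern the subtraction produces should decode to something close to -129.875, and the loss of precision is the difference between that decoded value and -129.875.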

Removing runtime errors with the -G flag [duplicate]

I'm trying to figure out why cudaMemcpyToSymbol is not working for me. (But cudaMemcpy does.)
// symbols:
__constant__ float flt[480]; // 1920 bytes
__constant__ int ints[160]; // 640 bytes
// func code follows:
float* pFlts;
cudaMalloc((void**)&pFlts, 1920+640); // chunk of gpu mem (floats & ints)
// This does NOT work properly:
cudaMemcpyToSymbol(flt,pFlts,1920,0,cudaMemcpyDeviceToDevice); // first copy
cudaMemcpyToSymbol(ints,pFlts,640,1920,cudaMemcpyDeviceToDevice); // second copy
The second copy is trashing the contents of the first copy (flt), and the second copy does not happen. (If I remove the second copy, the first copy works fine.)
Results:
GpuDumpFloatMemory<<<1,1>>>(0x500500000, 13, 320) TotThrds=1 ** Source of 1st copy
0x500500500: float[320]= 1.000
0x500500504: float[321]= 0.866
0x500500508: float[322]= 0.500
0x50050050c: float[323]= -0.000
0x500500510: float[324]= -0.500
0x500500514: float[325]= -0.866
0x500500518: float[326]= -1.000
0x50050051c: float[327]= -0.866
0x500500520: float[328]= -0.500
0x500500524: float[329]= 0.000
0x500500528: float[330]= 0.500
0x50050052c: float[331]= 0.866
0x500500530: float[332]= 1.000
GpuDumpFloatMemory<<<1,1>>>(0x500100a98, 13, 320) TotThrds=1 ** Dest of 1st copy
0x500100f98: float[320]= 0.000
0x500100f9c: float[321]= 0.500
0x500100fa0: float[322]= 0.866
0x500100fa4: float[323]= 1.000
0x500100fa8: float[324]= 0.866
0x500100fac: float[325]= 0.500
0x500100fb0: float[326]= -0.000
0x500100fb4: float[327]= -0.500
0x500100fb8: float[328]= -0.866
0x500100fbc: float[329]= -1.000
0x500100fc0: float[330]= -0.866
0x500100fc4: float[331]= -0.500
0x500100fc8: float[332]= 0.000
GpuDumpIntMemory<<<1,1>>>(0x500500780, 13, 0) TotThrds=1 ** Source of 2nd copy
0x500500780: int[0]= 1
0x500500784: int[1]= 1
0x500500788: int[2]= 1
0x50050078c: int[3]= 1
0x500500790: int[4]= 1
0x500500794: int[5]= 1
0x500500798: int[6]= 1
0x50050079c: int[7]= 1
0x5005007a0: int[8]= 1
0x5005007a4: int[9]= 1
0x5005007a8: int[10]= 1
0x5005007ac: int[11]= 1
0x5005007b0: int[12]= 0
GpuDumpIntMemory<<<1,1>>>(0x500100818, 13, 0) TotThrds=1 ** Dest of 2nd copy
0x500100818: int[0]= 0
0x50010081c: int[1]= 0
0x500100820: int[2]= 0
0x500100824: int[3]= 0
0x500100828: int[4]= 0
0x50010082c: int[5]= 0
0x500100830: int[6]= 0
0x500100834: int[7]= 0
0x500100838: int[8]= 0
0x50010083c: int[9]= 0
0x500100840: int[10]= 0
0x500100844: int[11]= 0
0x500100848: int[12]= 0
The following works properly:
cudaMemcpyToSymbol(flt,pFlts,1920,0,cudaMemcpyDeviceToDevice); // first copy
int* pTemp;
cudaGetSymbolAddress((void**) &pTemp, ints);
cudaMemcpy(pTemp,pFlts+480,640,cudaMemcpyDeviceToDevice); // second copy, using the symbol's device address
Results:
GpuDumpFloatMemory<<<1,1>>>(0x500500000, 13, 320) TotThrds=1 ** Source of first copy
0x500500500: float[320]= 1.000
0x500500504: float[321]= 0.866
0x500500508: float[322]= 0.500
0x50050050c: float[323]= -0.000
0x500500510: float[324]= -0.500
0x500500514: float[325]= -0.866
0x500500518: float[326]= -1.000
0x50050051c: float[327]= -0.866
0x500500520: float[328]= -0.500
0x500500524: float[329]= 0.000
0x500500528: float[330]= 0.500
0x50050052c: float[331]= 0.866
0x500500530: float[332]= 1.000
GpuDumpFloatMemory<<<1,1>>>(0x500100a98, 13, 320) TotThrds=1 ** Dest of first copy
0x500100f98: float[320]= 1.000
0x500100f9c: float[321]= 0.866
0x500100fa0: float[322]= 0.500
0x500100fa4: float[323]= -0.000
0x500100fa8: float[324]= -0.500
0x500100fac: float[325]= -0.866
0x500100fb0: float[326]= -1.000
0x500100fb4: float[327]= -0.866
0x500100fb8: float[328]= -0.500
0x500100fbc: float[329]= 0.000
0x500100fc0: float[330]= 0.500
0x500100fc4: float[331]= 0.866
0x500100fc8: float[332]= 1.000
GpuDumpIntMemory<<<1,1>>>(0x500500780, 13, 0) TotThrds=1 ** Source of 2nd copy
0x500500780: int[0]= 1
0x500500784: int[1]= 1
0x500500788: int[2]= 1
0x50050078c: int[3]= 1
0x500500790: int[4]= 1
0x500500794: int[5]= 1
0x500500798: int[6]= 1
0x50050079c: int[7]= 1
0x5005007a0: int[8]= 1
0x5005007a4: int[9]= 1
0x5005007a8: int[10]= 1
0x5005007ac: int[11]= 1
0x5005007b0: int[12]= 0
GpuDumpIntMemory<<<1,1>>>(0x500100818, 13, 0) TotThrds=1 ** Destination of 2nd copy
0x500100818: int[0]= 1
0x50010081c: int[1]= 1
0x500100820: int[2]= 1
0x500100824: int[3]= 1
0x500100828: int[4]= 1
0x50010082c: int[5]= 1
0x500100830: int[6]= 1
0x500100834: int[7]= 1
0x500100838: int[8]= 1
0x50010083c: int[9]= 1
0x500100840: int[10]= 1
0x500100844: int[11]= 1
0x500100848: int[12]= 0
When I look at the bad case, it appears as though something has happened to the symbol table. The data at the first copy's destination looks familiar: not as if it has been overwritten, but as if it has been shifted, as though the pointer were wrong.
The second copy looks broken to me. You have defined this array:
__constant__ int ints[160]; // 640 bytes
which as correctly noted is 640 bytes long.
Your second copy is like this:
cudaMemcpyToSymbol(ints,pFlts,640,1920,cudaMemcpyDeviceToDevice); // second copy
Which says, "copy a total of 640 bytes, from pFlts array to ints array, with the storage location in the ints array beginning at 1920 bytes from the start of the array."
This won't work. The ints array is only 640 bytes long. You can't pick as your destination a location that is 1920 bytes into it.
From the documentation for cudaMemcpyToSymbol:
offset - Offset from start of symbol in bytes
In this case the symbol is ints
Probably what you want is:
cudaMemcpyToSymbol(ints,pFlts+480,640,0,cudaMemcpyDeviceToDevice); // second copy
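For completeness, a nonzero offset is only valid while it stays inside the destination symbol. A small sketch (reusing the pFlts buffer from the question; the split into two 320-byte halves is only for illustration) that fills ints in two steps, using the offset the way the documentation describes:
cudaMemcpyToSymbol(ints,pFlts+480,320,0,cudaMemcpyDeviceToDevice);   // bytes 0..319 of ints
cudaMemcpyToSymbol(ints,pFlts+560,320,320,cudaMemcpyDeviceToDevice); // bytes 320..639 of ints (offset 320 < 640)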
EDIT:
In response to the questions in the comments about error checking, I crafted this simple test program:
#include <stdio.h>
#define cudaCheckErrors(msg) \
do { \
cudaError_t __err = cudaGetLastError(); \
if (__err != cudaSuccess) { \
fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", \
msg, cudaGetErrorString(__err), \
__FILE__, __LINE__); \
fprintf(stderr, "*** FAILED - ABORTING\n"); \
exit(1); \
} \
} while (0)
__constant__ int ints[160];
int main(){
int *d_ints;
cudaError_t mystatus;
cudaMalloc((void **)&d_ints, sizeof(int)*160);
cudaCheckErrors("cudamalloc fail");
mystatus = cudaMemcpyToSymbol(ints, d_ints, 160*sizeof(int), 1920, cudaMemcpyDeviceToDevice);
if (mystatus != cudaSuccess) printf("returned value was not cudaSuccess\n");
cudaCheckErrors("cudamemcpytosymbol fail");
printf("OK!\n");
return 0;
}
When I compile and run this, I get the following output:
returned value was not cudaSuccess
Fatal error: cudamemcpytosymbol fail (invalid argument at t94.cu:26)
*** FAILED - ABORTING
This indicates that both the error return value from the cudaMemcpyToSymbol function call and the cudaGetLastError() method return an error in this case. If I change the 1920 parameter to zero in this test case, the error goes away.

How do I get binary byte length in Erlang?

If I have the following binary:
<<32,16,10,9,108,111,99,97,108,104,111,115,116,16,170,31>>
How can I know what length it has?
For byte size:
1> byte_size(<<32,16,10,9,108,111,99,97,108,104,111,115,116,16,170,31>>).
16
For bit size:
2> bit_size(<<32,16,10,9,108,111,99,97,108,104,111,115,116,16,170,31>>).
128
When you have a bit string (a bitstring whose length in bits is not divisible by 8), byte_size/1 will round up to the nearest whole byte, i.e. the number of bytes the bit string would fit in:
3> bit_size(<<0:19>>).
19
4> byte_size(<<0:19>>). % 19 bits fits inside 3 bytes
3
5> bit_size(<<0:24>>).
24
6> byte_size(<<0:24>>). % 24 bits is exactly 3 bytes
3
7> byte_size(<<0:25>>). % 25 bits only fits inside 4 bytes
4
Here's an example illustrating the difference in sizes going from 8 bits (fits in 1 byte) to 17 bits (needs 3 bytes to fit):
8> [{bit_size(<<0:N>>), byte_size(<<0:N>>)} || N <- lists:seq(8,17)].
[{8,1},
{9,2},
{10,2},
{11,2},
{12,2},
{13,2},
{14,2},
{15,2},
{16,2},
{17,3}]