qemu-system-arm and lm3s6965evb / Cortex M3 ... need more ram - qemu

I'm successfully compiling my unit-test with arm-eabi-none-gcc and running them in qemu-system-arm with the machine lm3s6965evb.... But for some of the unit-tests I need more than the 64k of RAM that the lm3s6965evb mcu/machine has.
The IAR simulator apparently has no hard limit in the 'machine', so I just made a phony linkerfile that allows the unittest-program to use e.g. 512k RAM. This works (surprisingly) fine , but qemu doesn't play like that (hangs the moment I change the RAM section in the linkerfile). So I need another machine...
But thinking about it: I think I just need something that executes ARMv7 thumb(2?) code, like the CortexM3. It could also be Cortex-M33 which is a ARMv8 ...
I don't care about Hardware-registers or interrupts etc. I do need, however, printf() to work via semihosting or other means (uart etc), to printout unittest status (success/failures)
What are my best candidates,
modify the lm3s6965evb somehow?
taking an A7?
taking some of the ARM vhdl/fpga machines? (msp2.. musca ...) ?
(The 'virt' machine does not support cortex-m3/m4, according to error message)
?
Thanks
/T

(It turns out, that I misread the "mps2-an385" documentation & tutorials, - it wasn't complicated at all.)
It works if I just use the "mps2-an385" machine and modify the linkerfile to use more flash and ram. Currently i beefed it up to 4x ram and flash which is enough currently. (Haven't found out what the exact limits are.)
Still, I would like to hear if there are other solutions.

QEMU's lm3s6965evb model follows the real hardware, which does not have much RAM. If you want more RAM and you don't specifically want to have a model of those Stellaris boards, pick a board model type which has more RAM. If you need to use an M-profile core, try one of the MPS2 boards. If you are happy with an A-profile core, then the "virt" board with a Cortex-A15 may be a good choice.

Related

IMX6ULL 4G download stalling issue

We have been using IMX6ULL processors along with a Quectel 4G module in our custom made boards. The 4G module can be initialised, brought up and the PPP0 interface can also be initialised which in-turn does provide us with internet connectivity too but, when we start downloading files (of about 10 MB - 200 MB), we have observed that the download begins to stall at irregular intervals. While the download does stall, the PPP0 interface is still up but we lose internet connectivity hence, we have to kill PPPD and re-initialise PPP0.
We have tried using different variations of PPP0 initialisation scripts that we could get our hands on but the issue still persisted however, recently when we wanted to dump the traffic on the PPP0 interface using TCPDUMP in-order to analyse the same, we observed that the download does not stall anymore and we also observed a much better 4G throughput too. We have still not been able to figure out why this is indeed the case. Any inputs or guidance on the same would be of great help.
P.S: The kernel version we have been using is 4.1.15 but, we have observed a similar behavior with the 5.4.70 kernel too.
Thanks in advance
Regards
Nitin
Check the 4G network first with AT+COPS? and AT+CSQ
Does the Module disconnect from the base station?
Do not try kill pppd and restart setting up ppp0, and try AT+CFUN=0 \ AT+CFUN=1 to restart the network registration first.
And for 4G module, the Quectel provide a tool named quectel-CM to setup internet connection, it is of better performance than ppp.
btw, have you check the the memory used and the CPU status?

Can QEMU catch accesses to non-existing RAM on a Cortex-M processor?

is it possible to have QEMU report read or write accesses to non-existing RAM for a Cortex-M machine? Maybe a log function or a special option when building QEMU?
After all, it gets passed the address map - so it shouldn't be too difficult to detect this situation (of course this incurs a run-time cost).
All I found in the sources was
"Trying to execute code outside RAM or ROM at 0x"
but nothing comparable for reading or writing.
Any help will be highly appreciated.
M'
If you're looking at a QEMU with the "Trying to execute code outside RAM or ROM" message then it's a pretty old one and you should probably be looking at a newer version, especially if you're digging into its source code.
Anyway, it is in theory possible but it isn't currently an implemented feature. You could probably get 90% of the way there by enabling the DEBUG_UNASSIGNED printf()s in softmmu/memory.c. A proper solution would require implementing suitable tracepoints instead.

Run my application in a simulated low memory, slow CPU environment

I want to stress-test my application this way, because it seems to be failing in some very old client machines.
At first I read a bit about QEmu and thought about hardware emulation, but it seems a long shot. I asked at superuser, but didn't get much feedback (yet).
So I'm turning to you guys... How do you this kind of testing?
I'm not sure about slowing a CPU but if you use a virtual machine, like VMWare, you can control how much RAM is actually used. I run it on a MBP at home with 8GB and my WinXP VM is capped at 1.5 GB RAM.
EDIT: I just checked my version of VMWare and I can control the number of cores it can use. It's definitely not the same as a slower CPU but it might highlight some issues for you.
Since it's not entirely clear that your app is failing because of the old hardware or the old OS, a VM client should allow you to test various versions of OSes rather quickly. It came in handy for me a few years back when I was trying to get a .Net 2.0 app to run on Win98 (it can be done though I don't remember how I got it working...).
Virtual Box is a free virtual machine similar to VMWare. It also has the capacity to reduce available memory. It can restrict how many CPUs are available, but not how fast those CPUs are.
Try cpulimit, most distro includes it (Ubuntu does) http://www.digipedia.pl/man/doc/view/cpulimit.1
If you want to lower the speed of your cpu you can easily do this by modifying a fork bomb program
int main(){
int x=0;
int limit = 10
while( x < limit ){
int pid = fork();
if( pid == 0 )
while( 1 ){}
else
x++;
}
}
This will slow down your computer quite quickly, you may want to change the limit variable to a higher number. I must warn you though this can be dangerous, because if implemented wrong you could fork bomb your system leaving it useless unless you restart it. Read this first if you don't understand what this code will do.
On POSIX (Unix) systems you can apply run limits to processes (that is, to executions of a program). The system call to do this is called setrlimit(), and most shells enable you to use the ulimit built-in to set them from the command-line (plain POSIX ulimit is not very useful). Using these you can run a program with low limits to simulate a smaller computer.
POSIX systems also provide a nice command for running a program at lower CPU priority, which can simulate a slower CPU if you also ensure there is another CPU intensive progam running at the same time.
I think it's pretty unlikely that cpu speed is going to exercise very many bugs; On the other hand, it's much more likely for different cpu features to matter. Many VM implementations provide ways of toggling on and off certain cpu features; qemu in particular permits a high level of control over what's available to the CPU.
Think outside the box. Which application of the ones you use regularly does this?
A debugger of course! But, how can you achieve such a behavior, to emulate a low cpu?
The secret to your question is asm _int 3. This is the assembly "pause me" command that is send from the attached debugger to the application you are debugging.
More about int 3 to this question.
You can use the code from this tool to pause/resume your process continuously. You can add an interval and make that tool pause your application for that amount of time.
The emulated-cpu-speed would be: (YourCPU/Interval) -0.00001% because of the signaling and other processes running on your machine, but it should do the trick.
About the low memory emulation:
You can create a wrapper class that allocates memory for the application and replace each allocation with call to this class. You would be able to set exactly the amount of memory your application can use before it fails to allocate more memory.
Something such as: MyClass* foo = AllocWrapper(new MyClass(arguments or whatever));
Then you can have the AllocWrapper allocating/deallocating memory for you.
On Linux, you can use ulimit as Raedwald said. On Windows, you can use the SetProcessWorkingSetSize system call. But these only set a limit on a per process basis. In reality, parts of the system will start to fail in a stressed environment. I would suggest using the Sysinternals' testlimit tool to stress the entire machine.
See https://serverfault.com/questions/36309/throttle-down-cpu-speed-of-vmware-image
were it is claimed free-as-in-beer VMware vSphere Hypervisorâ„¢ (ESXi) allows you to select the virtual CPU speed on top of setting the memory size of the virtual machine.

VGA Video using an ARM7

I need to put out a VGA signal from an AT91SAM7SE512. How can I do this without using an extra controller? I saw stuff on the web, but it needs to be able to modify the specific pixels.
You could probably use something similar to old tricks to make NTSC signals with PWM it will probably look horrible. A better bet is to get some form of video controller even a cheap low resolution one.
You could also try some form of FPGA to VGA like this
Unless your ARM7 has some kind of controller, capable of reading memory and outputting video signal without CPU intervention, ie some kind of framebuffer, I don't think you can do that with an ARM7. Well, you probably can, but not within a general purpose OS like linux.
What you can do is transform your ARM7 into a VGA dedicated controlleur, that spends its time launching dma transfer from SDRAM to an external bus. This will IMO not leave a lot of resource to do anything else.
Your ARM chip has an ADC. It doesn't have a DAC, though. VGA is an multiple-channel analog output, so you need some kind of DAC, and in turn an external component. Another problem you might encounter is the need for proper drivers (the electronic kind, not sofware). A VGA cable can be quite long, which means you have large capacities to overcome, plus it may work as an antenna.

registers vs stacks

What exactly are the advantages and disadvantages to using a register-based virtual machine versus using a stack-based virtual machine?
To me, it would seem as though a register based machine would be more straight-forward to program and more efficient. So why is it that the JVM, the CLR, and the Python VM are all stack-based?
Implemented in hardware, a register-based machine is going to be more efficient simply because there are fewer accesses to the slower RAM. In software, however, even a register based architecture will most likely have the "registers" in RAM. A stack based machine is going to be just as efficient in that case.
In addition a stack-based VM is going to make it a lot easier to write compilers. You don't have to deal with register allocation strategies. You have, essentially, an unlimited number of registers to work with.
Update: I wrote this answer assuming an interpreted VM. It may not hold true for a JIT compiled VM. I ran across this paper which seems to indicate that a JIT compiled VM may be more efficient using a register architecture.
This has already been answered, to a certain level, in the Parrot VM's FAQ and associated documents:
A Parrot Overview
The relevant text from that doc is this:
the Parrot VM will have a register architecture, rather than a stack architecture. It will also have extremely low-level operations, more similar to Java's than the medium-level ops of Perl and Python and the like.
The reasoning for this decision is primarily that by resembling the underlying hardware to some extent, it's possible to compile down Parrot bytecode to efficient native machine language.
Moreover, many programs in high-level languages consist of nested function and method calls, sometimes with lexical variables to hold intermediate results. Under non-JIT settings, a stack-based VM will be popping and then pushing the same operands many times, while a register-based VM will simply allocate the right amount of registers and operate on them, which can significantly reduce the amount of operations and CPU time.
You may also want to read this: Registers vs stacks for interpreter design
Quoting it a bit:
There is no real doubt, it's easier to generate code for a stack machine. Most freshman compiler students can do that. Generating code for a register machine is a bit tougher, unless you're treating it as a stack machine with an accumulator. (Which is doable, albeit somewhat less than ideal from a performance standpoint) Simplicity of targeting isn't that big a deal, at least not for me, in part because so few people are actually going to directly target it--I mean, come on, how many people do you know who actually try to write a compiler for something anyone would ever care about? The numbers are small. The other issue there is that many of the folks with compiler knowledge already are comfortable targeting register machines, as that's what all hardware CPUs in common use are.
Traditionally, virtual machine implementors have favored stack-based architectures over register-based due to 'simplicity of VM implementation' ease of writing a compiler back-end - most VMs are originally designed to host a single language and code density and executables for stack architecture are invariably smaller than executables for register architectures. The simplicity and code density are a cost of performance.
Studies have shown that a registered-based architecture requires an average of 47% less executed VM instructions than stack-based architecture, and the register code is 25% larger than corresponding stack code but this increase cost of fetching more VM instructions due to larger code size involves only 1.07% extra real machine loads per VM instruction which is negligible. The overall performance of the register-based VM is that it takes, on average, 32.3% less time to execute standard benchmarks.
One reason for building stack-based VMs is that that actual VM opcodes can be smaller and simpler (no need to encode/decode operands). This makes the generated code smaller, and also makes the VM code simpler.
How many registers do you need?
I'll probably need at least one more than that.
Stack based VM's are simpler and the code is much more compact. As a real world example, a friend built (about 30 years ago) a data logging system with a homebrew Forth VM on a Cosmac. The Forth VM was 30 bytes of code on a machine with 2k of ROM and 256 bytes of RAM.
It is not obvious to me that a "register-based" virtual machine would be "more straight-forward to program" or "more efficient". Perhaps you are thinking that the virtual registers would provide a short-cut during the JIT compilation phase? This would certainly not be the case, since the real processor may have more or fewer registers than the VM, and those registers may be used in different ways. (Example: values that are going to be decremented are best placed in the ECX register on x86 processors.) If the real machine has more registers than the VM, then you're wasting resources, fewer and you've gained nothing using "register-based" programming.
Stack based VMs are easier to generate code for.
Register based VMs are easier to create fast implementations for, and easier to generate highly optimized code for.
For your first attempt, I recommend starting with a stack based VM.