Is halting really undecidable for computers with limited memory? - language-agnostic

Turing proved that the halting problem is undecidable over Turing machines. However, real computers are not actually Turing-complete: They would be, if they had an infinite amount of memory.
Given that computers have a finite amount of memory, and hence are not quite Turing-complete, does the halting problem become decidable? My intuition says yes, but the program that solves this restricted halting problem might have time and space complexity exponential in the size of the memory of the targeted computer.

The halting problem can be solved for a Turing machine with finite tape by a Turing machine with infinite tape. All the infinite Turing machine has to do is enumerate every possible state of the finite Turing machine (and there will be a finite, though very large, number of possible states) and mark which states have been visited by the Turing machine in the course of running a program. Eventually, one of two things will happen:
The finite Turing machine will halt.
The finite Turing machine will revisit a state. If it revisits a state, then you know there is an infinite loop, since the machine is deterministic and the next state is therefore determined entirely by the previous state. If there are n states, the machine is guaranteed to revisit one of them by the (n+1)th step.
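A minimal sketch of that argument in Python, assuming the finite machine is presented as a deterministic step function over hashable states (the names halts, step, and is_halted are invented for this illustration):

    def halts(initial_state, step, is_halted):
        """Decide halting for a deterministic machine with finitely many states.

        step(state) returns the next state; is_halted(state) says whether the
        machine has stopped. Because the state space is finite, the run either
        halts or eventually repeats a state, which proves an infinite loop.
        """
        seen = set()
        state = initial_state
        while not is_halted(state):
            if state in seen:        # repeated state + determinism => infinite loop
                return False
            seen.add(state)
            state = step(state)
        return True

    # Toy examples: states are numbers mod 8, standing in for a tiny machine's memory.
    print(halts(5, step=lambda s: (s + 3) % 8, is_halted=lambda s: s == 0))  # True
    print(halts(0, step=lambda s: (s + 2) % 8, is_halted=lambda s: s == 1))  # False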

Related

Is a "single cycle cpu" possible if asynchronous components are used?

I have heard the term "single cycle CPU" and was trying to understand what a single cycle CPU actually means. Is there a clear, agreed definition and consensus on what it means?
Some home brew "single cycle cpu's" I've come across seem to use both the rising and the falling edges of the clock to complete a single instruction. Typically, the rising edge acts as fetch/decode and the falling edge as execute.
However, in my reading I came across the reasonable point made here ...
https://zipcpu.com/blog/2017/08/21/rules-for-newbies.html
"Do not transition on any negative (falling) edges.
Falling edge clocks should be considered a violation of the one clock principle,
as they act like separate clocks."
This rings true to me.
Needing both the rising and falling edges (or high and low phases) is effectively the same as needing the rising edge of two cycles of a single clock that's running twice as fast; and that would be a "two cycle" CPU, wouldn't it?
So is it honest to state that a design is a "single cycle CPU" when both the rising and falling edges are actively used for state change?
It would seem that a true single cycle cpu must perform all state changing operations on a single clock edge of a single clock cycle.
I can imagine such a thing is possible provided the data storage is all synchronous. If we have a synchronous system that has settled, then on the next clock edge we can clock the results into a synchronous data store and simultaneously clock the program counter on to the next address.
But if the target data store is, for example, async RAM, then surely the control lines would be changing while that data is being stored, leading to unintended behaviour.
Am I wrong? Are there any examples of a "single cycle CPU" that include async storage in the mix?
It would seem that using async RAM in one's design means one must use at least two logical clock cycles to achieve the state change.
Of course, with some more complexity one could perhaps have a CPU that uses a single edge where instructions use solely synchronous components, but relies on an extra cycle when storing to async data; but then that still wouldn't be a single cycle CPU, rather a mostly single cycle CPU.
So no CPU that writes to async RAM (or another async component) can honestly be considered a single cycle CPU, because the entire instruction cannot be carried out on a single clock edge. The RAM write needs two edges (i.e. falling and rising), and this breaks the single clock principle.
So is there a commonly accepted single cycle CPU and are we applying the term consistently?
What's the story?
(Also posted in my Hackaday log https://hackaday.io/project/166922-spam-1-8-bit-cpu/log/181036-single-cycle-cpu-confusion and also in a private group on Hackaday)
=====
Update: Looking at simple MIPS implementations, it seems the models use synchronous memory and so can probably operate off a single edge, and maybe they do - therefore warranting the category "single cycle".
And perhaps FPGA memory is always synchronous - I don't know about that.
But is the term used inconsistently elsewhere - i.e. by most homebrew TTL computers out there?
Or am I just plain wrong?
====
Update :
Some may have misunderstood my point.
Numerous home brew TTL CPUs claim "single cycle CPU" status (for the purposes of this discussion I'm not interested in more complex beasts that do pipelining or whatever).
By single cycle, these CPUs typically mean that they do something like advancing the PC on one edge of the clock and then using the opposing edge of the clock to update flip-flops with the result. Or they will use the other phase of the clock to update async components like latches and SRAM.
However, the ZipCPU reference I provided suggests that using the opposing clock edge is akin to using a second clock cycle or even a second clock. BTW Ben Eater in his vids even compares the inverted clock that he uses to update his SRAM to being a second clock.
My objection to the use of "single cycle CPU" for such CPUs (basically most/all home brew TTL CPUs I've seen, as they all work that way) is that I agree with ZipCPU: using the opposing edge (or phase) of the clock for the commit is effectively the same as using a second clock, and this makes a mockery of the "single cycle" claim.
If using the opposing edge is effectively the same as using a single edge across two clock cycles, then I think that makes use of the term questionable. So I take ZipCPU's point to heart and tighten the term to mean use of a single edge.
On the other hand, it seems perfectly possible to build a CPU that uses only sync components (i.e. edge-triggered flip-flops) and only a single edge, where on each edge we clock whatever is on the bus into whatever device is selected for write and at the same moment advance the PC.
Between one edge and the next same direction edge, settling occurs.
In this manner we end up with CPI=1 and use of only a single edge - which is very distinctly different to the common TTL CPU pattern of using both edges of the clock.
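As a purely illustrative software analogy of that single-edge discipline (not a hardware design, and every name here is invented for the sketch): compute the whole next state from the current registered state, then commit everything at once on the "edge".

    def tick(state, program):
        """One clock 'edge' of a toy synchronous machine: read the current state,
        compute the whole next state combinationally, then commit it all at once."""
        pc, acc, data = state["pc"], state["acc"], state["data"]
        op, arg = program[pc]                                # combinational instruction fetch
        nxt = {"pc": pc + 1, "acc": acc, "data": dict(data)} # PC advances on the same edge
        if op == "LOAD":
            nxt["acc"] = data[arg]
        elif op == "ADD":
            nxt["acc"] = acc + data[arg]
        elif op == "STORE":
            nxt["data"][arg] = acc                           # the write lands on the same edge too
        return nxt                                           # single commit point == single clock edge

    prog = [("LOAD", 0), ("ADD", 1), ("STORE", 2)]
    state = {"pc": 0, "acc": 0, "data": {0: 2, 1: 3, 2: 0}}
    for _ in prog:
        state = tick(state, prog)
    print(state["data"][2])   # 5: one instruction retired per tick, i.e. CPI = 1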
BTW my impression of FPGAs (which I'm not referring to here) is that the storage elements in an FPGA are all synchronous flip-flops. I don't know, but that's what my reading suggests. Anyway, if this is true then a trivial FPGA-based CPU probably has a CPI of 1 and uses only, say, the +ve edge, so these might well meet my narrow definition of "single cycle CPU". Also, my reading suggests that various MIPS implementations (educational efforts, probably) probably meet my definition OK.
This seems mostly a question of definitions and terminology, moreso than how to actually build simple CPUs.
If you insist on that strict definition of "single cycle CPU", meaning it truly uses only one clock edge to set everything in motion for that instruction, then yes, that would exclude real-world toy/hobby CPUs that use a 2nd clock edge to give a consistent interval for memory access.
But they certainly still fulfil the spirit of a single-cycle CPU, which is that every instruction runs in 1 clock cycle, with no pipelining and no multi-cycle microcode.
A whole clock cycle does have 2 clock edges, and it's normal for real-world (non-single-cycle) CPUs to use the "other" edge for internal timing in some cases, but we still talk about their frequency in whole cycles, not the edge frequency - except in cases like DDR memory, where we do talk about the transfer rate being twice the memory clock frequency. What sets DDR apart is that it always uses both edges, and for approximately equal things, not just some extra timing / synchronization within a clock cycle.
Now could you build a CPU that keeps a store value on a memory bus for some minimum time, without using a clock edge? Maybe.
Perhaps make sure the critical path leading to the store data is short enough that the data is always ready. And possibly propagate some "data-ready" signal along with your computations (or just from the longest critical path of any instruction), and a couple of gate delays after the data is on the bus, flip the memory clock. (And on the next CPU clock edge, flip it back.) If your memory doesn't mind its clock not having a uniform duty cycle, this might be fine as long as each half of the memory clock is long enough.
For loading from memory, you can maybe do something similar by initiating a memory load cycle some gate-delays after the CPU clock edge that starts this "cycle" of your single-cycle CPU. This might even involve building a long chain of gate delays intentionally with inverters dedicated to that purpose. Or perhaps even an analog RC time delay, but either way that's obviously worse than just using the other edge of the main clock, and you'd only ever do this as an exercise in single-cycle dogmatic purity. (It can also be flaky because gate-delay isn't constant, depending on voltage and temperature, so one side of the chip running hotter than the other could change relative timing.)
By definition, a single cycle CPU executes one instruction per cycle, so in theory there can be other CPUs that execute more or fewer instructions per cycle. You can look into concepts like the multi-cycle processor and the pipelined processor (instruction pipelining). "Pipelining attempts to keep every part of the processor busy with some instruction by dividing incoming instructions into a series of sequential steps," according to Wikipedia. I don't know exactly how it works, but maybe it just makes use of the available registers (for example, using ECX the way EAX is used, or perhaps it works some other way); one thing that is certainly true is that the number of registers keeps increasing, so maybe that's one of the main purposes. Source: https://en.wikipedia.org/wiki/Instruction_pipelining
I think the answer to the question "is a single cycle CPU possible if asynchronous components are used" depends on the CPU controller that drives both the CPU and the RAM with opcodes. I found interesting information about this here: http://people.cs.pitt.edu/~cho/cs1541/current/handouts/lect-pipe_4up.pdf
https://ibb.co/tKy6sR2
CONCLUSION: In my opinion, if we consider the term "single cycle CPU", it should be the simplest possible construction. The term "asynchronous" implies something more complex than "synchronous", so the two terms are not equivalent. It's something like asking "Can a basic data type be considered a structure?". In my opinion, "single" means the simplest possible and "asynchronous" means some modification, hence something more complex, so I just don't think it's possible. Maybe the condition "are used" can be relaxed to "are used at the time" - if some switch or controller can turn off asynchronous mode and make everything as simple as possible - but generally I just don't think it's possible.

MIPS Architecture : NOP (No-Operation) Vs Data Forwarding in Hazard Prevention

I learnt in a computer architecture course that data hazards can be prevented by placing several independent nop instructions between two mutually dependent instructions. This can be done at the assembly level in the compiler.
The alternative way to avoid data hazard is to use data forwarding.
I am a bit confused about how these two alternatives differ as far as performance, speed and hardware are concerned, because as far as I know data forwarding is implemented at the hardware level, whereas nops can be inserted at the assembly level.
Can anybody please explain which approach is better if we consider factors such as performance, speed, hardware, etc.?
Thanks.
Obviously, having the compiler insert nops into the code stream to fill pipeline slots allows hardware to be simplified which can reduce the duration of a pipeline stage or the depth of the pipeline, reduce design effort (time to market, project risk, design cost), or allow a full processor core to fit on a single chip (which helps performance). However, this benefit is tiny compared to the loss of performance from not using forwarding. Higher latency for dependent instructions is very bad for typical programs.
The MIPS R2000, which had both delayed branches and delayed loads, provided result forwarding. (MIPS is an acronym for "Microprocessor without Interlocked Pipeline Stages"). Delayed loads were soon removed from MIPS (which was possible because such did not affect binary compatibility of correct code). The use of delayed instructions was partially from a belief that most delay slots could be filled by the compiler with useful instructions and partially from believing that the increase in code size was not important relative to the simplification of hardware.
Reducing the latency of a load operation was not practical, so the pipeline would need to be stalled for a cycle anyway. The cost of a nop is in cache and memory capacity effects (i.e., the effect of lower code density), and in some cases a single load delay slot could be filled.
Exposing the pipeline organization also has implications for binary compatibility. Later binary compatible implementations must accommodate the ISA designed for the original pipeline organization. A single delayed branch slot works reasonably well for a simple 5-stage scalar implementation (it can be filled with a useful instruction most of the time and allows zero-effective-delay branches [i.e., no stall to resolve the branch or prediction and flushing the pipeline on misprediction]), but when the pipeline is deepened (or made wider) prediction or stalling becomes necessary anyway.
If sufficient parallelism exists in the targeted workloads, hardware simplicity is sufficiently important, and binary compatibility is not a problem, then exposing a pipeline with minimal support for dynamically detecting and handling stall conditions may be sensible. (There are also ways of encoding nops that avoid most of the code size expansion issues.) Having reliably sufficient parallelism (whether instruction-level or thread-level) allows nops to be avoided, either by compiler scheduling when there is instruction-level parallelism or by hardware thread interleaving when there is thread-level parallelism.
Hardware simplicity tends to reduce energy per unit of work (as well as chip area), and many modern designs are limited by power use. It also makes sense to perform optimizations at compile time (when they are less latency critical and can be done once rather than each time the code is executed) if the storage and communication cost of additional information is not too expensive (assuming information necessary to perform the optimization is available at compile time [dynamic branch prediction is a classic example of where dynamic information is helpful]).
Well, basically, since the hardware is optimised with forwarding, there should be no need for explicitly declared software NOPs. But that's not the case.
Though forwarding proves helpful in reducing data hazards, some hazards cannot be dealt with by forwarding. It just isn't possible.
E.g.
beq R1,R5,label
instruction 2nd
Here the second instruction will not be fetched until the beq has completed its execution stage and decided whether or not to branch. Until then, the second instruction has to be stalled (for 2 cycles). This is done in software by emitting NOPs.
With improvements in technology and hardware optimizations, the beq comparison can be completed in the register fetch/decode stage by inserting a comparator in that stage itself. Even so, the second instruction will be stalled (for 1 cycle now). Again, a NOP is needed.
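A highly simplified sketch of the software-only approach (no forwarding hardware), assuming a fixed hazard window and an invented instruction encoding purely for illustration - not the real MIPS pipeline rules:

    def insert_nops(instrs, hazard_window=2):
        """instrs: list of (opcode, dest, src1, src2) tuples; dest may be None.
        Insert ("nop", None, None, None) entries so no instruction reads a register
        written within the previous `hazard_window` slots - the dependence distance
        that forwarding hardware would otherwise hide."""
        NOP = ("nop", None, None, None)
        out = []
        for op, dest, src1, src2 in instrs:
            # Look back over the last `hazard_window` emitted slots for a producer.
            for back in range(1, hazard_window + 1):
                if len(out) >= back and out[-back][1] is not None and out[-back][1] in (src1, src2):
                    # Pad with nops until that producer falls outside the window.
                    out.extend([NOP] * (hazard_window - back + 1))
                    break
            out.append((op, dest, src1, src2))
        return out

    prog = [("lw",  "r2", "r1", None),    # r2 <- mem[r1]
            ("add", "r3", "r2", "r4")]    # reads r2 immediately -> data hazard
    print(insert_nops(prog))
    # [('lw', 'r2', 'r1', None), ('nop', None, None, None),
    #  ('nop', None, None, None), ('add', 'r3', 'r2', 'r4')]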

Terminology for a "complete" programming language?

The full definition of "Turing Completeness" requires infinite memory.
Is there a better term than Turing Complete for a programming language and implementation that seems usably complete, except for being limited by a finite (say 100-word, 16-bit, or 32-bit) address space?
I guess you can bring the limited memory into the definition. Something like:
A programming language for a given architecture (!) is Limitedly Turing Complete, if for every Turing Machine there exists a program that either
a) simulates the Turing Machine and returns the same result (iff the Turing Machine returns) or
b) at some point uses at least one available limited resource (e.g. memory) completely and returns an arbitrary result.
The question is whether this intuitive definition really helps, or whether it is better to assume that your architecture has unlimited memory (even though it is actually finite). Note that you don't even have to try hard to satisfy Limited Turing Completeness (as defined above): if you simply go into an infinite loop that mallocs one byte each time, you have found your program for all Turing Machines.
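For example, a degenerate witness for the definition above, sketched in Python (here MemoryError plays the role of "using a limited resource completely"):

    def exhaust_and_return_anything():
        """Trivially satisfies option (b): never simulates anything, just consumes
        memory one small allocation at a time until the finite machine runs out,
        then returns an arbitrary result."""
        hoard = []
        try:
            while True:
                hoard.append(bytearray(1))   # "malloc one byte" per iteration
        except MemoryError:
            return 42                        # arbitrary result once memory is exhausted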
The problem seems to be that you cannot pin down implementation-specific properties. For instance, if you have 500K of RAM, you may be able to express a program that computes 1+1, or maybe you can't - who knows.
I'd argue that languages like Haskell and Brainfuck (yes, I'm serious) are actually Turing Complete because they abstract resources away. While languages like C++ are only Limitedly Turing Complete, because at some point the address-space of pointers is exhausted and it is not possible to address any more data (e.g. sort a list of 2^2^2^2^100 items).
You could say that an implementation requires infinite memory to be truly turing complete, but languages themselves have no concept of a memory limit. You can make a compiler for a million-bit machine or a 4-bit machine without changing the language.

registers vs stacks

What exactly are the advantages and disadvantages to using a register-based virtual machine versus using a stack-based virtual machine?
To me, it would seem as though a register based machine would be more straight-forward to program and more efficient. So why is it that the JVM, the CLR, and the Python VM are all stack-based?
Implemented in hardware, a register-based machine is going to be more efficient simply because there are fewer accesses to the slower RAM. In software, however, even a register based architecture will most likely have the "registers" in RAM. A stack based machine is going to be just as efficient in that case.
In addition a stack-based VM is going to make it a lot easier to write compilers. You don't have to deal with register allocation strategies. You have, essentially, an unlimited number of registers to work with.
Update: I wrote this answer assuming an interpreted VM. It may not hold true for a JIT compiled VM. I ran across this paper which seems to indicate that a JIT compiled VM may be more efficient using a register architecture.
This has already been answered, to a certain level, in the Parrot VM's FAQ and associated documents:
A Parrot Overview
The relevant text from that doc is this:
the Parrot VM will have a register architecture, rather than a stack architecture. It will also have extremely low-level operations, more similar to Java's than the medium-level ops of Perl and Python and the like.
The reasoning for this decision is primarily that by resembling the underlying hardware to some extent, it's possible to compile down Parrot bytecode to efficient native machine language.
Moreover, many programs in high-level languages consist of nested function and method calls, sometimes with lexical variables to hold intermediate results. Under non-JIT settings, a stack-based VM will be popping and then pushing the same operands many times, while a register-based VM will simply allocate the right number of registers and operate on them, which can significantly reduce the number of operations and the CPU time.
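To make that contrast concrete, here is a toy sketch (with invented opcodes, not any real VM's instruction set) evaluating a*b + a*c both ways: the stack form re-pushes operands and shuffles intermediates through the stack, while the register form names each value once.

    # Stack machine: operands are implicit, so 'a' is pushed twice and
    # intermediate results pass through the stack.
    stack_code = [("push", "a"), ("push", "b"), ("mul",),
                  ("push", "a"), ("push", "c"), ("mul",),
                  ("add",)]

    # Register machine: operands are named in the instruction, so each value
    # is loaded once and results go straight into registers.
    reg_code = [("load", "r1", "a"), ("load", "r2", "b"), ("load", "r3", "c"),
                ("mul",  "r4", "r1", "r2"),
                ("mul",  "r5", "r1", "r3"),
                ("add",  "r6", "r4", "r5")]

    def run_stack(code, env):
        st = []
        for op, *args in code:
            if op == "push": st.append(env[args[0]])
            elif op == "mul": b, a = st.pop(), st.pop(); st.append(a * b)
            elif op == "add": b, a = st.pop(), st.pop(); st.append(a + b)
        return st.pop()

    def run_reg(code, env):
        r = {}
        for op, dst, *src in code:
            if op == "load": r[dst] = env[src[0]]
            elif op == "mul": r[dst] = r[src[0]] * r[src[1]]
            elif op == "add": r[dst] = r[src[0]] + r[src[1]]
        return r["r6"]

    env = {"a": 2, "b": 3, "c": 4}
    assert run_stack(stack_code, env) == run_reg(reg_code, env) == 14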
You may also want to read this: Registers vs stacks for interpreter design
Quoting it a bit:
There is no real doubt, it's easier to generate code for a stack machine. Most freshman compiler students can do that. Generating code for a register machine is a bit tougher, unless you're treating it as a stack machine with an accumulator. (Which is doable, albeit somewhat less than ideal from a performance standpoint) Simplicity of targeting isn't that big a deal, at least not for me, in part because so few people are actually going to directly target it--I mean, come on, how many people do you know who actually try to write a compiler for something anyone would ever care about? The numbers are small. The other issue there is that many of the folks with compiler knowledge already are comfortable targeting register machines, as that's what all hardware CPUs in common use are.
Traditionally, virtual machine implementors have favored stack-based architectures over register-based ones because of the simplicity of the VM implementation, the ease of writing a compiler back-end (most VMs are originally designed to host a single language), and code density: executables for a stack architecture are invariably smaller than executables for register architectures. That simplicity and code density come at a cost in performance.
Studies have shown that a register-based architecture requires on average 47% fewer executed VM instructions than a stack-based architecture. The register code is 25% larger than the corresponding stack code, but the cost of fetching more VM instructions due to the larger code size amounts to only 1.07% extra real-machine loads per VM instruction, which is negligible. Overall, the register-based VM takes, on average, 32.3% less time to execute standard benchmarks.
One reason for building stack-based VMs is that the actual VM opcodes can be smaller and simpler (no need to encode/decode operands). This makes the generated code smaller, and also makes the VM code simpler.
How many registers do you need?
I'll probably need at least one more than that.
Stack-based VMs are simpler and the code is much more compact. As a real-world example, a friend built (about 30 years ago) a data-logging system with a homebrew Forth VM on a COSMAC. The Forth VM was 30 bytes of code on a machine with 2K of ROM and 256 bytes of RAM.
It is not obvious to me that a "register-based" virtual machine would be "more straightforward to program" or "more efficient". Perhaps you are thinking that the virtual registers would provide a short-cut during the JIT compilation phase? This would certainly not be the case, since the real processor may have more or fewer registers than the VM, and those registers may be used in different ways. (Example: values that are going to be decremented are best placed in the ECX register on x86 processors.) If the real machine has more registers than the VM, then you're wasting resources; if it has fewer, you've gained nothing by using "register-based" programming.
Stack based VMs are easier to generate code for.
Register based VMs are easier to create fast implementations for, and easier to generate highly optimized code for.
For your first attempt, I recommend starting with a stack based VM.

What is Turing Complete?

What does the expression "Turing Complete" mean?
Can you give a simple explanation, without going into too many theoretical details?
Here's the briefest explanation:
A Turing Complete system means a system in which a program can be written that will find an answer (although with no guarantees regarding runtime or memory).
So, if somebody says "my new thing is Turing Complete" that means in principle (although often not in practice) it could be used to solve any computation problem.
Sometimes it's a joke... a guy wrote a Turing Machine simulator in vi, so it's possible to say that vi is the only computational engine ever needed in the world.
Here is the simplest explanation
Alan Turing created a machine that can take a program, run that program, and show some result. But then he had to create different machines for different programs. So he created the "Universal Turing Machine", which can take ANY program and run it.
Programming languages are similar to those machines (although virtual). They take programs and run them. Now, a programming language is called "Turing complete" if it can run any program (irrespective of the language) that a Turing machine can run, given enough time and memory.
For example: Let's say there is a program that takes 10 numbers and adds them. A Turing machine can easily run this program. But now imagine that for some reason your programming language can't perform the same addition. This would make it "Turing incomplete" (so to speak). On the other hand, if it can run any program that the universal Turing machine can run, then it's Turing complete.
Most modern programming languages (e.g. Java, JavaScript, Perl, etc.) are all Turing complete because they each implement all the features required to run programs like addition, multiplication, if-else condition, return statements, ways to store/retrieve/erase data and so on.
Update: You can learn more on my blog post: "JavaScript Is Turing Complete" — Explained
Informal Definition
A Turing complete language is one that can perform any computation. The Church-Turing Thesis states that any performable computation can be done by a Turing machine. A Turing machine is a machine with infinite random access memory and a finite 'program' that dictates when it should read, write, and move across that memory, when it should terminate with a certain result, and what it should do next. The input to a Turing machine is put in its memory before it starts.
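For concreteness, here is a minimal Turing-machine simulator in Python (the transition-table encoding used here is just one convenient convention, not a standard):

    def run_tm(program, tape, state="start", blank="_", max_steps=None):
        """program: {(state, symbol): (new_state, new_symbol, move)} with move in {-1, 0, +1}.
        The tape is a dict indexed by integer position, so it is unbounded in both
        directions; the machine stops when it enters the 'halt' state."""
        tape = dict(enumerate(tape))
        head, steps = 0, 0
        while state != "halt":
            symbol = tape.get(head, blank)
            state, tape[head], move = program[(state, symbol)]
            head += move
            steps += 1
            if max_steps is not None and steps >= max_steps:
                break
        return "".join(tape[i] for i in sorted(tape))

    # Example machine: flip every bit of the input, then halt on the first blank.
    flipper = {("start", "0"): ("start", "1", +1),
               ("start", "1"): ("start", "0", +1),
               ("start", "_"): ("halt",  "_", 0)}
    print(run_tm(flipper, "1011"))   # prints "0100_"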
Things that can make a language NOT Turing complete
A Turing machine can make decisions based on what it sees in memory - The 'language' that only supports +, -, *, and / on integers is not Turing complete because it can't make a choice based on its input, but a Turing machine can.
A Turing machine can run forever - If we took Java, Javascript, or Python and removed the ability to do any sort of loop, GOTO, or function call, it wouldn't be Turing complete because it can't perform an arbitrary computation that never finishes. Coq is a theorem prover that can't express programs that don't terminate, so it's not Turing complete.
A Turing machine can use infinite memory - A language that was exactly like Java but would terminate once it used more than 4 Gigabytes of memory wouldn't be Turing complete, because a Turing machine can use infinite memory. This is why we can't actually build a Turing machine, but Java is still a Turing complete language because the Java language has no restriction preventing it from using infinite memory. This is one reason regular expressions aren't Turing complete.
A Turing machine has random access memory - A language that only lets you work with memory through push and pop operations to a stack wouldn't be Turing complete. If I have a 'language' that reads a string once and can only use memory by pushing and popping from a stack, it can tell me whether every ( in the string has its own ) later on by pushing when it sees ( and popping when it sees ). However, it can't tell me if every ( has its own ) later on and every [ has its own ] later on (note that ([)] meets this criterion but ([]] does not). A Turing machine can use its random access memory to track ()'s and []'s separately, but this language with only a stack cannot (a sketch of the single-stack check appears after this list).
A Turing machine can simulate any other Turing machine - A Turing machine, when given an appropriate 'program', can take another Turing machine's 'program' and simulate it on arbitrary input. If you had a language that was forbidden from implementing a Python interpreter, it wouldn't be Turing complete.
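Here is a quick sketch of the single-stack check mentioned above, for one bracket kind only, which is all that matters for the "every ( has its own ) later" property:

    def parens_closed(s):
        """True if every '(' has its own ')' later on, using only push/pop on one stack."""
        stack = []
        for ch in s:
            if ch == "(":
                stack.append(ch)
            elif ch == ")":
                if not stack:     # a ')' with no earlier unmatched '(' doesn't violate
                    continue      # the property, so just ignore it
                stack.pop()
        return not stack          # empty stack => every '(' found a later ')'

    print(parens_closed("([)]"))   # True: the '(' is closed later
    print(parens_closed("([]]"))   # False: the '(' is never closed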
Examples of Turing complete languages
If your language has infinite random access memory, conditional execution, and some form of repeated execution, it's probably Turing complete. There are more exotic systems that can still achieve everything a Turing machine can, which makes them Turing complete too:
Untyped lambda calculus
Conway's game of life
C++ Templates
Prolog
From Wikipedia:
Turing completeness, named after Alan Turing, is significant in that every plausible design for a computing device so far advanced can be emulated by a universal Turing machine — an observation that has become known as the Church-Turing thesis. Thus, a machine that can act as a universal Turing machine can, in principle, perform any calculation that any other programmable computer is capable of. However, this has nothing to do with the effort required to write a program for the machine, the time it may take for the machine to perform the calculation, or any abilities the machine may possess that are unrelated to computation.
While truly Turing-complete machines are very likely physically impossible, as they require unlimited storage, Turing completeness is often loosely attributed to physical machines or programming languages that would be universal if they had unlimited storage. All modern computers are Turing-complete in this sense.
I don't know how you can be more non-technical than that, except by saying "Turing complete means 'able to answer any computable problem given enough time and space'".
Fundamentally, Turing-completeness is one concise requirement: unbounded recursion.
Not even bounded by memory.
I thought of this independently, but here is some discussion of the assertion. My definition of LSP provides more context.
The other answers here don't directly define the fundamental essence of Turing-completeness.
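As a tiny illustration of recursion that no static bound can cap, here is the Ackermann function in Python; the recursion depth it needs depends entirely on its input (Python's configurable recursion limit is exactly the kind of physical bound being set aside in this argument):

    def ackermann(m, n):
        """Recursion whose depth grows with the input and has no fixed bound."""
        if m == 0:
            return n + 1
        if n == 0:
            return ackermann(m - 1, 1)
        return ackermann(m - 1, ackermann(m, n - 1))

    print(ackermann(2, 3))   # 9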
Turing Complete means that it is at least as powerful as a Turing Machine. This means anything that can be computed by a Turing Machine can be computed by a Turing Complete system.
No one has yet found a system more powerful than a Turing Machine. So, for the time being, saying a system is Turing Complete is the same as saying the system is as powerful as any known computing system (see Church-Turing Thesis).
In the simplest terms, a Turing-complete system can solve any possible computational problem.
One of the key requirements is that the scratchpad size be unbounded and that it is possible to rewind to access prior writes to the scratchpad.
Thus in practice no system is Turing-complete.
Rather some systems approximate Turing-completeness by modeling unbounded memory and performing any possible computation that can fit within the system's memory.
Super-brief summary from what Professor Brasilford explains in this video.
Turing Complete ≅ do anything that a Turing Machine can do.
It has conditional branching (i.e. an "if statement"). This also implies "go to" and thus permits loops.
It gets an arbitrary amount of memory (e.g. a long enough tape), as much as the program needs.
I think the importance of the concept "Turing Complete" is in the ability to identify a computing machine (not necessarily a mechanical/electrical "computer") whose processes can be deconstructed into "simple" instructions, composed of simpler and simpler instructions, that a Universal Machine could interpret and then execute.
I highly recommend The Annotated Turing
@Mark, I think what you are explaining is a mix between the description of the Universal Turing Machine and Turing Completeness.
Something that is Turing Complete, in a practical sense, would be a machine/process/computation able to be written and represented as a program, to be executed by a Universal Machine (a desktop computer). Though it doesn't take time or storage into consideration, as mentioned by others.
A Turing Machine requires that any program can perform condition testing. That is fundamental.
Consider a player piano roll. The player piano can play a highly complicated piece of music, but there is never any conditional logic in the music. It is not Turing Complete.
Conditional logic is both the power and the danger of a machine that is Turing Complete.
The piano roll is guaranteed to halt every time. There is no such guarantee for a TM. This is called the "halting problem."
In practical language terms familiar to most programmers, the usual way to detect Turing completeness is to check whether the language allows, or allows the simulation of, nested unbounded while statements (as opposed to Pascal-style for statements, which have fixed upper bounds).
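For instance, nothing in the source of the following loop bounds how many times it runs; the trip count depends entirely on the data (Collatz iteration, used here purely as an example of an unbounded while):

    def collatz_steps(n):
        """An unbounded while: no fixed upper bound on iterations can be read off the code."""
        steps = 0
        while n != 1:
            n = 3 * n + 1 if n % 2 else n // 2
            steps += 1
        return steps

    print(collatz_steps(27))   # 111 steps before reaching 1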
We call a language Turing-complete if and only if (1) it is decidable by a Turing machine but (2) not by anything less capable than a Turing machine. For instance, the language of palindromes over the alphabet {a, b} is decidable by Turing machines, but also by pushdown automata; so, this language is not Turing-complete. Truly Turing-complete languages - ones that require the full computing power of Turing machines - are pretty rare. Perhaps the language of strings x.y.z where x is a number, y is a Turing-machine and z is an initial tape configuration, and y halts on z in fewer than x! steps - perhaps that qualifies (though it would need to be shown!)
A common imprecise usage confuses Turing-completeness with Turing-equivalence. Turing-equivalence refers to the property of a computational system which can simulate, and which can be simulated by, Turing machines. We might say Java is a Turing-equivalent programming language, for instance, because you can write a Turing-machine simulator in Java, and because you could define a Turing machine that simulates execution of Java programs. According to the Church-Turing thesis, Turing machines can perform any effective computation, so Turing-equivalence means a system is as capable as possible (if the Church-Turing thesis is true!)
Turing equivalence is a much more mainstream concern than true Turing completeness; this, and the fact that "complete" is shorter than "equivalent", may explain why "Turing-complete" is so often misused to mean Turing-equivalent. But I digress.
As Waylon Flinn said:
Turing Complete means that it is at least as powerful as a Turing Machine.
I believe this is incorrect: a system is Turing complete if it's exactly as powerful as the Turing Machine, i.e. every computation done by the machine can be done by the system, but also every computation done by the system can be done by the Turing machine.
Can a relational database take latitudes and longitudes of places and roads as input and compute the shortest path between them? No. This is one problem that shows SQL is not Turing complete.
But C++ can do it, and can tackle any computable problem. Thus it is Turing complete.