Can we measure the computing power of the Cray-II using the PassMark measure? - units-of-measurement

I want to measure the computing power of the Cray-2, circa 1985, using a measure we apply to modern computers (including phones).
Recently I've seen the PassMark measure used.
My question is: can we measure [or extrapolate] the computing power of the Cray-2 using the PassMark measure?

Computer benchmarks must always be adapted to the circumstance, I think.
IMHO the best way of measuring the performance of a Cray (blessing and peace be upon him) is to take some input fed to the Cray and see how much time another computer needs to produce the same results.
Anyway, for the average smartphone user, who needs 30 seconds to read - and understand - an SMS (excluding the time to find their glasses), the speed difference between a Cray, a VIC-20 and their phone is almost irrelevant.

Related

OpenCL for GPU vs. FPGA

I recently read about OpenCL/CUDA for FPGAs vs. GPUs.
As I understood it, the FPGA wins on the power criterion.
The explanation I found in one article:
Reconfigurable devices can have much lower power consumption from peak
values since only configured portions of the chip are active
Based on the above, I have a question: does this mean that if some CU [Compute Unit] doesn't execute any work-item, it still consumes power? (And if so, what does it consume power for?)
Yes, idle circuitry still consumes power. It doesn't consume as much, but it still consumes some. The reason for this is down to how transistors work, and how CMOS logic gates consume power.
Classically, CMOS logic (the type on all modern chips) only consumes power when it switches state. This made it very low power compared to the technologies that came before it, which consumed power all the time. Even so, every time a clock edge occurs, some logic changes state even if there's no work to do. The higher the clock rate, the more power used. GPUs tend to have high clock rates so they can do lots of work; FPGAs tend to have low clock rates. That's the first effect, but it can be mitigated by not clocking circuits that have no work to do (called 'clock gating').
As transistors became smaller and smaller, the amount of power used per switching event became smaller, but other effects (known as leakage) became more significant. Now we're at a point where leakage power is very significant, and it's multiplied up by the number of gates you have in a design. Complex designs have high leakage power; simple designs have low leakage power (in very basic terms). This is the second effect.
Hence, for a simple task it may be more power efficient to have a small, dedicated, low-speed FPGA rather than a large, complex, but high-speed general-purpose CPU/GPU. A rough model of these two effects is sketched below.
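A rough back-of-the-envelope model of the two effects just described (illustrative only; the activity factor, capacitance and leakage current are all process- and design-dependent):

\[
P_{\text{total}} \approx \underbrace{\alpha\, C\, V^{2} f}_{\text{dynamic (switching)}} \;+\; \underbrace{N_{\text{gates}}\, I_{\text{leak}}\, V}_{\text{static (leakage)}}
\]

Here \(\alpha\) is the activity factor (clock gating pushes it toward zero for idle logic), \(C\) the switched capacitance, \(V\) the supply voltage, \(f\) the clock frequency, and the leakage term scales with how much silicon the design occupies, whether or not it is doing work. This also answers the question above: an idle CU still pays the leakage term, plus whatever dynamic power its clock tree burns if it isn't gated.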
As always, it depends on the workload. For workloads that are well-supported by native GPU hardware (e.g. floating point, texture filtering), I doubt an FPGA can compete. Anecdotally, I've heard about image processing workloads where FPGAs are competitive or better. That makes sense, since GPUs are not optimized to operate on small integers. (For that reason, GPUs often are uncompetitive with CPUs running SSE2-optimized image processing code.)
As for power consumption, for GPUs, suitable workloads generally keep all the execution units busy, so it's a bit of an all-or-nothing proposition.
Based on my research on FPGAs and the way they work, these devices can be designed to be very power efficient and really fine-tuned for one special task (e.g., an algorithm), using the smallest resources possible (and therefore the lowest energy consumption among all possible choices except an ASIC).
When implementing Turing-complete algorithms on FPGAs, designers have the option of either unrolling their algorithms to use the maximum parallelism offered or using a compact sequential design. Each method has its own costs and benefits: the former maximizes performance at the cost of higher resource consumption, while the latter minimizes area and resource consumption by reusing hardware, at the cost of lower performance.
This level of control over the implementation of an algorithm doesn't exist when developing for GPUs. Developers can choose the most efficient algorithms, yet they are not the ones determining the final, precise hardware implementation of those algorithms. Unlike FPGA designers, who count nanoseconds when checking their design's hardware implementation (using post-layout tools), GPU developers rely on the available frameworks to handle the implementation details for them automatically. They develop at a much higher level than FPGA designers.
So the well-known topic of trade-offs pops up here too: do you want exact control over the hardware implementation at the cost of longer development time? Choose FPGAs. Do you want parallelism, but have made up your mind to give up exact control over the hardware implementation and develop using your existing software skills? Use OpenCL.
Kudos to #hamzed, but OpenCL is not taking control away from the designer when targeting FPGAs. It actually gives the best of both worlds: the full programmability of the FPGA, with all the benefits of a custom parallel algorithm, as well as much faster design closure than RTL. By being clever about which data your algorithm moves and doesn't move, you can get near the theoretical performance of FPGAs. Please see the last chart in this reference: https://www.iwocl.org/wp-content/uploads/iwocl2017-andrew-ling-fpga-sdk.pdf

Estimating increase in speed when changing NVIDIA GPU model

I am currently developing a CUDA application that will most certainly be deployed on a GPU much better than mine. Given another GPU model, how can I estimate how much faster my algorithm will run on it?
You're going to have a difficult time, for a number of reasons:
Clock rate and memory speed only have a weak relationship to code speed, because there is a lot more going on under the hood (e.g., thread context switching) that gets improved/changed for almost all new hardware.
Caches have been added to new hardware (e.g., Fermi) and unless you model cache hit/miss rates, you'll have a tough time predicting how this will affect the speed.
Floating point performance in general is very dependent on the model (e.g., the Tesla C2050 has better double-precision performance than the "top of the line" GTX 480).
Register usage per device can change for different devices, and this can also affect performance; occupancy will be affected in many cases.
Performance can be improved by targeting specific hardware, so even if your algorithm is perfect for your GPU, it could be better if you optimize it for the new hardware.
Now, that said, you can probably make some predictions if you run your app through one of the profilers (such as the NVIDIA Compute Profiler) and look at your occupancy and SM utilization. If your GPU has 2 SMs and the one you will eventually run on has 16 SMs, then you will almost certainly see an improvement, but not necessarily a proportional one (see the crude scaling sketch after this answer).
So, unfortunately, it isn't easy to make the type of predictions you want. If you're writing something open source, you could post the code and ask others to test it with newer hardware, but that isn't always an option.
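As a very crude illustration of the SM-count point above, you can compute a compute-only ceiling by scaling SM count times clock rate. The sketch below uses the real CUDA runtime call cudaGetDeviceProperties (multiProcessorCount, clockRate), but the target-card numbers are made up for illustration, and the result deliberately ignores memory bandwidth, caches and occupancy:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Properties of the card you currently have.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // Hypothetical numbers for the card you are considering (read from its spec sheet).
    const int    targetSMs      = 16;
    const double targetClockMHz = 1500.0;

    const double currentClockMHz = prop.clockRate / 1000.0;  // clockRate is reported in kHz
    const double ceiling = (targetSMs * targetClockMHz) /
                           (prop.multiProcessorCount * currentClockMHz);

    printf("Current card: %d SMs @ %.0f MHz\n", prop.multiProcessorCount, currentClockMHz);
    printf("Naive compute-only speedup ceiling: %.1fx\n", ceiling);
    return 0;
}
```

Treat the printed number as an optimistic upper bound, not a prediction; memory-bound kernels in particular will scale with bandwidth, not with SM count.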
This can be very hard to predict for certain hardware changes and trivial for others. Highlight the differences between the two cards you're considering.
For example, the change could be as trivial as -- if I had purchased one of those EVGA water-cooled behemoths, how much better would it perform than a standard GTX 580? This is just an exercise in computing the difference in the limiting clock speed (memory or GPU clock). I've also encountered this question when wondering if I should overclock my card.
If you're moving to a similar architecture, GTX 580 to Tesla C2070, you can make a similar case based on differences in clock speeds, but you have to be careful about the single/double-precision issue.
If you're doing something much more drastic, say going from a mobile card -- GTX 240M -- to a top of the line card -- Tesla C2070 -- then you may not get any performance improvement at all.
Note: Chris is very correct in his answer, but I wanted to stress this caution because I envision this common work path:
One says to the boss:
So I've heard about this CUDA thing... I think it could make function X much more efficient.
Boss says you can have 0.05% of work time to test out CUDA -- hey we already have this mobile card, use that.
One year later... So CUDA could get us a threefold speedup. Could I buy a better card to test it out? (A GTX 580 only costs $400 -- less than that intern fiasco...)
You spend the $$, buy the card, and your CUDA code runs slower.
Your boss is now upset. You've wasted time and money.
So what happened? Developing on an old card -- think 8800, 9800, or even a mobile GTX 2xx with around 30 cores -- leads you to optimize and design your algorithm very differently from how you would to efficiently utilize a card with 512 cores. Caveat emptor: you get what you pay for -- those awesome cards are awesome -- but your code may not run faster.
Warning issued, so what's the take-away message? When you get that nicer card, be sure to invest time in tuning, testing, and possibly redesigning your algorithm from the ground up.
OK, so that said, what's the rule of thumb? GPUs get roughly twice as fast every six months. So if you're moving from a card that's two years old to a card that's top of the line, claim to your boss that it will run between 4 and 8 times faster (and if you get the full 16-fold improvement, bravo!).
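For reference, the arithmetic behind those numbers, assuming one doubling every six months for a two-year-old card:

\[
2^{\,24/6} = 2^{4} = 16\times \ \text{(theoretical)}, \qquad 4\times\ \text{to}\ 8\times \ \text{(what to promise)}
\]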

Does anyone know what "Quantum Computing" is?

In physics, it's the ability of particles to exist in multiple/parallel dynamic states at a particular point in time. In computing, would it be the ability of a data bit to equal 1 or 0 at the same time, a third value like NULL [unknown], or multiple values? How can this technology be applied to computer processors, programming, security, etc.? Has anyone built a practical quantum computer or developed a quantum programming language where, for example, the program code dynamically changes or is autonomous?
I have done research in quantum computing, and here is what I hope is an informed answer.
It is often said that qubits as you see them in a quantum computer can exist in a "superposition" of 0 and 1. This is true, but in a more subtle way than you might first guess. Even with a classical computer with randomness, a bit can exist in a superposition of 0 and 1, in the sense that it is 0 with some probability and 1 with some probability. Just as when you roll a die and don't look at the outcome, or receive e-mail that you haven't yet read, you can view its state as a superposition of the possibilities. Now, this may sound like just flim-flam, but the fact is that this type of superposition is a kind of parallelism and that algorithms that make use of it can be faster than other algorithms. It is called randomized computation, and instead of superposition you can say that the bit is in a probabilistic state.
The difference between that and a qubit is that a qubit can have a fat set of possible superpositions, with more properties. The set of probabilistic states of an ordinary bit is a line segment, because all there is is a probability of being 0 or 1. The set of states of a qubit is a round 3-dimensional ball. Now, probabilistic bit strings are more complicated and more interesting than just individual probabilistic bits, and the same is true of strings of qubits. If you can make qubits like this, then actually some computational tasks wouldn't be any easier than before, just as randomized algorithms don't help with all problems. But some computational problems, for example factoring numbers, have new quantum algorithms that are much faster than any known classical algorithm. It is not a matter of clock speed or Moore's law, because the first useful qubits could be fairly slow and expensive. It is only sort-of parallel computation, just as an algorithm that makes random choices is only in a weak sense making all choices in parallel. But it is "randomized algorithms on steroids"; that's my favorite summary for outsiders.
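To make the contrast concrete: the state of a probabilistic classical bit is a single number p in [0, 1] (the probability of being 1), whereas a qubit's pure state is

\[
|\psi\rangle = \alpha\,|0\rangle + \beta\,|1\rangle, \qquad \alpha,\beta \in \mathbb{C},\quad |\alpha|^{2} + |\beta|^{2} = 1,
\]

and once mixtures are included, the full set of qubit states fills out the 3-dimensional Bloch ball mentioned above, versus the line segment [0, 1] for the classical bit.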
Now the bad news. In order for a classical bit to be in a superposition, it has to be a random choice that is secret from you. Once you look at a flipped coin, the coin "collapses" to either heads for sure or tails for sure. The difference between that and a qubit is that in order for a qubit to work as one, its state has to be secret from the rest of the physical universe, not just from you. It has to be secret from wisps of air, from nearby atoms, etc. On the other hand, for qubits to be useful for a quantum computer, there has to be a way to manipulate them while keeping their state a secret. Otherwise its quantum randomness or quantum coherence is wrecked. Making qubits at all isn't easy, but it is done routinely. Making qubits that you can manipulate with quantum gates, without revealing what is in them to the physical environment, is incredibly difficult.
People don't know how to do that except in very limited toy demonstrations. But if they could do it well enough to make quantum computers, then some hard computational problems would become much easier for these computers. Others wouldn't be easier at all, and a great deal is unknown about which ones can be accelerated and by how much. It would definitely have various effects on cryptography; it would break the widely used forms of public-key cryptography. But other kinds of public-key cryptography have been proposed that could be okay. Moreover, quantum computing is related to the quantum key distribution technique, which looks very safe, and secret-key cryptography would almost certainly still be fairly safe.
The other context where the term "quantum" computing is used involves an "entangled pair". Essentially, if you can create an entangled pair of particles which have a physical "spin", quantum physics dictates that the spins of the two particles will always be opposite.
If you could create an entangled pair and then separate them, you could use the device to transmit data without interception by changing the spin on one of the particles. You could then create a signal modulated by the particles' information which is theoretically unbreakable, as you cannot know what spin the particles had at any given time by intercepting the information between the two signal points.
A whole lot of very interested organisations are researching this technique for secure communications.
Yes, there is quantum encryption, by which if someone tries to spy on your communication, it destroys the datastream such that neither they nor you can read it.
However, the real power of quantum computing lies in the fact that a qubit can be in a superposition of 0 and 1. Big deal. However, if you have, say, eight qubits, you can now represent a superposition of all integers from 0 to 255. This lets you do some rather interesting things in polynomial instead of exponential time. Factorization of large numbers (i.e., breaking RSA, etc.) is one of them.
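Concretely, the joint state of eight qubits is described by 2^8 = 256 complex amplitudes at once:

\[
|\psi\rangle = \sum_{x=0}^{255} \alpha_x\,|x\rangle, \qquad \sum_{x=0}^{255} |\alpha_x|^{2} = 1.
\]

Measuring still yields only one value of x, which is why useful algorithms such as Shor's factoring rely on interference between the amplitudes rather than on simply "trying everything at once".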
There are a number of applications of quantum computing.
One huge one is the ability to solve NP-hard problems in P-time, by using the indeterminacy of qubits to essentially brute-force the problem in parallel.
(The preceding sentence is false. Quantum computers do not work by brute-forcing all solutions in parallel, and they are not believed to be able to solve NP-complete problems in polynomial time. See e.g. here.)
An update on the quantum computing industry, building on Greg Kuperberg's answer:
The D-Wave Two system uses quantum annealing.
Superposed quantum states collapse to a single state when an observation happens. Current quantum annealing technology applies physical forces to the qubits; these forces add constraints so that, when an observation happens, a qubit has a higher probability of collapsing to a result we want to see (a rough sketch of the cost function being minimized follows the reference below).
Reference:
How does a quantum machine work
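For context (not part of the original answer): D-Wave-style annealers encode the problem as the minimum-energy configuration of an Ising-type cost function, roughly

\[
E(s) = \sum_i h_i\, s_i + \sum_{i<j} J_{ij}\, s_i s_j, \qquad s_i \in \{-1, +1\},
\]

where the programmable biases h_i and couplings J_{ij} are the "forces" and constraints referred to above; the annealing schedule is chosen so that, at measurement, low-energy (desired) configurations are the most likely outcomes.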
I monitor recent non-peer-reviewed articles on the subject; this is what I extrapolate from what I have read. In addition to what has been said above, namely that qubits can hold values in superposition, they can also encode multiple values: for example spin up / spin down / vertical polarizations, which I'll abbreviate as +H, -H, +V, -V, L+, LH, LV. Not all of the combinations are valid, and there are additional values that can be placed on a qubit depending on its type.
Each type is used somewhat like RAM vs. ROM: a photon with a wavelength, an electron with a charge, a photon with a charge, a photon with a spin, and so on. Some combinations are not valid, and some require additional algorithms in order to pass the argument to the next variable (the location where data is stored) or qubit (the location of the superposition of values to be returned), simply because the use of wires is by necessity limited by size and space. One of the greatest challenges is controlling or removing quantum decoherence; this usually means isolating the system from its environment, as interactions with the external world cause the system to decohere. In November 2011, researchers factorised 143 using 4 qubits. That same year, D-Wave Systems announced the first commercial quantum annealer on the market, under the name D-Wave One; the company claims this system uses a 128-qubit processor chipset. In May 2013, Google announced that it was launching the Quantum AI Lab, hopefully to boost AI. I hope I didn't waste anyone's time with things they already knew.
As I cannot yet comment: it really depends on what type of qubit you are working with to know the number of states, for example the UNSW silicon qubit vs. a diamond nitrogen-vacancy centre vs. solid-state NMR phosphorus-in-silicon vs. liquid NMR of the same.

How to estimate FPGA utilization for designing a work a like core?

I was considering some older-generation FPGAs to interface with a legacy system, so I want a good way of estimating how much space is necessary to replace an ASIC, given its transistor count.
Does Verilog versus VHDL affect the utilization? (According to one of our contractors it affects the timing, so utilization seems likely.)
What effect do different vendors' parts have on it? (Actel's architecture is significantly different from Xilinx's, for example. I expect some "weighting" based on this.)
This discussion, originally from comp.arch.fpga, seems to indicate that it's pretty complicated, including factors such as what space-vs.-speed tradeoffs you've asked the VHDL (or Verilog) compiler to make, etc. When you consider that VHDL is source code and an FPGA implementation of it is object code, you'll see why it's not straightforward.
"FPGA vs. ASIC" notes that "a design created to work well on an FPGA is usually horrible on an ASIC and a design created for an ASIC may not work at all on an FPGA (certainly at the original frequency)".
A Google search for FPGA ASIC gates may have more useful info.
Verilog vs. VHDL makes little real difference to speed or utilization. It is more about the amount of code you have to type (more for VHDL) and strong vs. weak typing.
FPGA vendors' marketing gate counts are inflated. Altera and Xilinx give similar utilization. Look at memories (if the design is memory-intensive) and the number of flip-flops; that will likely be good enough.
Consider what a similar core requires; for example, if you need an error-coding core, look at a Reed-Solomon core.

Feasibility of GPU as a CPU? [closed]

What do you think the future of GPU-as-a-CPU initiatives like CUDA is? Do you think they are going to become mainstream and be the next adopted fad in the industry? Apple is building a new framework for using the GPU to do CPU tasks, and there has been a lot of success in Nvidia's CUDA project in the sciences. Would you suggest that a student commit time to this field?
Commit time if you are interested in scientific and parallel computing. Don't think of CUDA as making a GPU appear to be a CPU; it only allows a more direct method of programming GPUs than older GPGPU programming techniques.
General-purpose CPUs derive their ability to work well on a wide variety of tasks from all the work that has gone into branch prediction, pipelining, superscalar execution, etc. This makes it possible for them to achieve good performance on a wide variety of workloads, while making them suck at high-throughput, memory-intensive floating-point operations.
GPUs were originally designed to do one thing, and do it very, very well. Graphics operations are inherently parallel: you can calculate the colour of all pixels on the screen at the same time, because there are no data dependencies between the results. Additionally, the algorithms needed did not have to deal with branches, since nearly any branch that would be required could be achieved by setting a coefficient to zero or one. The hardware could therefore be very simple. It is not necessary to worry about branch prediction, and instead of making a processor superscalar, you can simply add as many ALUs as you can cram onto the chip. (A minimal kernel sketch illustrating this appears after this answer.)
With programmable texture and vertex shaders, GPUs gained a path to general programmability, but they are still limited by the hardware, which is still designed for high-throughput floating-point operations. Some additional circuitry will probably be added to enable more general-purpose computation, but only up to a point. Anything that compromises the ability of a GPU to do graphics won't make it in. After all, GPU companies are still in the graphics business, and the target market is still gamers and people who need high-end visualization.
The GPGPU market is still a drop in the bucket, and to a certain extent will remain so. After all, "it looks pretty" is a much lower standard to meet than "100% guaranteed and reproducible results, every time."
So in short, GPUs will never be feasible as CPUs. They are simply designed for different kinds of workloads. I expect GPUs will gain features that make them useful for quickly solving a wider variety of problems, but they will always be graphics processing units first and foremost.
It will always be important to match the problem you have with the most appropriate tool you have to solve it.
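A minimal CUDA sketch of the point about per-pixel parallelism and branch-free coefficients made above (the kernel and names are purely illustrative, not from the answer itself): every pixel gets its own thread, and the "branch" is folded into a 0/1 mask instead of real control flow.

```cuda
// Hypothetical per-pixel kernel: each thread handles one pixel independently,
// and mask[i] (0.0f or 1.0f) plays the role of the branch-replacing coefficient.
__global__ void shade(float* out, const float* in, const float* mask, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float k = mask[i];
        out[i] = k * (in[i] * 0.5f) + (1.0f - k) * in[i];  // "darken or leave alone" without branching
    }
}

// Launch with enough threads to cover every pixel, e.g.:
//   shade<<<(n + 255) / 256, 256>>>(d_out, d_in, d_mask, n);
```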
Long-term I think that the GPU will cease to exist, as general purpose processors evolve to take over those functions. Intel's Larrabee is the first step. History has shown that betting against x86 is a bad idea.
Study of massively parallel architectures and vector processing will still be useful.
First of all, I don't think this question really belongs on SO.
In my opinion the GPU is a very interesting alternative whenever you do vector-based floating-point mathematics. However, this translates to: it will not become mainstream. Most mainstream (desktop) applications do very few floating-point calculations.
It has already gained traction in games (physics engines) and in scientific calculations. If you consider either of those as "mainstream", then yes, the GPU will become mainstream.
I would not consider those two as mainstream, and I therefore think the GPU will not be the next fad adopted by the mainstream industry.
If you, as a student, have any interest in heavily physics-based scientific calculations, you should absolutely commit some time to it (GPUs are very interesting pieces of hardware anyway).
GPUs will never supplant CPUs. A CPU executes a set of sequential instructions, and a GPU does a very specific type of calculation in parallel. GPUs have great utility in numerical computing and graphics; however, most programs can in no way utilize this flavor of computing.
You will soon begin seeing new processors from Intel and AMD that include GPU-esque floating-point vector computations as well as standard CPU computations.
I think it's the right way to go.
Considering that GPUs have been tapped to create cheap supercomputers, it appears to be the natural evolution of things. With so much computing power and R&D already done for you, why not exploit the available technology?
So go ahead and do it. It will make for some cool research, as well as a legit reason to buy that high-end graphics card so you can play Crysis and Assassin's Creed in full graphical detail ;)
It's one of those things where you see one or two applications for it now, but soon enough someone will come up with a 'killer app' that figures out how to do something more generally useful with it, at superfast speeds.
Pixel shaders applying routines to large arrays of float values: maybe we'll see some GIS coverage applications, or, well, I don't know. If you don't devote more time to it than I have, then you'll have the same level of insight as me, i.e. little!
I have a feeling it could be a really big thing, as do Intel and S3; maybe it just needs one little tweak to the hardware, or someone with a lightbulb above their head.
With so much untapped power, I cannot see how it would go unused for too long. The question, though, is how the GPU will be used for this. CUDA seems to be a good guess for now, but other technologies are emerging on the horizon which might make it more approachable to the average developer.
Apple has recently announced OpenCL, which they claim is much more than CUDA, yet quite simple. I'm not sure exactly what to make of that, but the Khronos Group (the people working on the OpenGL standard) is working on the OpenCL standard and is trying to make it highly interoperable with OpenGL. This might lead to a technology which is better suited for normal software development.
It's an interesting subject and, incidentally, I'm about to start my master's thesis on how best to make GPU power available to average developers (if possible), with CUDA as the main focus.
A long time ago, it was really hard to do floating-point calculations (thousands or millions of cycles of emulation per instruction on terribly performing (by today's standards) CPUs like the 80386). People who needed floating-point performance could get an FPU (for example, the 80387). The old FPUs were fairly tightly integrated with the CPU's operation, but they were external. Later on they became integrated, with the 80486 having an FPU built in.
The old-time FPU is analogous to GPU computation. We can already get it with AMD's APUs: an APU is a CPU with a GPU built into it.
So I think the actual answer to your question is: GPUs won't become CPUs; instead, CPUs will have a GPU built in.