Where do I get "junk" data to help test my code? - language-agnostic

For my C class I've written a simple statistics program -- it calculates max, min, mean, etc. Anyway, I've gotten the program successfully compiled, so all I need to do now is actually test it; the only problem is that I don't have anything to test with.
In my case, I need a list of doubles; my program needs to accept between 2 and 1,000,000 of them. Is there some resource online that can produce lists of otherwise meaningless data? I know Lorem Ipsum gets used for typesetting, and I'm wondering if there's something similar for various types of numerical data.
Or am I out of luck, and I'll have to just create my own junk data?

The problem with testing software is not the source of the data but the test set. Can you test an int sum(int a, int b) method just by feeding it random numbers? No, you need to know what to expect. That is a test set: inputs and expected outputs.
What do you say when you discover that 548888876+99814465=643503341? How can you tell whether this is the real result?
More than finding random numbers to give your program, you must somehow know the results of your computation in advance so that you can compare against them.
There are a few ways to do it. What I suggest is to pick a random number generator (amphetamachine +1) and run the same data through both your code and a program you already know is good, e.g. Matlab for your purposes. After computing your statistics with both, compare the results to see whether your code is correct or needs some debugging.
By the way, I deliberately altered the result of the above sum...
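As a rough sketch of the "inputs plus expected outputs" idea above, the following Python script builds a reproducible random input and computes reference values with trusted built-ins. The names reference_stats and matches, the seed and the value range are all assumptions made for illustration; the output of the C program would still have to be parsed into a dictionary of the same shape before comparing.

```python
import math
import random


def reference_stats(values):
    """Compute max, min and mean with trusted built-ins."""
    return {
        "max": max(values),
        "min": min(values),
        "mean": math.fsum(values) / len(values),
    }


def matches(expected, actual, rel_tol=1e-9):
    """Compare two stats dictionaries with a floating-point tolerance."""
    return all(
        math.isclose(expected[key], actual[key], rel_tol=rel_tol)
        for key in expected
    )


# A fixed seed makes the "random" input reproducible between runs.
rng = random.Random(42)
data = [rng.uniform(-1000.0, 1000.0) for _ in range(1000)]
expected = reference_stats(data)

# A tiny hand-checkable case is worth keeping alongside the random one.
small = reference_stats([1.0, 2.0, 6.0])
```

Comparing with a tolerance rather than with == matters here, because two correct programs can sum the same doubles in a different order and differ in the last few bits.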

What about just generating a random double?
Random r = new Random();
for (int i = 0; i < 100000; i++)
{
    double number = r.NextDouble(); // uniform in [0.0, 1.0)
    // do something with the value
}

Since the data you need will depend on the program, there is no source of generic data that I know of.
If you are able to write that program, you should be able to write a script to generate dummy data for yourself.
Just use a loop to print out random numbers within the range your program can accept.
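A minimal version of such a script might look like the following. The function name write_random_doubles, the file name junk.txt, and the default range are assumptions for illustration; adjust them to whatever your program actually reads.

```python
import random


def write_random_doubles(path, count, low=-1e6, high=1e6, seed=None):
    """Write `count` random doubles, one per line, to `path`."""
    rng = random.Random(seed)
    with open(path, "w") as handle:
        for _ in range(count):
            # repr() keeps enough digits to round-trip the double exactly.
            handle.write(repr(rng.uniform(low, high)) + "\n")


write_random_doubles("junk.txt", 1000, seed=1)
```

Passing a seed means the same file can be regenerated later, which helps when a failing run needs to be reproduced.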

Generate a file with random bytes:
$ dd \
of=random-bytes \
if=/dev/urandom \
bs=1024 \
count=1024

http://www.generatedata.com/#generator
I've used that data generator before with some success. To be fair, it will usually involve copy/pasting the data it generates into some other format that you'll be able to read in.
You can generate your own data for this specific case quite easily, though. Pick a random count of up to 1,000,000, loop that many times generating random doubles within the range you expect, feed the result in, and away you go.
Generating your own test data in this case is probably the best option.

You could take the first million digits of pi and chop them up into however many doubles you want.
The first few could be 3.14159, 2.65358, 9.79323, 8.46264, 3.38327, 9.50288, 4.19716, and 9.39937, for example.
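The chopping could be sketched like this in Python. Only the first 48 digits are hard-coded here to keep the example short, and the helper name chop_digits and the group width of 6 are assumptions chosen to match the values listed above.

```python
# First 48 digits of pi, without the decimal point.
PI_DIGITS = "314159265358979323846264338327950288419716939937"


def chop_digits(digits, width=6):
    """Turn each full group of `width` digits into a double of the form d.ddddd."""
    groups = [
        digits[i:i + width]
        for i in range(0, len(digits) - width + 1, width)
    ]
    return [float(group[0] + "." + group[1:]) for group in groups]


values = chop_digits(PI_DIGITS)
print(values)
```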

Related

Should I define a new function for making a matrix Hermitian?

Let X be a square matrix. We want to force it to be Hermitian, that is: self-conjugate-transpose. X = X^H = conj(X^T). To do this in Python with numpy is easy:
X = 0.5*(X + np.conj(X.T))
I haven't found a single NumPy function that does this in a single expression f(x).
The question is should I define a new function to do it? E.g.
def make_hermitian(X):
    return 0.5*(X + np.conj(X.T))
(One can come up with a shorter name, e.g. "make_h", "herm" or "selfconj".)
Pros: more readable code, with one operation in a shorter form. A shorter name saves typing when the operation is repeated many times, and it makes the operation far easier to modify (it only needs to change in one place).
Cons: it replaces a very short and straightforward expression that is self-evident.
Which is the more appropriate way of programming: defining a new function, or writing the explicit expression repeatedly?
I would say it depends on how many times you need to reuse that function.
If it's more than twice, then definitely make a function. If it's only once or twice, I would say it's up to you. If you choose to go without a function, add a short comment saying what that piece of code is supposed to do.
My preference in any case would be to define a function with a meaningful name. Anyone else who reads the code may not know or remember how to obtain a Hermitian matrix, so the math alone isn't going to be sufficient.
A meaningful function name, on the other hand, tells them clearly what is going on, and they can then look up what a Hermitian matrix is.
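For what it's worth, a named version might look like the sketch below. The shape check is an addition of mine rather than part of the question's one-liner, and the sample matrix A is made up for illustration.

```python
import numpy as np


def make_hermitian(X):
    """Return the Hermitian part of a square matrix X: (X + X^H) / 2."""
    X = np.asarray(X)
    if X.ndim != 2 or X.shape[0] != X.shape[1]:
        raise ValueError("make_hermitian expects a square 2-D array")
    return 0.5 * (X + X.conj().T)


A = np.array([[1 + 2j, 2 - 1j],
              [4 + 0j, 3 - 5j]])
H = make_hermitian(A)

# A Hermitian matrix equals its own conjugate transpose.
print(np.allclose(H, H.conj().T))
```

The docstring and the check are arguably the real benefit of the function: the expression itself stays a one-liner, but the intent and the precondition are now stated in one place.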

Pascal results window

I have written a programme which merges two 1D arrays containing names. I print the list of arr1, arr2 and arr3.
I am using Lazarus Free Pascal v. 1.0.14. I was wondering if anyone knows how to break up the results in the DOS-like window, because the list is so long that I can only see the last few names in the returned results. The rest go by too fast to read.
I know I can save the results to a file, and I also use the delay command, but I would like to know if there is a way to somehow break up the results, slow them down, or even edit the output console.
I appreciate your help.
This isn't really a programming question, because your console application should output the values without pause. Otherwise your program would become useless if you ever wanted it to run as part of another pipeline in an automated fashion.
Instead you need a tool that you wrap around your program to paginate the output if, and when, you so desire. Such tools are known as terminal pagers and the basic one that ships with Windows is called more. You execute your program and pipe the output to the more program. Like this:
C:\SomeDir>MyProject.exe <input_args> | more
You can change the code of your loop in the following way.
Say you print the results with the following loop:
for i:=0 to 250 do
  WriteLn(ArrUnited[i]);
You can replace it with:
for i:=0 to 250 do
begin
  WriteLn(ArrUnited[i]);
  if (i mod 25) = 24 then // wait for the user to press Enter every 25 rows
    ReadLn;
end;
In future, please post an MCVE in your questions; otherwise everyone has to guess what your code is.

Shows warning during compiling

It shows a warning.
I was supposed to arrange the numbers in order, irrespective of their values, but with 0 moved to the end.
To learn what you are doing wrong, you need to read a basic book on C.
I can point out some errors and good practices.
Comparison is done with ==, so you should use if(i==0).
After the second for loop you will want to reset i to 0: i=0.
The two for loops should run while i<n and j<n.
That if(a[i]==0) comparison is not needed.
You don't need the while loop here.
You can print all of them after the nested for loops.
Global variables are used here; you should have a good reason to use them.
Index variables are better declared and defined locally.
What you are trying to do is known as bubble sort.
If you still get errors after understanding and applying all of this, run the program through a debugger and try small values of n.
If you still can't solve it, then ask here.

Generating truth tables for basic logic circuits

Let's say I have a text file that looks like this:
<number> <name> <type> <inputs...>
1 XOR1 XOR A B
2 SUM XOR 1 C
What would be the best approach to generate the truth table for this circuit?
That depends on what you have available, and how big your file is.
Perl is optimized for reading files and generating simple text output. It doesn't have a library of boolean operators, but they're easy enough to write. I'd use that if I just wanted text-in, text-out.
If I wanted to display the data online AND generate a results file, I'd use PHP to read the data and write the table to a CSV file that could either be opened in Excel, or posted online in an HTML table.
If your data is in a REALLY BIG data file, I'd use SQL.
If your data is in a really huge file that you want to be accessible to authorized users online, and you want THEM to be able to create truth tables, I'd use Oracle's APEX to create an easy interface for them to build their own truth tables and play around with the data without altering it.
If you're in an electrical engineering environment, use the tools designed for your problem -- Verilog or similar.
Whatcha got? Whatcha wanna do with it?
-- Ada
I prefer using C#. I already have the code to 'parse' the input
text file. I just don't know where to start in terms of
actually 'simulating' it. The output can simply be a text file
with inputs and output values – Don 12 mins ago
How many inputs and how many outputs in the circuit you want to simulate?
The size of the simulation determines how it can most easily be run. If the circuit is small(ish), you can enter the inputs and circuit values into vector arrays, then cross them to get the output matrix.
Matlab is ideal for this, as it was written for processing arrays.
Again: Whatcha got, and whatcha wanna do with it?
-- Ada
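For a circuit as small as the one in the question, brute-force simulation is one possible starting point: enumerate every input combination and evaluate the gates in order. The Python sketch below assumes things the question does not state: that gates are listed in dependency order, that a numeric input refers to an earlier gate's number, that any other token is a primary input, and that the last gate is the circuit output. The GATES table and the function names are made up for illustration, and the same structure should carry over to C#.

```python
from functools import reduce
from itertools import product

# Gate behaviours, each taking a list of booleans (assumed gate set).
GATES = {
    "AND": lambda bits: all(bits),
    "OR": lambda bits: any(bits),
    "XOR": lambda bits: reduce(lambda a, b: a != b, bits),
    "NOT": lambda bits: not bits[0],
}


def parse_netlist(text):
    """Parse '<number> <name> <type> <inputs...>' lines into gate tuples."""
    gates = []
    for line in text.strip().splitlines():
        number, name, kind, *inputs = line.split()
        gates.append((number, name, kind, inputs))
    return gates


def truth_table(gates):
    """Return (input names, rows); each row is (input bits, last gate's output)."""
    numbers = {number for number, _, _, _ in gates}
    # Any input token that is not a gate number is a primary input.
    names = sorted({i for _, _, _, ins in gates for i in ins if i not in numbers})
    rows = []
    for bits in product([False, True], repeat=len(names)):
        signals = dict(zip(names, bits))
        for number, _, kind, ins in gates:  # relies on dependency order
            signals[number] = bool(GATES[kind]([signals[i] for i in ins]))
        rows.append((tuple(map(int, bits)), int(signals[gates[-1][0]])))
    return names, rows


NETLIST = """
1 XOR1 XOR A B
2 SUM XOR 1 C
"""

gates = parse_netlist(NETLIST)
names, rows = truth_table(gates)
print(" ".join(names), "|", gates[-1][1])
for bits, out in rows:
    print(" ".join(map(str, bits)), "|", out)
```

With n primary inputs this evaluates 2^n rows, so it only suits small circuits, which ties in with the question above about how many inputs and outputs there are.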

MUMPS can't format Number to String

I am trying to convert a large number to a string in MUMPS, but I can't.
Let me explain what I would like to do :
s A="TEST_STRING#12168013110012340000000001"
s B=$P(A,"#",2)
s TAB(B)=1
s TAB(B)=1
I would like to create an array TAB in which variable B is the primary key.
When I do ZWR I get:
A="TEST_STRING#12168013110012340000000001"
B="12168013110012340000000001"
TAB(12168013110012340000000000)=1
TAB("12168013110012340000000001")=1
As you can see, the first SET treats variable B as a number (wrongly converted), while the second SET treats variable B as a string (which is what I want).
My question is how to write the SET command so that variable B is treated as a string instead of a number (which is wrong in my opinion).
Any advice/explanation will be helpful.
This may be a limitation of the sorting/storage mechanism built into MUMPS, and it differs between MUMPS implementations. The cause is that while variable values in MUMPS are untyped, index values are typed, and numeric indices are sorted before string ones. When a large string is converted to a number, rounding errors may occur. To prevent this, prepend a space to the number in your index so that it is explicitly treated as a string:
s TAB(" "_B)=1
As far as I know, InterSystems Caché doesn't have this limitation. At least, your code works fine in Caché, and the documentation claims support for up to 309 digits:
http://docs.intersystems.com/cache20141/csp/docbook/DocBook.UI.Page.cls?KEY=GGBL_structure#GGBL_C12648
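The rounding effect can be imitated outside MUMPS. The Python sketch below is only an analogy: it assumes an engine that keeps 18 significant digits for numeric values (an illustrative figure, not something stated in the question) and uses a dictionary in place of the TAB array.

```python
from decimal import Context

B = "12168013110012340000000001"

# Hypothetical engine that keeps 18 significant digits for numeric values.
numeric_key = int(Context(prec=18).create_decimal(B))

tab = {}
tab[numeric_key] = 1  # numeric subscript: the trailing 1 is rounded away
tab[" " + B] = 1      # string subscript: all 26 digits survive

for key in tab:
    print(repr(key))
```

The numeric key comes out as 12168013110012340000000000, matching the ZWR output in the question, while the space-prefixed key keeps every digit.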
I've tried to recreate your scenario, but I am not seeing the issue you're experiencing.
In my opinion, it is actually not possible for the same command, executed twice in immediate succession, to produce two different results.
s TAB(B)=1
s TAB(B)=1
As long as the value of B does not change between the executions, the result should be:
TAB("12168013110012340000000001")=1
Example of what GT.M implementation of MUMPS returns in your case