Converting data stored in Fortran 90 binaries to human-readable format

In your experience, in Fortran 90, what is the best way to store large arrays in output files? Previously, I had been trying to write large arrays to ASCII text files. For example, I would do something like this (thanks to the recommendation at the bottom of the page In Fortran 90, what is a good way to write an array to a text file, row-wise?):
PROGRAM testing1
  IMPLICIT NONE
  INTEGER :: i, j, k
  INTEGER, DIMENSION(4,10) :: a
  k = 1
  DO i = 1, 4
    DO j = 1, 10
      a(i,j) = k
      k = k + 1
    END DO
  END DO
  OPEN(UNIT=12, FILE="output.txt", ACTION="WRITE", STATUS="REPLACE")
  DO i = 1, 4
    DO j = 1, 10
      WRITE(12, "(i2,1x)", ADVANCE="NO") a(i,j)
    END DO
    WRITE(12, *)
  END DO
  CLOSE(UNIT=12)
END PROGRAM testing1
This works, but as pointed out by the topmost reply at In Fortran 90, what is a good way to write an array to a text file, row-wise?, writing large arrays to text files is very slow and creates files that are somewhat larger in size than is necessary. The poster there recommended instead writing to an unformatted Fortran binary, using something like:
PROGRAM testing2
  IMPLICIT NONE
  INTEGER :: i, j, k
  INTEGER, DIMENSION(4,10) :: a
  k = 1
  DO i = 1, 4
    DO j = 1, 10
      a(i,j) = k
      k = k + 1
    END DO
  END DO
  OPEN(UNIT=13, FILE="output.dat", ACTION="WRITE", STATUS="REPLACE", &
       FORM="UNFORMATTED")
  WRITE(13) a
  CLOSE(UNIT=13)
END PROGRAM testing2
This seems to work, and is indeed much faster and results in smaller file sizes, as promised by the reply here. However, what do I do if I would like to be able to later work with the data stored in Fortran binary (e.g., output.dat above) and analyze its contents? For example, what if I want to open the array stored in the binary in a program such as Microsoft Excel?
When I mentioned MATLAB in my previous post, the reply suggested that I open the binary as a hexadecimal file and figure out and extract the records from there. But I am nervous that I am getting into deep water, since I have no prior experience in hexadecimal sleuthing. When I asked on the MATLAB board (here: http://www.mathworks.com/matlabcentral/answers/12639-advice-on-reading-an-unformatted-fortran-binary-file-into-matlab) about reading Fortran files into MATLAB, the person there suggested that using Fortran stream access might be easy. But is Fortran stream (i.e., using the specifier ACCESS="STREAM" in the OPEN statement) likely to be similar in write time and file size to the ASCII text file that I created in my first example above?
Or, do you know if there is any other software that can automatically read Fortran binaries into some sort of human readable form? (Or, do you know of any good tutorials on either hexadecimal sleuthing or Fortran stream?)
Thank you very much for your time.

Stream is a choice independent of the choice of formatted/unformatted: one is the "access", the other the "form". The default for Fortran I/O is record-oriented access. The typical approach of a Fortran compiler for records (at least unformatted ones) is to write a 4-byte record length before and after each record. (The "after" is to make reading backwards easier.) Using a hex editor you could verify these extra data items that I described and skip them in MATLAB. But they are not part of the language standard, are not portable, and are certainly not obvious in other languages.
If you select stream and unformatted, you will get just the raw sequence of bytes corresponding to your data items -- no extra data items to worry about in the other language! In my experience this output tends to be fairly easy to read in other languages (though I haven't tried it in MATLAB). If this is a small and simple project, with portability of the files to other computers not an issue, I would probably use this approach (stream and unformatted) rather than a file format specification such as HDF5 or FITS. I'd write the array as WRITE (13) a, as in your final example. Depending on the other language, you might have to transpose the dimensions. If this is a major and long-lived project with portability a concern, then a portable and standard file interface is worth considering.
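For example, a minimal sketch of the stream-plus-unformatted combination (the unit number and file name are arbitrary; note that ACCESS="STREAM" is formally a Fortran 2003 feature, though it is widely supported):
PROGRAM testing3
  IMPLICIT NONE
  INTEGER :: i, j, k
  INTEGER, DIMENSION(4,10) :: a
  k = 1
  DO i = 1, 4
    DO j = 1, 10
      a(i,j) = k
      k = k + 1
    END DO
  END DO
  ! STREAM + UNFORMATTED writes the raw bytes only: no record markers
  OPEN(UNIT=14, FILE="output_stream.dat", ACTION="WRITE", STATUS="REPLACE", &
       FORM="UNFORMATTED", ACCESS="STREAM")
  WRITE(14) a
  CLOSE(UNIT=14)
END PROGRAM testing3
With default 4-byte integers, the resulting file is exactly 4*10*4 = 160 bytes: the array in column-major order and nothing else.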
I don't know whether any of these formats can be read from Excel; more research needed. You might have to write a program that reads the binary file, in whatever format, and outputs a file in a format that Excel understands.

(converting comment into an answer for posterity)
Are you specifically trying to get information into MATLAB? If you are, I highly recommend HDF5. This is the portable binary format you have been looking for.
For converting a Fortran binary to HDF5, you're going to have to read in the original Fortran binary and then write out the same data to an HDF5 file. If you have the Fortran source, this should be pretty easy. Allocate your arrays, make sure you read the arrays in the same order as you wrote them and then write out your new shiny HDF5 file.
The HDF5 group has tutorials with examples in C and Fortran. There is likely an example very close to what you're trying to do. When you build HDF5, make sure to manually enable Fortran support. It is disabled by default.
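As a rough, untested sketch of that conversion for the 4x10 integer array from the question (the dataset name "a" and the file names here are made up; this assumes the HDF5 Fortran wrappers are built):
PROGRAM f90_to_hdf5
  USE hdf5
  IMPLICIT NONE
  INTEGER, DIMENSION(4,10) :: a
  INTEGER(HID_T) :: file_id, space_id, dset_id
  INTEGER(HSIZE_T), DIMENSION(2) :: dims = (/4, 10/)
  INTEGER :: ierr
  ! Read the original unformatted sequential binary back in
  OPEN(UNIT=13, FILE="output.dat", ACTION="READ", STATUS="OLD", FORM="UNFORMATTED")
  READ(13) a
  CLOSE(UNIT=13)
  ! Write the same data out as an HDF5 dataset named "a"
  CALL h5open_f(ierr)
  CALL h5fcreate_f("output.h5", H5F_ACC_TRUNC_F, file_id, ierr)
  CALL h5screate_simple_f(2, dims, space_id, ierr)
  CALL h5dcreate_f(file_id, "a", H5T_NATIVE_INTEGER, space_id, dset_id, ierr)
  CALL h5dwrite_f(dset_id, H5T_NATIVE_INTEGER, a, dims, ierr)
  CALL h5dclose_f(dset_id, ierr)
  CALL h5sclose_f(space_id, ierr)
  CALL h5fclose_f(file_id, ierr)
  CALL h5close_f(ierr)
END PROGRAM f90_to_hdf5
MATLAB can then read the dataset directly (e.g. with hdf5read or, in newer versions, h5read).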

% In MATLAB
fid = fopen('YOUR_FILE.direct', 'r'); % file written with Fortran direct access
frewind(fid);                         % rewind to the start of the file
tbb = ones(367, 45203);               % preallocate: 367 records of 45203 values
for i = 1:367
    temp = fread(fid, 45203, 'single'); % read one fixed-length record of 4-byte reals
    tbb(i,:) = temp;
end
fclose(fid);
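For comparison, if the file was instead written with FORM="UNFORMATTED", ACCESS="STREAM" as discussed above, there are no record markers and the whole array can be read in one call. A sketch, assuming the 4x10 array of default 4-byte integers from the question:
fid = fopen('output_stream.dat', 'r');
a = fread(fid, [4 10], 'int32'); % Fortran and MATLAB are both column-major,
                                 % so the dimensions can be given directly
fclose(fid);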

Related

Generating truth tables for basic logic circuits

Let's say I have a text file that looks like this:
<number> <name> <type> <inputs...>
1 XOR1 XOR A B
2 SUM XOR 1 C
What would be the best approach to generate the truth table for this circuit?
That depends on what you have available, and how big your file is.
Perl is optimized for reading files and generating simple text output. It doesn't have a library of boolean operators, but they're easy enough to write. I'd use that if I just wanted text-in, text-out.
If I wanted to display the data online AND generate a results file, I'd use PHP to read the data and write the table to a CSV file that could either be opened in Excel, or posted online in an HTML table.
If your data is in a REALLY BIG data file, I'd use SQL.
If your data is in a really huge file that you want to be accessible to authorized users online, and you want THEM to be able to create truth tables, I'd use Oracle's APEX to create an easy interface for them to build their own truth tables and play around with the data without altering it.
If you're in an electrical engineering environment, use the tools designed for your problem -- Verilog or similar.
Whatcha got? Whatcha wanna do with it?
-- Ada
I prefer using C#. I already have the code to 'parse' the input text file. I just don't know where to start in terms of actually 'simulating' it. The output can simply be a text file with inputs and output values. – Don
How many inputs and how many outputs in the circuit you want to simulate?
The size of the simulation determines how it can most easily be run. If the circuit is small(ish), you can enter the inputs and circuit values into vector arrays, then cross them to get the output matrix.
Matlab is ideal for this, as it was written for processing arrays.
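For example, a rough sketch (mine, just to illustrate the idea) for the two-gate circuit in the question:
% All 2^3 input combinations for A, B, C, one row each
inputs = dec2bin(0:7) - '0';
A = inputs(:,1); B = inputs(:,2); C = inputs(:,3);
XOR1 = xor(A, B);    % gate 1: XOR1 = A XOR B
SUM  = xor(XOR1, C); % gate 2: SUM = XOR1 XOR C
disp([A B C XOR1 SUM]) % the truth table: inputs and outputs side by side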
Again: Whatcha got, and whatcha wanna do with it?
-- Ada

Iterating over a string in Vimscript or Parse a JSON file

So I'm creating a vim script that needs to load and parse a JSON file into a local object graph. I searched and I couldn't find any native way to process a JSON file, and I don't want to add any dependencies to the script. So I wrote my own function to parse the JSON string (gotten from the file), but it's really slow. At the moment, I iterate through each character in the file like so:
let len = strlen(jsonString) - 1
let i = 0
while i < len
let c = strpart(jsonString, i, 1)
let i += 1
" A lot of code to process file....
" Note: I've tried short cutting the process by searching for enclosing double-quotes when I come across the initial double quotes (also taking into account escaping '\' character. It doesn't help
endwhile
I've also tried this method:
for c in split(jsonString, '\zs')
" Do a lot of parsing ....
endfor
For reference, a file with ~29,000 characters takes about 4 seconds to process, which is unacceptable.
Is there a better way to iterate over a string in vim script?
Or better yet, have I missed a native function to parse JSON?
Update:
I asked for no dependencies because I:
didn't want to deal with them, and
genuinely wanted some ideas for the best way to do this without someone else's work.
Sometimes I just like to do things manually even though the problem has already been solved.
I'm not against plugins or dependencies at all, it's just that I'm curious. Thus the question.
I ended up creating my own function to parse the JSON file. I was creating a script that could parse the package.json file associated with node.js modules. Because of this, I could rely on a fairly consistent format and quit processing whenever I'd retrieved the information I needed. This usually cut out large chunks of the file, since most developers put the largest part of the file, their "readme" section, at the end. Because the package.json file is strictly defined, I left the process somewhat fragile: it assumes a root dictionary { } and actively looks for certain entries. You can find the script here: https://github.com/ahayman/vim-nodejs-complete/blob/master/after/ftplugin/javascript.vim#L33.
Of course, this doesn't answer my own question. It's only the solution to my unique problem. I'll wait a few days for new answers and pick the best one before the bounty ends (already set an alarm on my phone).
The simplest solution with the least dependencies is just using the json_decode vim function.
let dict = json_decode(jsonString)
Even though Vim's origins date back a long way, its internal string()/eval() representation is so close to JSON that it is likely to work unless you need special characters.
You can look up the implementation here, which even supports true/false/null if you want:
https://github.com/MarcWeber/vim-addon-json-encoding
Better to use that library (vim-addon-manager allows you to install dependencies easily).
Whether this is good enough depends on your data.
Now Benjamin Klein posted your question to vim_use, which is why I'm replying.
The best and fastest replies happen if you subscribe to the Vim mailing list.
Go to vim.sf.net and follow the community link.
You cannot expect the Vim community to scrape Stack Overflow.
I've added the keywords "json" and "parsing" to that little code so that it can be found more easily.
If this solution does not work for you, you can try the many :h if_* bindings, or write an external script which extracts the information you're looking for, or turns the JSON into Vim's dictionary representation, which can then be read by eval() after correctly escaping the special characters you care about.
If you seek a completely correct solution, omitting dependencies is one of the worst things you can do. The eval() variant mentioned by @MarcWeber is one of the fastest, but it has its disadvantages:
1. Using the solution for securing eval() that I mentioned in a comment makes it no longer the fastest. In fact, after applying it, eval() becomes slower by more than an order of magnitude (0.02s vs 0.53s in my test).
2. It does not respect surrogate pairs.
3. It cannot be used to verify that you have correct JSON: it accepts some strings (e.g. "\<C-o>") that are not JSON strings, and it allows trailing commas.
4. It fails to give normal error messages. It fails badly if you use the vam#VerifyIsJSON I mentioned in point 1.
5. It fails to load floating-point values like 1e10 (Vim requires numbers to look like 1.0e10, but numbers like 1e10 are allowed by JSON: note "and/or" in the first paragraph of the specification).
All of the above statements (except for the first) also apply to vim-addon-json-encoding mentioned by @MarcWeber, because it uses eval(). There are some other possibilities:
Fastest and most correct is using Python: pyeval('json.loads(vim.eval("varname"))'). It is not faster than eval(), but it is the fastest among the other possibilities (0.04s in my test: approximately two times slower than eval()).
Note that I use pyeval() here. If you want a solution for Vim versions that lack this functionality, it will no longer be one of the fastest.
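For instance, a minimal sketch of that call (assuming a Vim built with +python and a variable g:jsonString holding the JSON text):
" import json once, then convert the Vim variable via Python
python import json
let parsed = pyeval('json.loads(vim.eval("g:jsonString"))')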
Use my json.vim plugin. Its advantages are slightly better error reporting compared to a failed vam#VerifyIsJSON (slightly worse compared to eval()), and it correctly loads floating-point numbers. It can be used for verification of strings (it does not accept "\<C-a>"), but it loads lists with a trailing comma just fine. It does not support surrogate pairs. It is also very slow: in the test I used (a 279702-character string), it takes 11.59s to load. json.vim tries to use Python if possible, though.
For the best error reporting you can take yaml.vim and purge the YAML support out of it, leaving only JSON (I once did the same thing for pyyaml, though in Python: see the markedjson library used in powerline: it is pyyaml minus the YAML stuff, plus classes with marks). But this variant is even slower than json.vim and should only be used if the main thing you need is error reporting: 207 seconds for loading the same 279702-character string.
Note that the only variant mentioned that satisfies both requirements, "no dependencies" and "no Python", is eval(). If you are not fine with its disadvantages, you have to throw away one or both of these requirements, or copy-paste code. If you take speed into account, only two candidates are left: eval() and Python. If you want to parse JSON fast you really must use C, and only these solutions spend most of their time in functions written in C.
Most other interpreters (Ruby/Perl/Tcl) do not have a pyeval() equivalent, so they will be slower even if their JSON implementation is written in C. Some others (Lua/Racket (mzscheme)) do have a pyeval() equivalent, but e.g. luaeval('{}') is zero, meaning that you will have to add an additional step of explicitly and recursively converting objects into Vim dictionaries and lists (e.g. luaeval('vim.dict({})')), which will impact performance. I cannot say anything about mzeval(), but I have never heard of anybody actually using Racket (mzscheme) with Vim.

Performance comparison: one argument or a list of arguments?

I am defining a new Tcl command whose implementation is in C++. The command queries a data stream, and the syntax is something like this:
mycmd <arg1> <arg2> ...
The idea is that this command takes a list of arguments and returns a list containing the corresponding data for each argument.
My colleague commented that it is best to use a single argument and, when multiple values are needed, just call the command multiple times.
There are some other discussions, but the one thing we cannot agree on is performance.
I think my version, taking a list of arguments, should be quicker: when we want multiple values, we pay the cost of going through the Tcl interpreter only once.
His reasoning is new to me:
the function implementation is cached
accessing a Tcl function is quicker than accessing Tcl data
Is this reasoning sound?
If you use Tcl_EvalObjv to invoke the command, you won't go through the Tcl interpreter. The cost will be one hash-table lookup (or less, if you reuse the Tcl_Obj* containing the command name) and then you'll be in the implementation of the command. Otherwise, constructing a list Tcl_Obj* (e.g., with Tcl_NewListObj) and then calling Tcl_EvalObj is nearly as cheap, as that's a special case because the list construction code is guaranteed to produce lists that are also substitution-free commands.
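For illustration, a minimal (untested) C sketch of that route, calling your mycmd with two placeholder arguments:
#include <tcl.h>

int call_mycmd(Tcl_Interp *interp)
{
    Tcl_Obj *objv[3];
    int i, code;
    objv[0] = Tcl_NewStringObj("mycmd", -1); /* command name: one hash lookup */
    objv[1] = Tcl_NewStringObj("arg1", -1);
    objv[2] = Tcl_NewStringObj("arg2", -1);
    for (i = 0; i < 3; i++) Tcl_IncrRefCount(objv[i]);
    code = Tcl_EvalObjv(interp, 3, objv, 0); /* no parsing, no substitution */
    for (i = 0; i < 3; i++) Tcl_DecrRefCount(objv[i]);
    return code;
}
Reusing objv[0] across calls avoids even the repeated hash-table lookup.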
Building a normal string and passing that through Tcl_Eval (or Tcl_EvalObj) is significantly slower, as that has to be parsed. (OTOH, passing the same Tcl_Obj* through Tcl_EvalObj multiple times in a row will be faster as it will be compiled internally to bytecode.)
Accessing into values (i.e., into Tcl_Obj* references) is pretty fast, provided the internal representation of those values matches the type that the access function requires. If there's a mismatch, an internal type-conversion function may be called, and those are often relatively expensive. To understand internal representations, here are a few for you to think about:
string — array of unicode characters
integer — a C long (except when you spill over into arbitrary precision work)
list — array of Tcl_Obj* references
dict — hash table that maps Tcl_Obj* to Tcl_Obj*
script — bytecoded version
command — pointer to the implementation function
OK, those aren't the exact types (there's often other bookkeeping data too) but they're what you should think of as the model.
As to “which is fastest”, the only sane way to answer the question is to try it and see which is fastest for real: the answer will depend on too many factors for anyone without the actual code to predict it. If you're calling from Tcl, the time command is perfect for this sort of performance analysis work (it is what it is designed for). If you're calling from C or C++, use that language's performance measurement idioms (which I don't know, but would search Stack Overflow for).
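For example, from Tcl you might compare the two call styles like this (the iteration count and argument values are arbitrary):
# one call with three arguments
puts [time { mycmd a b c } 10000]
# three calls with one argument each
puts [time { mycmd a; mycmd b; mycmd c } 10000]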
Myself? I advise writing the API to be as clear and clean as possible. Describe the actual operations, and don't distort everything to try to squeeze an extra 0.01% of performance out.

When could CSV records *not* have the same number of fields?

I am storing a series of events to a CSV file; each event type comes with a different set of data.
To illustrate, say I have two events (there will be many more):
Running, which has a data set containing speed and incline.
Sleeping, which has a data set containing snores.
There are two options to store this data in CSV records:
Option A
Storing each possible item of data in its own field...
speed, incline, snores
therefore...
15mph, 20%, ,
, , 12
16mph, 20%, ,
14mph, 20%, ,
Option B
Storing each event in its own record...
event, value1...
therefore...
running, 15mph, 20%
sleeping, 12
running, 16mph, 20%
running, 14mph, 20%
Without a specific CSV specification, the consensus seems to be:
Each record "should" contain the same number of comma-separated fields.
Context
There are a number of events which each have a large & different set of data values.
CSV data is to be of use to other developers (I will/could/should/won't use either structure).
The 'other developers' are expected to be toward the novice end of the spectrum and/or using resource-limited systems. CSV is accessible.
The CSV format is being provided non-exclusively, as a feature rather than a requirement. Although, if the application is providing a CSV file, it should be provided in the correct manner from now on.
Question
Would it be valid – in this case - to go with Option B?
Thoughts
Option B maintains a level of human readability, which is an advantage if the CSV is read by a human rather than a processor. Neither method is more complex to parse using a custom parser, but will Option B void the usefulness of the CSV format with other libraries, frameworks, applications, et al.? With Option A, future changes/versions to the data set of an individual event may break the CSV structure (zombie ", ," fields to maintain forwards compatibility); whereas Option B will fail gracefully.
edit
This may be aimed at students and frameworks like OpenFrameworks, Plask, Processing, et al., where CSV is easier to implement.
Any "other frameworks, libraries and applications" I've ever used all handle CSV parsing differently, so trying to conform to one or many of these standards might over-complicate your end result. My recommendation would be to keep it simple and use what works for your specific task. If human readbility is a requirement, then CSV in the form of Option B would work fine. Otherwise, you may want to consider JSON or XML.
As you say, there is no "CSV standard" with regard to contents. The real answer depends on what you are doing and why. You mention "other frameworks, libraries and applications". The one thing I've learnt is "don't over-engineer", i.e. don't write reams of code today on the assumption that you will plug it into some other framework tomorrow.
I'd say option B is fine, unless you have specific requirements to use other apps etc.
< edit >
Having re-read your context, I'd probably pick one output format and use it, and forget about having multiple formats:
Having multiple output formats is a source of inconsistency (e.g. bug in one format but not another).
Having multiple formats means more code that needs to be
tested
documented
supported
< /edit >
Is there any reason you can't use XML? Yes, it's slightly more difficult to parse, at least for novices, but if so they probably need the practice. File size would be much greater, of course, but it's compressible.

Is there a programming language with no control structures or operators?

Like Smalltalk or Lisp?
EDIT
Where control structures are like:
Java:
if( condition ) {
    doSomething
}
Python:
if cond:
    doSomething
Or
Java:
while( true ) {
    print("Hello");
}
Python:
while True:
    print "Hello"
And operators
Java, Python
1 + 2 // + operator
2 * 5 // * op
In Smalltalk (if I'm correct) that would be:
condition ifTrue:[
doSomething
]
True whileTrue:[
"Hello" print
]
1 + 2 // + is a method of 1 and the parameter is 2 like 1.add(2)
2 * 5 // same thing
How come you've never heard of Lisp before?
You mean without special syntax for achieving the same?
Lots of languages have control structures and operators that are "really" some form of message passing or function-call system that can be redefined. Most "pure" object languages and pure functional languages fit the bill. But they are all still going to have your "+" and some form of code block -- including Smalltalk! -- so your question is a little misleading.
Assembly
Befunge
Prolog*
*I cannot be held accountable for any frustration and/or headaches caused by trying to get your head around this technology, nor am I liable for any damages caused by you due to aforementioned conditions including, but not limited to, broken keyboard, punched-in screen and/or head-shaped dents in your desk.
Pure lambda calculus? Here's the grammar for the entire language:
e ::= x | e1 e2 | \x . e
All you have are variables, function application, and function creation. It's equivalent in power to a Turing machine. There are well-known codings (typically "Church encodings") for such constructs as
If-then-else
while-do
recursion
and such datatypes as
Booleans
integers
records
lists, trees, and other recursive types
Coding in lambda calculus can be a lot of fun—our students will do it in the undergraduate languages course next spring.
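For instance, the Church encodings of the Booleans and if-then-else, written in the grammar above:
true  = \t . \f . t
false = \t . \f . f
if    = \c . \t . \e . c t e
so that if true e1 e2 reduces to e1, and if false e1 e2 reduces to e2.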
Forth may qualify, depending on exactly what you mean by "no control structures or operators". Forth may appear to have them, but really they are all just symbols, and the "control structures" and "operators" can be defined (or redefined) by the programmer.
What about Logo or more specifically, Turtle Graphics? I'm sure we all remember that, PEN UP, PEN DOWN, FORWARD 10, etc.
The SMITH programming language:
http://esolangs.org/wiki/SMITH
http://catseye.tc/projects/smith/
It has no jumps and is Turing complete. I've also made a Haskell interpreter for this bad boy a few years back.
I'll be first to mention brain**** then.
In Tcl, there are no control structures; there are just commands, and they can all be redefined. Every last one. There are also no operators. Well, except in expressions, but that's really just an imported foreign syntax that isn't part of the language itself. (We can also import full C or Fortran or just about anything else.)
How about FRACTRAN?
FRACTRAN is a Turing-complete esoteric programming language invented by the mathematician John Conway. A FRACTRAN program is an ordered list of positive fractions together with an initial positive integer input n. The program is run by updating the integer (n) as follows:
1. for the first fraction f in the list for which nf is an integer, replace n by nf;
2. repeat this rule until no fraction in the list produces an integer when multiplied by n, then halt.
Of course there is an implicit control structure in rule 2.
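As a worked example (not from the original answer): the one-fraction program 3/2, started on n = 2^a * 3^b, replaces n by 3n/2 as long as n is even, and halts at n = 3^(a+b) -- that is, it adds the two exponents.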
D (used in DTrace)?
APT - (Automatic Programmed Tool) used extensively for programming NC machine tools.
The language also has no IO capabilities.
XSLT (or XSL, some say) has control structures like if and for, but you should generally avoid them and deal with everything by writing rules with the correct level of specificity. So the control structures are there, but are implied by the default thing the translation engine does: apply potentially-recursive rules.
For and if (and some others) do exist, but in many many situations you can and should work around them.
How about Whenever?
Programs consist of "to-do list" - a series of statements which are executed in random order. Each statement can contain a prerequisite, which if not fulfilled causes the statement to be deferred until some (random) later time.
I'm not entirely clear on the concept, but I think PostScript meets the criteria, although it calls all of its functions operators (the way LISP calls all of its operators functions).
Makefile syntax doesn't seem to have any operators or control structures. I'd say it's a programming language but it isn't Turing Complete (without extensions to the POSIX standard anyway)
So... you're looking for a super-simple language? How about Batch programming? If you have any version of Windows, then you have access to a Batch interpreter. It's also more useful than you'd think, since you can carry out basic file operations (copy, rename, make directory, delete file, etc.).
http://www.csulb.edu/~murdock/dosindex.html
Example
Open notepad and make a .Bat file on your Windows box.
Open the .Bat file with notepad
In the first line, type "echo off"
In the second line, type "echo hello world"
In the third line, type "pause"
Save and run the file.
If you're looking for a way to learn some very basic programming, this is a good way to start. (Just be careful with the Delete and Format commands. Don't experiment with those.)