How to automatically generate ASN.1-encoded packets?

I want to test my application, and I need to generate different loads. The application is a SUPL RRLP protocol parser, and I have the ASN.1 specification for this protocol. Packets have a lot of optional fields, and the number of variants may be over a billion; I can't go through all the options manually, so I want to automate it.
One way is to generate packets automatically; the other way is to create a lot of different value-assignment sets and encode each into the binary format.
I found some tools, for example libtasn1 and Asn1Editor, but the first one can't parse an existing ASN.1 spec file and the second one can't encode packets from a specification.
I'm reluctant to write the thousandth ASN.1 parser myself, because I could introduce errors into the testing process.
I hoped it would be easy to find something existing, but... I'm giving up.
Maybe someone on Stack Overflow has faced the same problem and found a solution, or knows something to recommend? Thanks in advance.

Please try your specification at https://asn1.io/asn1playground/. You can ask it to generate a sample value for a given ASN.1 type, encode it, and then edit either the encoded (hex) data or the decoded values to create additional values.
You can also download a free trial of the OSS ASN.1 Tools from http://www.oss.com/asn1/products/asn1-download.html, which includes OSS ASN.1 Studio. This also allows you to generate (and modify) sample values for a given ASN.1 type.
Note that these don't generate thousands of different test values for you automatically, but they will parse valid value notation and encode the values for you if you are able to generate valid ASN.1 value notation yourself.
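If a scripted approach is acceptable, the Python asn1tools package can compile an ASN.1 module and encode plain dictionaries as value assignments, which lends itself to randomized generation. A minimal sketch, using a made-up toy module in place of the real RRLP specification (asn1tools.compile_files() works the same way on an actual spec file):

import random
import asn1tools

# Toy module standing in for the real RRLP spec; the type and field
# names are placeholders, not part of the actual protocol.
SPEC = '''
Demo DEFINITIONS ::= BEGIN
    Packet ::= SEQUENCE {
        id      INTEGER (0..255),
        flag    BOOLEAN OPTIONAL,
        payload OCTET STRING OPTIONAL
    }
END
'''

spec = asn1tools.compile_string(SPEC, codec='uper')  # BER, PER, UPER etc. are supported

def random_packet():
    # Exercise the optional fields by randomly omitting them.
    value = {'id': random.randint(0, 255)}
    if random.random() < 0.5:
        value['flag'] = random.choice([True, False])
    if random.random() < 0.5:
        value['payload'] = bytes(random.getrandbits(8) for _ in range(4))
    return value

for _ in range(1000):
    encoded = spec.encode('Packet', random_packet())  # bytes to feed your parser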

Related

Convert huge linked data dumps (RDF/XML, JSON-LD, TTL) to TSV/CSV

Linked data collections are usually given in RDF/XML, JSON-LD, or TTL format. Relatively large data dumps seem fairly difficult to process. What is a good way to convert an RDF/XML file to a TSV of triplets of linked data?
I've tried OpenRefine, which should handle this, but a 10 GB file (e.g. the person authority data from the German National Library) is too difficult to process on a laptop with decent processing power.
Looking for software recommendations or some code (e.g. Python or R) to convert it. Thanks!
Try these:
Lobid GND API
http://lobid.org/gnd/api
Supports OpenRefine (see the blog post) and a variety of other queries. The data is hosted as JSON-LD (see the context) in an Elasticsearch cluster. The service offers a rich HTTP API.
Use a Triple Store
Load the data into a triple store of your choice, e.g. rdf4j. Many triple stores provide some sort of CSV serialization; together with SPARQL, this could be worth a try (see the sketch after this list).
Catmandu
http://librecat.org/Catmandu/
A strong Perl-based data toolkit that comes with a useful collection of ready-to-use transformation pipelines.
Metafacture
https://github.com/metafacture/metafacture-core/wiki
A Java toolkit for designing transformation pipelines.
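As a small illustration of the triple-store route, a minimal Python sketch using rdflib (fine for a first experiment, though a 10 GB dump will want a proper triple store; the file names are placeholders):

import rdflib

# Parse an RDF file (Turtle here; rdflib also reads RDF/XML and JSON-LD).
g = rdflib.Graph()
g.parse('dump.ttl', format='turtle')

# Select all triples and let rdflib write the result table as CSV.
result = g.query('SELECT ?s ?p ?o WHERE { ?s ?p ?o }')
result.serialize(destination='triples.csv', format='csv')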
You could use the ontology editor Protégé: there, you can query the data with SPARQL according to your needs and save the results as a TSV file. It might be important, however, to configure the software beforehand to make these amounts of data manageable.
Canonical N-Triples may already be what you are after, as it is essentially a space-separated, line-based format for RDF (you cannot naively split at spaces, though, as you need to take care of literals; see below). Of the dataset you cited, many files are available as N-Triples. If not, use a parsing tool like rapper for the conversion to N-Triples, e.g.:
rapper -i turtle -o ntriples rdf-file-in-turtle-format.ttl > rdf-file-in-ntriples-format.nt
Typically, N-Triples exporters do not exploit everything the specification allows regarding whitespace and emit canonical N-Triples. Hence, given a line in a canonical N-Triples file such as:
<http://example.org/s> <http://example.org/p> "a literal" .
you can get CSV by replacing the first and second space characters of the line with commas and removing everything after and including the last space character. As literals are the only RDF terms in which spaces are allowed, and as literals are only allowed in the object position, this works for canonical N-Triples.
You can get TSV by replacing those space characters with tabs instead. If you also replace the last space character and do not remove the dot, you have a file that is both valid N-Triples and valid TSV. If you simply treat those positions as split positions, you can work with canonical N-Triples files without any conversion to CSV/TSV.
Note that you may have to deal with commas/tabs inside the RDF terms (e.g. by escaping), but that problem exists in any solution that represents RDF as CSV/TSV.
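A minimal Python sketch of that rule (it assumes canonical N-Triples input; escaping of tabs inside literals, as noted above, is left out):

import sys

def ntriples_to_tsv(infile, outfile):
    with open(infile, encoding='utf-8') as src, \
         open(outfile, 'w', encoding='utf-8') as dst:
        for line in src:
            line = line.rstrip('\n')
            if not line or line.startswith('#'):
                continue  # skip blank lines and comments
            s, p, rest = line.split(' ', 2)   # split at the first two spaces
            o = rest[:rest.rfind(' ')]        # drop the trailing " ."
            dst.write(f'{s}\t{p}\t{o}\n')

if __name__ == '__main__':
    ntriples_to_tsv(sys.argv[1], sys.argv[2])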

Decoding parameters in Google webapps

I'm trying to better understand Google's web applications. The HTML source has JSON that has been encoded in some unknown way, which I would like to decode. For example, the source below contains parameters such as DpimGf and EP1ykd which make no sense:
view-source:https://contacts.google.com/
..window.WIZ_global_data = {"DpimGf":false,"EP1ykd":.....
So far I have tried the following:
1. Decoding with a Base64 decoder, but the results are unprintable/not usable.
2. Decoding with the polyline encoding used in Google Maps.
3. Building an app from scratch to perform Base64 -> binary -> XOR -> ASCII char and to shift the binary values up to 7 places (inspired by the polyline algorithm).
Is there any documentation from Google, or a known encoding for such formats?
Assumptions: I'm pretty sure this is encoding of some sort and not encryption, because:
1) The length of the encoded text doesn't match common encryption algorithms.
2) There is some sort of pattern in the values of the parameters, so there is a good chance it's just encoded without any encryption; common encryption produces a completely different string each time.
3) There is a good chance the values may not decode at all, because they might map to meaningful parameters on the server side.
Thanks
Try using this: http://ddecode.com/hexdecoder/?results=cddb95fa500e7c1af427d005985260a7. Try running the whole thing through it; it might help.

Is there a standard to specify a binary format in JSON?

I would like to know whether there is some standard that specifies binary formats using JSON as the description language, similar to Google's Protocol Buffers.
Protocol Buffers seem very powerful, but they require parsing yet another language and considerable overhead, especially for compiled languages such as C++.
So I am wondering whether there is an accepted standard that uses JSON to describe a binary format. (Parsing the binary data might then still require some manual steps, but at least a clear and unambiguous description of the data would be available.)
To be clear, I am not talking about encoding binary data in JSON; I am talking about describing binary data in JSON.
Head to the comprehensive Wikipedia comparison of data-serialization formats and evaluate for yourself. I don't know what the right argument is to overcome your programmer's inertia. I'd consider Apache Avro the best fit for your requirement: its format description is plain JSON.
For the least friction, you could try MessagePack or BSON, which are essentially JSON, just better packed. But, having no external declaration, they need to be self-describing and must transport the field names on the wire, so they are not as "binary" and compact as Protocol Buffers or Avro.
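To make the Avro suggestion concrete, here is a minimal Python sketch using the fastavro package. The schema below is ordinary JSON, yet it fully determines a compact binary layout (the record and field names are made up for the example):

import io
import fastavro

# The binary format is described entirely by this JSON schema.
schema = fastavro.parse_schema({
    "type": "record",
    "name": "Point",
    "fields": [
        {"name": "x", "type": "int"},
        {"name": "y", "type": "int"},
        {"name": "label", "type": "string"},
    ],
})

buf = io.BytesIO()
fastavro.schemaless_writer(buf, schema, {"x": 1, "y": 2, "label": "origin"})
binary = buf.getvalue()  # compact binary; no field names on the wire

buf.seek(0)
print(fastavro.schemaless_reader(buf, schema))  # {'x': 1, 'y': 2, 'label': 'origin'}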

What is the best file format to parse?

Scenario: I'm working on a rails app that will take data entry in the form of uploaded text-based files. I need to parse these files before importing the data. I can choose the file type uploaded to the app; the software (Microsoft Access) used by those uploading has several export options regarding file type.
While it may be insignificant, I was wondering if there is a specific file type that is parsed most efficiently. This question can be viewed as language-independent, I believe.
(While XML is commonly parsed, it is not a feasible file type for sake of this project.)
If it is something exported by Access, the easiest would be CSV, particularly since Ruby includes a CSV parser in its standard library. You will have to do some work determining the CSV dialect (the delimiter, how quotes are handled); I don't know how robust the Ruby parser is with those issues, but you should also have some control over them from Microsoft Access.
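For illustration, a minimal sketch of dialect detection with Python's standard csv module (Ruby's standard CSV class takes similar options; the file name is a placeholder):

import csv

with open('export_from_access.csv', newline='') as f:
    sample = f.read(4096)
    dialect = csv.Sniffer().sniff(sample)  # guesses delimiter and quoting
    f.seek(0)
    for row in csv.reader(f, dialect):
        print(row)  # each row is a list of strings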
You might want to take a look at JSON. It's a lightweight format, and in contrast to XML it's really easy and clean to parse without requiring a huge library on the backend.
It can represent types like strings, numbers, associative arrays (objects), and lists of such.
I would suggest n-SV (where n is some character) for data that does not include n. That makes lexing the files a matter of a single split.
If you have more flexible data, I would suggest JSON.
If you HAVE to roll your own parser, I would suggest CSV or some form of delimiter-separated format.
If you are able to use other libraries, there are plenty of options. JSON looks quite fascinating.

How to analyze binary file?

I have a binary file. I don't know how it's formatted; I only know it comes from Delphi code.
Is there any way to analyze a binary file?
Is there any "pattern" for analyzing and deserializing the binary content of a file with an unknown format?
Try these:
Deserialize the data: analyze how your exe was compiled (try File Analyzer), try to deserialize the binary data with the language discovered, then serialize it into a language-independent format (such as XML) that every programming language can understand.
Analyze the binary data: save various versions of the file with small variations and use a diff program together with a hex editor to work out the meaning of every bit. Use this in conjunction with binary hacking techniques (like "How to crack a Binary File Format" by Frans Faase).
Reverse engineer the application: try recovering code with reverse engineering tools for the programming language used to build the app (found with File Analyzer); otherwise use a disassembly analysis tool such as IDA Pro.
For my hobby project I had to reverse engineer some old game files. My approaches were:
Have a good hex editor.
Look for readable words in the binary file and note how they are distributed. If the distance between them is constant, you know it is a listing.
Look for 2-3 consecutive zero bytes; they might indicate an int32 value.
Some dwords might be pointers into the file.
Try to identify reoccurring patterns in the file.
Seeing lots of bytes in the C0-CF range might indicate RLE-compressed data.
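A minimal Python sketch of the "look for readable words" step: scan for runs of printable ASCII and print their offsets (essentially a tiny version of the Unix strings tool mentioned further below):

import re
import sys

def ascii_runs(path, min_len=4):
    data = open(path, 'rb').read()
    # Runs of at least min_len printable ASCII bytes.
    for m in re.finditer(rb'[\x20-\x7e]{%d,}' % min_len, data):
        yield m.start(), m.group().decode('ascii')

for offset, text in ascii_runs(sys.argv[1]):
    print(f'{offset:#010x}  {text}')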
I've developed Hexinator (Windows & Linux) and Synalyze It! (macOS) exactly for this purpose. These applications let you view binary files as in other hex editors, but additionally you can create a "grammar" describing the specifics of a binary file format. The grammar contains all the building blocks and is used to parse the file automatically.
Thus you can keep the knowledge you gain in the analysis and apply it to multiple files simultaneously. You can also color-code the bits and pieces of file formats for a quick overview in the hex editor.
The parsing results are displayed in a tree view where you can also modify the files easily (applying endianness et cetera).
Reverse engineering a binary file when you have some idea of what it represents is a very time consuming process. If you have no idea what it is then it will be even harder.
It is possible though, but you have to have a pretty good reason for doing so.
The first step would be to open it in a hex editor of your choice and see if you can find any English text that points you in the direction of what the file is supposed to represent. From there, Google "reverse engineering binary files"; there are much more knowledgeable people than me who have written guides about it.
The "strings" program from GNU binutils is very useful. It prints the strings of printable characters in a file, quite often giving a clue to what the file contains or what a program does.
If the data represents serialized Delphi objects, you should start reading about the Delphi serialization process. If that's the case, I think your best bet would be to load the file using Delphi and continue your analysis from the IDE. Some information about Delphi serialization can be found here.
EDIT: if the file does contain serialized Delphi objects, then you should write a small Delphi program that loads it and "converts" the data yourself to something neutral, like XML. If you manage to do this, you should check whether Delphi supports serializing to XML directly. Then you could access those objects from any language.
The Unix "file" command is really useful; I don't know if there is anything like it on Windows. You run it like this:
file myfile.ext
and it spits out a text description based on the magic numbers and data contained therein.
It is probably included in Cygwin.
If you have access to the application that creates the file, you can apply changes to the application, then save the file and see the effects (keep in mind that numbers are probably stored in little-endian):
First, create the file repeatedly. If the files are not binary-equal, the current date/time is probably stored in the area where the differences occur.
Maybe you want to repeat that with the software running under different environments, to see whether the OS version etc. are stored, but this is rather unusual.
Next, you can try to change single variables and create several files that differ only in the value of this variable. This helps you identify where that variable is stored.
That way you can also exclude variables that are not stored in the file: if you change them but the files created are identical, they are not stored.
To test the hypotheses you worked out with the steps above, edit one of the files and have the application read it.
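A minimal Python sketch of that diff step: compare two versions of the file that differ in a single variable and report the byte offsets where they disagree (the file names are placeholders):

def diff_offsets(path_a, path_b):
    a = open(path_a, 'rb').read()
    b = open(path_b, 'rb').read()
    n = min(len(a), len(b))
    return [i for i in range(n) if a[i] != b[i]]

offsets = diff_offsets('save_v1.bin', 'save_v2.bin')
for off in offsets:
    print(f'files differ at offset {off:#010x}')
if not offsets:
    print('no differences in the common prefix; '
          'the variable is probably not stored in the file')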
If you don't have access to the application itself, I suggest that you forget about it and find another way to solve your problem. There is a very high probability that it will be faster...
If file does not give a meaningful answer, you may want to try TrID by Marco Pontello to determine whether your data is stored in a known format.
Get the Delphi application and open it in the IDA Pro freeware version, find where it writes the file, and work out from that code how the file is written.
Unless it's plain text, of course.
Do you know the program that uses it? If so, you can hook that program's write-to-file function and get an idea of what data it's writing, the size of the data, and where.
More Info: http://www.codeproject.com/KB/DLL/Win32APIHooking_Trouble.aspx
Unlike traditional hex editors, which only display the raw hex bytes of a file, 010 Editor can also parse a file into a hierarchical structure using a Binary Template. The results of running a Binary Template are much easier to understand and edit than raw hex bytes.
http://www.sweetscape.com/010editor/
Try opening it in a hex editor and analyzing it.