Why do .pdn files for paint.net contain a bunch of gibberish?

I was using paint.net (an image editing program) and, out of curiosity, I opened a .pdn file as raw text. What I saw was a bunch of gibberish! Why is the data stored like this?

It is most likely stored as binary. Binary data isn't meant to be human-readable, but it is quick and easy for the program to parse, and it usually reduces the amount of space the file takes up. Most programs store their data this way.
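If you're curious what the bytes actually are, the usual first step is a hex dump of the file header rather than opening the file as text. A tiny Delphi sketch (DumpHeader is a made-up name and the 16-byte length is an arbitrary choice):

uses
  Classes, SysUtils;

{ Print the first bytes of a file in hex; binary formats usually
  start with a recognizable "magic number" identifying the format. }
procedure DumpHeader(const FileName: string);
var
  Stream: TFileStream;
  Buf: array[0..15] of Byte;
  Count, I: Integer;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Count := Stream.Read(Buf, SizeOf(Buf));
    for I := 0 to Count - 1 do
      Write(IntToHex(Buf[I], 2), ' ');
    Writeln;
  finally
    Stream.Free;
  end;
end;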

Related

searching in html/txt without loading it into program [duplicate]

I have a FindFile routine in my program which will list files, but if the "Containing Text" field is filled in, then it should only list files containing that text.
If the "Containing Text" field is entered, then I search each file found for the text. My current method of doing that is:
var
  FileContents: TStringList;
begin
  FileContents := TStringList.Create;
  try
    FileContents.LoadFromFile(Filepath);
    Found := Pos(TextToFind, FileContents.Text) > 0;
  finally
    FileContents.Free;
  end;
end;
The above code is simple, and it generally works okay. But it has two problems:
It fails for very large files (e.g. 300 MB)
I feel it could be faster. It isn't bad, but why wait 10 minutes searching through 1000 files, if there might be a simple way to speed it up a bit?
I need this to work for Delphi 2009 and to search text files that may or may not be Unicode. It only needs to work for text files.
So how can I speed this search up and also make it work for very large files?
Bonus: I would also want to allow an "ignore case" option. That's a tougher one to make efficient. Any ideas?
Solution:
Well, mghie pointed out my earlier question How Can I Efficiently Read The First Few Lines of Many Files in Delphi, and as I answered, it was different and didn't provide the solution.
But he got me thinking that I had done this before, and indeed I had. I had built a block-reading routine for large files that breaks them into 32 MB blocks. I use it to read the input file of my program, which can be huge, and it works fine and fast. So step one is to do the same for the files I am searching through.
So now the question was how to efficiently search within those blocks. Well, I did have a previous question on that topic: Is There An Efficient Whole Word Search Function in Delphi? and RRUZ pointed out the SearchBuf routine to me.
That solves the "bonus" as well, because SearchBuf has options which include Whole Word Search (the answer to that question) and MatchCase/noMatchCase (the answer to the bonus).
So I'm off and running. Thanks once again SO community.
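A rough sketch of that combination, for the ANSI case (FileContainsText is a hypothetical helper; the 32 MB block size is the one mentioned above, and the blocks overlap by the pattern length minus one so a match straddling a block boundary is not missed):

uses
  Classes, SysUtils;

function FileContainsText(const FileName: string;
  const TextToFind: AnsiString): Boolean;
const
  BlockSize = 32 * 1024 * 1024; // 32 MB blocks, as described above
var
  Stream: TFileStream;
  Block: AnsiString;
  BytesRead: Integer;
begin
  Result := False;
  if TextToFind = '' then
    Exit;
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    repeat
      SetLength(Block, BlockSize);
      BytesRead := Stream.Read(Block[1], BlockSize);
      if BytesRead <= 0 then
        Break;
      SetLength(Block, BytesRead);
      if Pos(TextToFind, Block) > 0 then
        Exit(True);
      // step back so a hit split across two blocks is still found
      if BytesRead = BlockSize then
        Stream.Seek(-(Length(TextToFind) - 1), soCurrent);
    until BytesRead < BlockSize;
  finally
    Stream.Free;
  end;
end;

For the whole-word and match-case options, replace the Pos call with SearchBuf over the block buffer.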
The best approach here is probably to use memory mapped files.
First you need a file handle; use the CreateFile Windows API function for that.
Then pass that to CreateFileMapping to get a file mapping handle. Finally use MapViewOfFile to map the file into memory.
To handle large files, MapViewOfFile can map just a portion of the file into memory, so you can e.g. map the first 32 MB, call UnmapViewOfFile to unmap it, then call MapViewOfFile for the next 32 MB, and so on. (EDIT: as was pointed out below, make sure the blocks you map this way overlap by a multiple of 4 KB, and by at least the length of the text you are searching for, so that you don't overlook text split across a block boundary.)
To do the actual searching once (part of) the file is mapped into memory, you can make a copy of the source for StrPosLen from SysUtils.pas (it's unfortunately defined only in the implementation section and not exposed in the interface). Keep one copy as-is and make another copy where you replace Wide with Ansi throughout. Also, if you want to be able to search binary files which might contain embedded #0's, you can remove the (Str1[I] <> #0) and part of the condition.
Either find a way to identify if a file is ANSI or Unicode, or simply call both the Ansi and Unicode version on each mapped part of the file.
Once you are done with each file, make sure to call UnmapViewOfFile first, then CloseHandle on the file mapping handle, and finally CloseHandle on the file handle.
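Put together, the call sequence looks roughly like this (a minimal sketch with error handling reduced to RaiseLastOSError; SearchMappedFile is a hypothetical name, and for huge files you would map smaller, overlapping views instead of the whole file, as noted above):

uses
  Windows, SysUtils;

procedure SearchMappedFile(const FileName: string);
var
  FileHandle, MapHandle: THandle;
  View: Pointer;
  Size: DWORD;
begin
  FileHandle := CreateFile(PChar(FileName), GENERIC_READ, FILE_SHARE_READ,
    nil, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if FileHandle = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  try
    MapHandle := CreateFileMapping(FileHandle, nil, PAGE_READONLY, 0, 0, nil);
    if MapHandle = 0 then
      RaiseLastOSError;
    try
      View := MapViewOfFile(MapHandle, FILE_MAP_READ, 0, 0, 0);
      if View = nil then
        RaiseLastOSError;
      try
        Size := GetFileSize(FileHandle, nil);
        // ... run the Ansi/Wide search routine over View^ (Size bytes) ...
      finally
        UnmapViewOfFile(View);
      end;
    finally
      CloseHandle(MapHandle);
    end;
  finally
    CloseHandle(FileHandle);
  end;
end;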
EDIT:
A big advantage of using memory mapped files instead of using e.g. a TFileStream to read the file into memory in blocks is that the bytes will only end up in memory once.
Normally, on file access, Windows first reads the bytes into the OS file cache and then copies them from there into the application's memory.
If you use memory mapped files, the OS can map the physical pages from the OS file cache directly into the address space of the application without making another copy (saving the time needed for the copy and halving memory usage).
Bonus Answer: By calling StrLIComp instead of StrLComp you can do a case insensitive search.
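A minimal illustration (both functions are in SysUtils):

// StrLComp compares respecting case; StrLIComp ignores case
if StrLIComp(PChar('Needle'), PChar('NEEDLE'), 6) = 0 then
  Writeln('case-insensitive match');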
If you are looking for text string searches, look at the Boyer-Moore search algorithm. It uses memory mapped files and a really fast search engine. There are some Delphi units around that contain implementations of this algorithm.
To give you an idea of the speed: I currently search through 10-20 MB files and it takes on the order of milliseconds.
Oh, I just read that it might be Unicode. I'm not sure if it supports that, but definitely look down this path.
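To make the idea concrete, here is an illustrative Boyer-Moore-Horspool scan (a simplified variant of Boyer-Moore; this is a sketch, not one of the Delphi units mentioned above). It returns the 0-based offset of Pattern in Buf, or -1:

function HorspoolFind(const Pattern: AnsiString;
  Buf: PAnsiChar; BufLen: Integer): Integer;
var
  Skip: array[AnsiChar] of Integer;
  C: AnsiChar;
  I, J, PatLen: Integer;
begin
  Result := -1;
  PatLen := Length(Pattern);
  if (PatLen = 0) or (PatLen > BufLen) then
    Exit;
  // bad-character table: how far the window may shift on a mismatch
  for C := Low(AnsiChar) to High(AnsiChar) do
    Skip[C] := PatLen;
  for I := 1 to PatLen - 1 do
    Skip[Pattern[I]] := PatLen - I;
  I := 0;
  while I <= BufLen - PatLen do
  begin
    J := PatLen;
    while (J > 0) and (Buf[I + J - 1] = Pattern[J]) do
      Dec(J);
    if J = 0 then
      Exit(I); // full match at offset I
    Inc(I, Skip[Buf[I + PatLen - 1]]);
  end;
end;

The skip table is what makes it fast: on a mismatch the search window jumps forward by up to the full pattern length instead of a single character.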
This is a problem connected with your previous question How Can I Efficiently Read The First Few Lines of Many Files in Delphi, and the same answers apply. If you don't read the files completely, but in blocks, then large files won't pose a problem. There's also a big speed-up to be had for files containing the text: you should cancel the search upon the first match. Currently you read each file in full, even when the text to be found is in the first few lines.
May I suggest a component? If so, I would recommend ATStreamSearch.
It handles ANSI and UNICODE (and even EBCDIC and Korean and more).
Or the class TUTBMSearch from JclUnicode (Jedi JCL). It was mainly written by Mike Lischke (of VirtualTreeview fame) and uses a tuned Boyer-Moore algorithm that ensures speed. The drawback in your case is that it works entirely in Unicode (WideStrings), so the conversion from String to WideString risks being a penalty.
It depends on what kind of data you are going to search. To achieve really efficient results, you would need to let your program parse the interesting directories, including all the files in them, and keep the data in a database that you can query for a specific word against a specific list of files derived from the search path. A database query can return results in milliseconds.
The catch is that you have to let it run and parse all the files after installation, which may take more than an hour depending on the amount of data.
The database should be updated each time your program starts; this can be done by comparing the MD5 value of each file to detect changes, so you don't have to re-parse every file every time.
This way of working is interesting if your data lives in a fixed place and you analyze the same files repeatedly rather than completely new ones each time. Some code analyzers work like this, and they are really efficient: you invest some time in parsing and storing the interesting data up front, and can then jump to the exact place where a search term appears and list all of its occurrences very quickly.
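A sketch of the change-detection part (this assumes Indy's TIdHashMessageDigest5, which is bundled with Delphi; any MD5 routine will do): re-parse a file only when its current hash differs from the one stored in the database.

uses
  Classes, SysUtils, IdHashMessageDigest;

function FileMD5(const FileName: string): string;
var
  Stream: TFileStream;
  Hasher: TIdHashMessageDigest5;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Hasher := TIdHashMessageDigest5.Create;
    try
      Result := Hasher.HashStreamAsHex(Stream);
    finally
      Hasher.Free;
    end;
  finally
    Stream.Free;
  end;
end;

// at startup: re-parse FileName only if FileMD5(FileName) differs
// from the hash stored for it in the database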
If the files are to be searched multiple times, it could be a good idea to use a word index.
This is called "Full Text Search".
It will be slower the first time (text must be parsed and indexes must be created), but any future search will be immediate: in short, it will use only the indexes, and not read all text again.
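As a toy illustration of a word index (an inverted index): map each word to the files it occurs in, so a later search is a single lookup instead of a scan. AddWord and FilesContaining are made-up names for the sketch; real FTS engines also store word positions and compress the index.

uses
  Classes, SysUtils;

var
  Index: TStringList; // word -> TStringList of file names, kept in Objects[]

procedure InitIndex;
begin
  Index := TStringList.Create;
  Index.Sorted := True;          // required for fast binary-search lookups
  Index.CaseSensitive := False;
end;

procedure AddWord(const AWord, FileName: string);
var
  I: Integer;
  Files: TStringList;
begin
  if Index.Find(AWord, I) then
    Files := TStringList(Index.Objects[I])
  else
  begin
    Files := TStringList.Create;
    Files.Sorted := True;
    Files.Duplicates := dupIgnore; // list each file once per word
    Index.AddObject(AWord, Files);
  end;
  Files.Add(FileName);
end;

function FilesContaining(const AWord: string): TStringList;
var
  I: Integer;
begin
  if Index.Find(AWord, I) then
    Result := TStringList(Index.Objects[I])
  else
    Result := nil; // word never seen while indexing
end;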
You have the exact parser you need in The Delphi Magazine Issue 78, February 2002:
"Algorithms Alfresco: Ask A Thousand Times
Julian Bucknall discusses word indexing and document searches: if you want to know how Google works its magic this is the page to turn to."
There are several FTS implementations for Delphi:
Rubicon
Mutis
ColiGet
Google is your friend.
I'd like to add that most databases have an embedded FTS engine. SQLite3 even has a very small but efficient implementation, with page ranking and such.
We provide direct access from Delphi, with ORM classes, to this full-text search engine, named FTS3/FTS4.
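For reference, the raw SQL behind such an index looks roughly like this (hypothetical table and column names; you would run these through whatever SQLite wrapper you use, and mORMot hides them behind its ORM classes):

const
  // create a virtual full-text table, then query it with MATCH
  CreateIndexSQL = 'CREATE VIRTUAL TABLE Docs USING fts4(FileName, Body);';
  SearchSQL      = 'SELECT FileName FROM Docs WHERE Body MATCH :word;';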

.json to .csv "big" file

I recently downloaded my location history from Google. From 2014 to present.
The resulting .json file was 997,000 lines, plus a few.
All of the online converters would freeze and lock up unless I did it in really small slices which isn't an option. (Time constraints)
I've gotten a manual process down between Sublime Text and LibreOffice to get my information transferred, but I know there's an easier way somewhere.
I even tried the fastFedora plug-in which I couldn't get to work.
Even though I'm halfway done, and will likely finish up using my process, is there an easier way?
I can play with Java, though I'm no pro. Any other languages that play well with JSON?
Ideally the solution should support nesting without flattening the file. Location data is nested and needs to remain nested (or at least grouped) to make sense.

embed data or not what are best practices for serving/parsing dynamic content

I am starting to use Go for serving dynamic HTML content: parsing templates, replacing variables, etc. So far all good. I found that I could create a single binary and deploy a single file, including all the static files, by using packages like go-bindata.
But when it comes to performance what are the best practices to follow?
If I am right, having a single binary with all the static content embedded will result in a bigger file.
A binary that only parses the templates (*.tpl) at startup may be smaller, but it will need to be shipped together with all the static content.
If space is the only difference, a single binary looks like the more convenient way to go in some cases, but not being an expert on the topic, I would like to know some best practices to follow, keeping an eye on performance.
If you add something like
var templates = template.Must(template.ParseGlob("templates/*.html"))
at global scope, then the templates are parsed only once, at startup.
If you upload and run your app on some server, then having separate files is probably more convenient, because you can use rsync to avoid uploading files which didn't change since the last upload.
Putting everything into one file can make things easier if you want to distribute only one executable for download.

Better way to bypass Haxe 16MB embedding file size limit?

I just noticed that Haxe (OpenFL) limits a single embedded file's size to 16 MB when using the @:file tag (or openfl.Assets). Flash/Flex can embed much larger files directly. My way of solving the problem is to split one large file into several smaller files and combine them at run time, but this is not always convenient. Is there a better way to bypass this limit in Haxe?
First of all, embedding files this big is generally not a good idea:
the binary size becomes huge
it slows down compilation (because quite a big amount of data has to be copied around every time)
the data is forced to stay in RAM/swap while the application is running
But, speaking of solving the exact problem... I'm not exactly sure whether SWF allows embedding chunks of data this big at all (one would need to look at the bytecode specs), but in any case it seems that the limitation is there because of an inner OCaml limitation on string size. It can be fixed, I believe, but you would need to rewrite part of the Haxe SWF generator.
If you don't want to fix the compiler (which may not be possible if SWF doesn't allow embedding chunks this big), you may just go with a simple macro which transparently slices your file into parts, embeds them, and reassembles them at runtime.

Decipher binary format of file

I have a binary file to which I'm trying to write, but I don't have the file format specification, nor have I found it using Google. I've been looking at the file in a hex editor, but so far that has only given me a headache. Is there a better way to decipher the format of the file so that I can append data to it?
File carving tools such as scalpel won't really help here. They're made for extracting files with known header and/or footer signatures from a memory dump or some larger, composite file.
For your scenario, I would recommend a hex editor with templating capability, like the 010 Editor. This will allow you to name and annotate "fields" in the binary as you learn more about what each part of the file does. Unfortunately, the process of finding out what each field does is mostly manual. As a methodology, just start playing with it. Change some values in your current binary and see what happens. Expect to spend significant time on it, but also enjoy the process!
You may want to search it with an open source forensic application like foremost or scalpel. They will do most of the grunt work for you; you just likely won't learn anything.