What is the byte signature of a password-protected ZIP file?

I've read that ZIP files start with the following bytes:
50 4B 03 04
Reference: http://www.garykessler.net/library/file_sigs.html
Question: Is there a certain sequence of bytes that indicate a ZIP file has been password-protected?

It's not true that ZIP files must start with
50 4B 03 04
Entries within zip files start with 50 4B 03 04, and often a pure zip file starts with a zip entry as the very first thing in the file. But there is no requirement that zip files start with those bytes. All files that start with those bytes are probably zip files, but not all zip files start with those bytes.
For example, you can create a self-extracting archive which is a PE-COFF file, a regular EXE, in which there actually is a signature for the file, which is 4D 5A .... Then, later in the exe file, you can store zip entries, beginning with 50 4B 03 04.... The file is both an .exe and a .zip.
A self-extracting archive is not the only class of zip file that does not start with 50 4B 03 04. You can "hide" arbitrary data at the front of a zip file this way. WinZip and other tools should have no problems reading a zip file formatted this way.
If you find the 50 4B 03 04 signature within a file, either at the start of the file or somewhere else, you can look at the next few bytes to determine whether that particular entry is encrypted. Normally it looks something like this:
50 4B 03 04 14 00 01 00 08 00 ...
The first four bytes are the entry signature. The next two bytes are the "version needed to extract". In this case it is 0x0014, which is 20. According to the pkware spec, that means version 2.0 of the pkzip spec is required to extract the entry. (The latest zip "feature" used by the entry is described by v2.0 of the spec). You can find higher numbers there if more advanced features are used in the zip file. AES encryption requires v5.1 of the spec, hence you should find 0x0033 in that header. (Not all zip tools respect this).
The next two bytes represent the general purpose bit flag (the spec calls it a "bit flag" even though it is a bit field), in this case 0x0001. This has bit 0 set, which indicates that the entry is encrypted.
Other bits in that bit flag have meaning and may also be set. For example bit 6 indicates that strong encryption was used - either AES or some other stronger encryption. Bit 11 says that the entry uses UTF-8 encoding for the filename and the comment.
All this information is available in the PKWare AppNote.txt spec.
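To make the byte layout concrete, here is a minimal Python sketch that scans a file for local file header signatures and checks bit 0 of the general purpose bit flag. It is only an illustration (a robust tool would parse the central directory instead of scanning), and the filename example.zip is just a placeholder.
import struct

def find_entries(path):
    # Sketch only: scan for the local file header signature PK\x03\x04 and
    # read the "version needed to extract" and the general purpose bit flag.
    with open(path, "rb") as f:
        data = f.read()
    pos = 0
    while True:
        pos = data.find(b"PK\x03\x04", pos)
        if pos == -1:
            break
        # 2-byte version needed, then 2-byte bit flag, both little-endian
        version, flags = struct.unpack_from("<HH", data, pos + 4)
        encrypted = bool(flags & 0x0001)   # bit 0 = entry is encrypted
        print(f"offset {pos}: version needed {version / 10:.1f}, encrypted={encrypted}")
        pos += 4

find_entries("example.zip")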

It's the individual files within the zip archive that are password-protected. You can have a mix of password-protected and unprotected files in the same archive (e.g. an unprotected readme file alongside the protected contents).

If you followed the links describing ZIP files in the URL you reference, you'd find one that discusses the bit indicating whether a file in the ZIP archive is encrypted or not. It seems that each file in the archive can be independently encrypted or not.

Related

Cut a line in multiple parts delimited by patterns, read and re-write .json files

This one's difficult and I haven't found any answers after hours of searching; I hope you can help me. I'm not a native English speaker, so I apologize in advance.
I started at a company last week and am working with .json files which are all stored in directories, one per company.
e.g.
d/Company1/comport/enregistrement_sessionhash1
enregistrement_sessionhash2
enregistrement_sessionhash3
d/Company2/comport/enregistrement_sessionhashX
d/Company3/comport/enregistrement_sessionhashY...
Each of them can contain [0-n] characters.
We use these files to calculate data.
The person before me didn't think about classifying them by /year/month, so it takes a lot of time when we run algorithms over the data for a specific month, because we have to read every file in the directory; files have been stored every 10 seconds per website company and per website user for approximately 2 years.
Sadly, we can't use the filesystem's creation/modification times, only the text information inside the .json files, since there was a server problem and my coworkers had to re-paste the files, which reset the creation times.
Here is a template of the .json files
BEGIN OF FILE
{"session":"session_hash","enregistrements":[{"session":"session_hash",[...]{"data2":"xxx"}],"timedate_saved":"27 04 2020 12:39:21"},{"session":"session_hash",[...],"timedate_saved":"17 06 2020 11:01:08"},{"data1":"session_hash"[...],{"data2":"xxx"}],"timedate_saved":"27 04 2020 18:01:14"}]}
END OF FILE
Within a single file, there is never a different "session" value. This value is a hash, also used in the filename, e.g. d/Company1/comport/enregistrement_session_hash
I would like to read the files and cut out every "enregistrements" sub-array (starting with [{"session"... and ending with "timedate_saved":"01 01 1970 00:00:00"}]}). The cut-out text should then be written to files with the same filename (session_hash), stored under company/comport/year/month/enregistrement_sessionhash, where year and month come from the "timedate_saved" data. And of course the result should still be parseable .json so the files can be reused later.
That's a lot; I hope someone has time on their hands to help me get through it.
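As a starting point, here is a minimal Python sketch. It assumes each file parses as valid JSON with a top-level "session" hash and an "enregistrements" array whose entries carry "timedate_saved" in "DD MM YYYY HH:MM:SS" format; the directory layout and field names come from the question, while the function name and output paths are only illustrative.
import json
import os
from collections import defaultdict
from datetime import datetime

def split_by_month(src_path, out_root):
    # out_root: e.g. "d/Company1/comport"; output goes to out_root/year/month/
    with open(src_path, encoding="utf-8") as f:
        doc = json.load(f)

    session = doc["session"]
    buckets = defaultdict(list)
    for rec in doc.get("enregistrements", []):
        ts = datetime.strptime(rec["timedate_saved"], "%d %m %Y %H:%M:%S")
        buckets[(ts.year, ts.month)].append(rec)

    for (year, month), records in buckets.items():
        out_dir = os.path.join(out_root, str(year), f"{month:02d}")
        os.makedirs(out_dir, exist_ok=True)
        out_path = os.path.join(out_dir, f"enregistrement_{session}")
        with open(out_path, "w", encoding="utf-8") as out:
            # Keep the same top-level shape so the files stay parseable JSON
            json.dump({"session": session, "enregistrements": records},
                      out, ensure_ascii=False)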

Downloaded CSV starts with BACKSPACE and other weird characters

I downloaded a CSV (encoded in UTF-8) from an FTP server (using some VB6 code which has always worked in the past) and found it started with 08 00 50 9e (BACKSPACE, NUL, P, ž when viewed as text).
I've downloaded the same file (a different version) before and never had a problem, so I don't believe the FTP client is at fault here.
Is there some meaning to those characters?
I've tried searching for that string on Google, but (obviously?) did not succeed in the search.
I found the answer... it was an issue in the VB6 code: instead of Print #iFileNumber, sFileContents in Output mode, it used Put #iFileNumber, , sFileContents in Binary mode (no idea why it worked before, but perhaps I changed something without realising it).
Put prefixes the data with a four-byte descriptor: a two-byte VarType (08 00, a Variant String) followed by a two-byte string length, hence 08 00 50 9e.
Problematic code
Open App.Path & "\Temp.csv" For Binary As #iFileNumber
Put #iFileNumber, , StrConv(x.Value, vbUnicode)
Close
Working code
Open App.Path & "\Temp.csv" For Output As #iFileNumber
Print #iFileNumber, StrConv(x.Value, vbUnicode)
Close
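If you are left with files that were already written the wrong way, a small repair script may help. This is only a sketch, assuming the file really begins with the four-byte prefix described above (two-byte VarType 08 00 followed by a two-byte length); the file names are placeholders.
import struct

def strip_put_prefix(src, dst):
    # Sketch: drop the 4-byte descriptor that VB6's Put wrote for a Variant
    # String (2-byte VarType 08 00, then a 2-byte string length).
    with open(src, "rb") as f:
        data = f.read()
    if len(data) >= 4 and data[:2] == b"\x08\x00":
        vartype, length = struct.unpack("<HH", data[:4])
        print(f"VarType {vartype}, declared length {length}; stripping 4 bytes")
        data = data[4:]
    with open(dst, "wb") as f:
        f.write(data)

strip_put_prefix("Temp.csv", "Temp_fixed.csv")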

MIPS: Append to text file

I need to append a string to a text file. Is there a way to open a file for append in MIPS (I use the MARS simulator)? And if there is, what flag should I use? I presumed it should be 4, but that doesn't work, and I cannot find a list of available flags for service 13 anywhere.
I cannot find a list of available flags for service 13 anywhere
They're listed here. According to that page, "MARS implements three flag values: 0 for read-only, 1 for write-only with create, and 9 for write-only with create and append."

IDL: read ascii header of binary file

I'm having an enforced introduction to IDL while trying to debug some old code.
I have a binary image file that has an ASCII header (it's a THEMIS IR BTR image of Mars, if that is of interest). The code opens the file as unit 1 using OPENR, then reads the first 256 bytes of it using ASSOC(1,BYTARR(256)). The return from that is 256 byte values printed as decimal numbers, but they are mostly high or low numbers that do not correspond to alphanumeric characters, and are not related to the header that I know is on the file.
One thing that may help with diagnostics: the original file is a g-zipped version of the file. If I try to open it directly (using less, for example) it allows me to read the header. But if I unzip it first (gzip -c filename.IMG.gz > filename.IMG) and then try to read it again I get binary gobbledegook. (less gives me a warning before opening: "filename.IMG may be a binary file. See it anyway?").
Any suggestions?
Here's the IDL code:
CLOSE,1
OPENR,1,FILENAME
A = ASSOC(1,BYTARR(256))
B = A[0]
print,'B - ',B
H = STRING(B)
print,'H - ',H
And this is what it gives me:
B - 31 139 8 8 7 17 238 79 0 3 ... (and on for 256 values)
H - [Some weird symbol]
I've tried it on a purely ascii test file and it works as expected.
31 139 8 (hex 1F 8B 08) is the beginning of a GZIP header for a "deflate"-compressed file.
http://www.gzip.org/zlib/rfc-gzip.html#file-format
So yes, the file looks like it needs to be decompressed first.
Try decompressing the file with gunzip, and check the header again. If it is 31 139 08... again, it looks like it has been compressed twice.
Otherwise, it has most likely been fully decompressed, and it remains to be seen why the uncompressed file isn't being decoded.
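If it is easier to check outside IDL, a quick Python sketch (just an illustration; filename.IMG is taken from the question) can confirm whether a file still carries the gzip magic bytes.
def looks_gzipped(path):
    # 0x1F 0x8B is the gzip magic number and 0x08 the "deflate" method,
    # i.e. 31 139 8 in decimal as printed by the IDL code above.
    with open(path, "rb") as f:
        magic = f.read(3)
    return len(magic) == 3 and magic[:2] == b"\x1f\x8b" and magic[2] == 0x08

print(looks_gzipped("filename.IMG"))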
Try the COMPRESS keyword to OPEN:
openr, 1, filename, /compress
The COMPRESS keyword tells IDL that the file is gzip-compressed; it works for both reading and writing compressed files.

Line Feeds and Carriage Returns in Data: 0D 0A

I am writing a data clean-up script (MS Smart Quotes, etc.) that will operate on MySQL tables encoded in Latin1. While scanning the data I noticed a ton of 0D 0A where the line breaks are.
Since I am cleaning the data, should I also address all of the 0D, too, by removing them? Is there ever a good reason to keep 0D (carriage return) anymore?
Thanks!
0D 0A (\r\n) and 0A (\n) are line terminators; \r\n is mostly used on Windows, \n on Unix systems.
Is there ever a good reason to keep 0D anymore?
I think you should answer this question yourself.
You could remove '\r' from the data, but make sure that the programs that will consume this data treat a bare '\n' as the end of a line. In most cases they do, but check just in case.
The CR/LF combination is a Windows thing. *NIX operating systems just use LF. So based on the application that uses your data, you'll need to make the decision on whether you want/need to filter out CR's. See the Wikipedia entry on newline for more info.
Python's readline() returns each line followed by a \012. The leading \0 means octal; octal 12 is decimal 10, which the ASCII table shows as NL or LF (newline or line feed).
That is the standard end-of-line in a Unix text or script file.
http://www.asciitable.com/
So be aware that len() of the returned line includes the NL; unless you read past the EOF, len() will never be zero.
Therefore, if you INSERT a line of text obtained from Python's readline() into a MySQL table, it will include the trailing NL character at the end by default.
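As a small illustration of the clean-up (a sketch only; the file name and the Latin-1 encoding are assumptions, and the INSERT handling is left out), this is how trailing CR/LF can be normalised in Python before the data reaches MySQL.
# Read a Latin-1 text file and strip line terminators before insertion.
with open("data.csv", encoding="latin-1") as f:
    for line in f:
        cleaned = line.rstrip("\r\n")        # drop the trailing CR and/or LF
        cleaned = cleaned.replace("\r", "")  # remove any stray carriage returns
        # ... pass `cleaned` to your parameterised INSERT statement here ...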