How to open a heavy HTML file

I recently downloaded my Facebook archive, from a very old account I started in 2009.
There are some conversations I would like to read; the main problem is that messages.html inside the zip weighs 98 MB.
Unfortunately, neither Mozilla Firefox nor Google Chrome can render those 21109 lines of code without crashing.
I can open the document with Notepad++, but then it's like searching for a needle in a haystack.
Could you help me, please?

Further to the Linux comments, we can only assume you are trying to look (or search) inside the HTML file. You can use any good text editor, such as TextPad, EditPad, etc. You can also download "UnxUtils" (no, it is not misspelled) and use the Windows ports of grep/sed/awk/head/tail/cut etc. There may be comments or answers suggesting Cygwin, which works fine but requires DLL libraries and such. The UnxUtils are stand-alone .exe files that work right out of the box, with no installation required.
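If installing UnxUtils is not an option, the same "search the raw file instead of rendering it" idea can be sketched with the Python standard library. With roughly 21,000 lines in 98 MB, lines average several kilobytes, so instead of printing whole lines (as plain grep would) this sketch prints a short window of characters around each case-insensitive hit. The function name and the `width` parameter are illustrative, not part of any tool mentioned above.

```python
import re
import sys


def find_snippets(lines, needle, width=80):
    """Yield (line_number, snippet) for every case-insensitive occurrence of needle.

    `lines` can be any iterable of strings, e.g. an open file, so the 98 MB
    file is streamed line by line instead of being loaded or rendered at once.
    """
    pattern = re.compile(re.escape(needle), re.IGNORECASE)
    for line_no, line in enumerate(lines, start=1):
        for match in pattern.finditer(line):
            # Keep `width` characters on each side of the match.
            lo = max(0, match.start() - width)
            yield line_no, line[lo:match.end() + width].strip()


if __name__ == "__main__" and len(sys.argv) == 3:
    path, needle = sys.argv[1], sys.argv[2]
    with open(path, encoding="utf-8", errors="replace") as f:
        for line_no, snippet in find_snippets(f, needle):
            print(f"{line_no}: {snippet}")
```

Usage: `python find_snippets.py messages.html "some name"`. Redirecting the output to a text file gives a much smaller document that any editor opens easily.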

If you want a readable file for each conversation, you can follow the first part of this tutorial, which generates easily searchable CSV files:
http://openmachin.es/blog/facebook-messages
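The linked tutorial's own code is not reproduced here. As an independent sketch of the same idea, Python's built-in `html.parser` can be fed the file in chunks (so the whole page is never rendered) and the messages written to CSV. The markup it expects (`<span class="user">`, `<span class="meta">`, then the text in a `<p>`) is an ASSUMPTION about the old archive format; check your own messages.html in Notepad++ and adjust the class names. All function and class names are hypothetical.

```python
import csv
from html.parser import HTMLParser


class MessageExtractor(HTMLParser):
    """Collect (user, meta, text) rows.

    ASSUMED layout per message:
      <span class="user">NAME</span><span class="meta">DATE</span> ... <p>TEXT</p>
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities like &amp; arrive decoded
        self.rows = []
        self._field = None  # "user", "meta" or "text" while inside such an element
        self._current = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class") or ""
        if tag == "span" and cls in ("user", "meta"):
            self._field = cls
        elif tag == "p":
            self._field = "text"

    def handle_endtag(self, tag):
        if tag == "span" and self._field in ("user", "meta"):
            self._field = None
        elif tag == "p" and self._field == "text":
            cur = self._current
            self.rows.append((cur.get("user", ""), cur.get("meta", ""), cur.get("text", "").strip()))
            self._current = {}
            self._field = None

    def handle_data(self, data):
        # Data may arrive in several pieces, so append rather than overwrite.
        if self._field:
            self._current[self._field] = self._current.get(self._field, "") + data


def extract_rows(chunks):
    """Feed string chunks to the parser incrementally and return the rows."""
    parser = MessageExtractor()
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
    return parser.rows


def export_to_csv(html_path, csv_path, chunk_size=1 << 20):
    """Stream html_path in 1 MiB chunks and write one CSV row per message."""
    with open(html_path, encoding="utf-8") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        chunks = iter(lambda: src.read(chunk_size), "")
        writer = csv.writer(dst)
        writer.writerow(["user", "meta", "text"])
        writer.writerows(extract_rows(chunks))
```

Splitting the result per conversation (per thread) would need one more assumption about the thread markup, so it is left out of this sketch.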

Related

Download HTML as DOCX readable by Mac Pages

I'm developing an application in Next.js with TypeScript, in which I want to let the user download a page as DOCX.
I was glad to find that this can be achieved easily using this method. The downloaded file can be opened by Microsoft Word, Google Docs and LibreOffice, but when I try to open it with Pages on a Mac, I get a prompt saying the file has an invalid format.
I guess this makes sense, given how Microsoft-oriented the HTML headers seem: <html xmlns:o='urn:schemas-microsoft-com:office:office' xmlns:w='urn:schemas-microsoft-com:office:word' ... >. However, I am really struggling to find a way to download a DOCX that Mac Pages can open.
There are third-party options like html-to-docx which might be able to handle this, but I'm having trouble finding any such (maintained) package with type declarations for TypeScript.
I ended up using html-to-docx anyway, ignoring the lack of type declarations for now by putting // @ts-ignore above the import. It seems type definitions for the package are on the way, though.
The package manages to download the given HTML page as a DOCX that is readable on every platform I've tested so far (Word, Google Docs, LibreOffice and Mac Pages).
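A likely explanation for the difference: a genuine .docx is an Office Open XML package, i.e. a ZIP archive containing parts such as [Content_Types].xml and _rels/.rels, while the namespace approach presumably writes plain HTML under a .docx name, which Word, Google Docs and LibreOffice are lenient enough to import. The quick heuristic check below (in Python, outside the Next.js app; the function name is hypothetical) can be run on both downloaded files to confirm this.

```python
import zipfile

# Parts an Office Open XML package is expected to contain (heuristic check only).
REQUIRED_PARTS = {"[Content_Types].xml", "_rels/.rels"}


def looks_like_real_docx(path):
    """Return True if `path` is a ZIP package with the required OOXML parts
    and at least one WordprocessingML part under word/."""
    if not zipfile.is_zipfile(path):
        return False  # e.g. HTML with Office namespaces saved as .docx
    with zipfile.ZipFile(path) as zf:
        names = set(zf.namelist())
    return REQUIRED_PARTS <= names and any(name.startswith("word/") for name in names)
```

If the file from the original method returns False and the html-to-docx output returns True, that matches the behaviour described for Pages.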

How do I extract my comprehensive browser history from local browser profile files programmatically?

I set up an isolated Chrome browser profile by using a command-line switch in a Windows shortcut:
%CHROME_EXE% --user-data-dir=g:\profiles\chrome\<user_dir>\
If this directory is empty when I launch Chrome with this shortcut, all the profile files are created.
My browser history is in there somewhere.
I know where my profile directory is and I can list/see all files.
I want to write a program (Java, Python or Rust), to extract my browser history data.
I want to extract the URL and date of last visit.
How would this be possible? Is there an example project with example code that would do this?
Is there any information about the profile files telling me which file to look at (whether it's a database file or a plain text file) and how to extract the data?
I have looked at other similar Stack Overflow posts, and none of them fully answers this question in a way that is recent and pertains to the latest browser release.
Thank you.
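As far as I know, current Chrome builds keep browsing history in an SQLite database file named History inside the profile folder (for the default profile, something like g:\profiles\chrome\<user_dir>\Default\History), with a `urls` table whose `last_visit_time` column counts microseconds since 1601-01-01 UTC. That schema is a Chrome internal, not a stable API, so treat the table and column names as assumptions and verify them against your own file. A minimal Python sketch (Python being one of the languages mentioned) that reads the URL and last-visit date from a copy of the file:

```python
import shutil
import sqlite3
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Chromium timestamps count microseconds since 1601-01-01 UTC.
WEBKIT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def webkit_to_datetime(microseconds):
    """Convert a Chromium last_visit_time value to an aware UTC datetime."""
    return WEBKIT_EPOCH + timedelta(microseconds=microseconds)


def read_history(history_db):
    """Return (url, last_visit) pairs from a Chromium 'History' file, newest first.

    ASSUMPTION: a `urls` table with `url` and `last_visit_time` columns.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Work on a copy: a running Chrome may keep the live database locked.
        snapshot = Path(tmp) / "History"
        shutil.copy2(history_db, snapshot)
        con = sqlite3.connect(snapshot)
        try:
            rows = con.execute(
                "SELECT url, last_visit_time FROM urls ORDER BY last_visit_time DESC"
            ).fetchall()
        finally:
            con.close()  # close before the temporary directory is removed
    return [(url, webkit_to_datetime(ts)) for url, ts in rows]
```

Usage: `for url, when in read_history(r"g:\profiles\chrome\<user_dir>\Default\History"): print(when.isoformat(), url)`, with <user_dir> replaced by your actual folder. The same SQL query works from Java (JDBC with an SQLite driver) or Rust (an SQLite crate) if you prefer those.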

Can we use wxWidgets to save a webpage to disk?

I was using wget via a system call (C++) to save a webpage from the internet to my hard disk in a program. Now I want to use wxWidgets to do the same. Is there any way I can do that and still have the generic behaviour of wget? (i.e. I give a link to a PDF file and then a PDF file is saved)
I found this link http://wiki.wxwidgets.org/Download_a_file_from_internet
but I have no idea how to convert the wxString into a PDF/MP3 file according to the URL entered.
Could anyone help, please? I am working on an open-source project for the first time and have only just encountered wxWidgets.
If you are happy using wget with a system call, then why not continue to do so?
wxWidgets is a GUI framework with a lot of extra convenience functions included. You don't HAVE to use them. You can still use whatever C++ features, utilities and packages you are familiar with.
Here is a link to Wget for Windows
You can use wxHTTP (as described here) or use wxURL and GetInputStream()
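Whichever API does the transfer (wxHTTP, wxURL or anything else), the "generic behaviour of wget" asked about mostly comes down to two things: naming the output file after the last path segment of the URL, and writing the response as raw bytes rather than into a string, which would mangle binary PDF or MP3 data. The sketch below illustrates only that logic in Python with the standard library; it is not wxWidgets code, and the function names are hypothetical.

```python
import posixpath
import shutil
from pathlib import Path
from urllib.parse import unquote, urlsplit
from urllib.request import urlopen


def filename_from_url(url, default="index.html"):
    """Pick a local file name from the URL path, roughly like wget does."""
    name = unquote(posixpath.basename(urlsplit(url).path))
    name = name.replace("/", "_").replace("\\", "_")  # never let the URL choose a directory
    return default if name in ("", ".", "..") else name


def download(url, dest_dir="."):
    """Stream the response body to disk in binary mode and return the saved path."""
    target = Path(dest_dir) / filename_from_url(url)
    with urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)  # bytes in, bytes out: no text decoding
    return target
```

In wxWidgets terms the equivalent would be to copy the input stream into a file output stream instead of reading it into a wxString.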

Capitalized "JPG" extension different from lowercase "jpg"?

I am working on a personal webpage, a draft version of which I have uploaded to www.kurtpeek.com. One bug I noticed is that one of my JPEG images in the "About Me" section, "MIT_IAP_SAR_smallest.JPG", does not show up on the web, although I can see it just fine on my computer. (You can download the directory from http://dl.dropbox.com/u/1396332/kurtpeek.com.rar.)
The only difference I can see from the other pictures is that this one has a capitalized ".JPG" extension instead of a lowercase ".jpg" one. However, if I try to rename the file with a lowercase extension, Windows 7 just capitalizes it again.
Any help on this issue would be much appreciated.
Best regards,
Kurt Peek
P.S. This problem seems similar to this one:
php file upload capitalized filename issue
but as far as I can tell that problem was not resolved.
It is very likely that your server is running under a Linux distribution, whose file system is usually case-sensitive.
http://www.kurtpeek.com/img/MIT_IAP_SAR_smallest.jpg - Not found
http://www.kurtpeek.com/img/MIT_IAP_SAR_smallest.JPG - Found
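The fix is to make the file name on the server and the reference in the HTML match exactly. Assuming the HTML refers to lowercase extensions, one option is to normalize image extensions before uploading. The Python sketch below does that; the function name and the extension list are assumptions, and the rename goes through a temporary name, a common defensive workaround for case-only renames on case-insensitive file systems such as NTFS.

```python
from pathlib import Path


def lowercase_extensions(root, exts=(".jpg", ".jpeg", ".png", ".gif")):
    """Rename e.g. photo.JPG to photo.jpg under `root`; return the new paths."""
    # Collect first so the directory tree is not modified while it is being walked.
    candidates = [
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in exts and p.suffix != p.suffix.lower()
    ]
    renamed = []
    for path in candidates:
        target = path.with_suffix(path.suffix.lower())
        temp = path.with_name(path.name + ".renaming")
        path.rename(temp)    # step 1: move to a name that differs by more than case
        temp.rename(target)  # step 2: move to the lowercase-extension name
        renamed.append(target)
    return renamed
```

Alternatively, leave the file alone and change the HTML to reference "MIT_IAP_SAR_smallest.JPG", which the server already serves.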

Associated folder when saving an HTML file

When you save a web page "complete" (as opposed to "HTML only"), the HTML file and an associated resource folder are saved. Interestingly, when you delete the HTML file, the folder is deleted automatically.
So it can't be an ordinary folder. What is it called, and is it possible to do this programmatically?
I googled a bit but can't find the answer, because I don't even know what keywords to type; I have no idea what this is called.
I believe the Microsoft term for this special folder is a 'Connected File', and I think it was introduced in Windows 2000; in other words, it's a Windows/Explorer feature rather than an Internet Explorer feature. I haven't seen much about it, but this MSDN document could be a good starting point:
http://msdn.microsoft.com/en-us/library/bb776887(VS.85).aspx#connected
Are you deleting this from Windows Explorer? I think it is a feature of that program, not of the operating system. Try deleting it with another file manager, such as FAR or Total Commander.
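Since the connected-file behaviour belongs to Explorer, a plain file delete from your own program will not remove the folder. To get the same effect programmatically, one option is to delete the companion folder yourself. The Python sketch below assumes the English "<name>_files" folder suffix that browsers commonly use for "complete" saves; localized browsers may use other suffixes, and the function names are hypothetical.

```python
import shutil
from pathlib import Path

FOLDER_SUFFIX = "_files"  # ASSUMPTION: English-language naming convention


def companion_folder(html_path):
    """Return the '<stem>_files' folder next to html_path, or None if absent."""
    html_path = Path(html_path)
    folder = html_path.with_name(html_path.stem + FOLDER_SUFFIX)
    return folder if folder.is_dir() else None


def delete_with_companion(html_path):
    """Delete the HTML file and, if present, its resource folder (like Explorer)."""
    folder = companion_folder(html_path)
    Path(html_path).unlink()
    if folder is not None:
        shutil.rmtree(folder)
    return folder
```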