How to download a Document from ipaper swf - actionscript-3

Hi guys, I am trying to download a document from an iPaper SWF link.
Please guide me on how I can download the book.
Here is the link to the book, which I want to convert to PDF or Word and save:
http://en-gage.kaplan.co.uk/LMS/content/live_content_v2/acca/exam_kits/2014-15/p6_fa2014/iPaper.swf
Your kind guidance in this regard would be appreciated.
Regards,
Muneeb

First, open the book in your browser with network capturing enabled (in the developer tools).
Open many pages at different locations, with and without zoom, then look at the captured data.
You will see that for each new page you open, the browser asks for a new file (or files).
This means there is a file for each page, and from that file your browser creates the image of the page. (Usually there is one file per page and it is some picture format, but I have also encountered a base64-encoded picture and a picture cut into four pieces.)
So we want to download and save all the files that contain the book's pages.
Now, usually there is a consistent pattern to the addresses of the files, with some incrementing number in it (as we can see in the captured data from the difference between consecutive files). Knowing the number of pages in the book, we can guess the remaining addresses up to the end of the book, and of course download all the files programmatically in a for loop.
We could stop here.
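For example, a minimal Python sketch of that loop (the base URL, file-name pattern and page range below are hypothetical placeholders; take the real ones from your own capture):

import requests

BASE_URL = "http://example.com/book/pages/"     # hypothetical; copy from the captured requests
FIRST_PAGE = 1                                  # hypothetical page range
LAST_PAGE = 200

for page in range(FIRST_PAGE, LAST_PAGE + 1):
    url = f"{BASE_URL}{page}.jpg"               # pattern guessed from the captured addresses
    resp = requests.get(url)
    if resp.status_code != 200:
        print(f"page {page}: HTTP {resp.status_code}, skipping")
        continue
    with open(f"page_{page:04d}.jpg", "wb") as f:
        f.write(resp.content)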
But sometimes the addresses are a bit difficult to guess, or we want the process to be more automatic. Either way, we want to obtain the number of pages and all the page addresses programmatically.
So we have to check how the browser knows that. Usually the browser downloads some files at the beginning, and one of them contains the number of pages in the book (and potentially their addresses). We just have to look through the captured data, find that file, and parse it in our program.
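As a hedged Python sketch, assuming that start-up file is a small XML manifest (the URL and element names below are assumptions for illustration, not the real format of any particular site; inspect your capture for the actual structure):

import xml.etree.ElementTree as ET
import requests

manifest = requests.get("http://example.com/book/pages.xml").text   # hypothetical start-up file
root = ET.fromstring(manifest)

page_urls = [page.get("src") for page in root.iter("page")]         # assumed <page src="..."/> elements
print(f"{len(page_urls)} pages found")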
At the end there is the issue of security:
Some websites try to protect their data one way or another (usually using cookies or HTTP authentication). But if your browser can access the data, you just have to track how it does so and mimic it.
(If it is cookies, the server will respond at some point with a Set-Cookie: header. It could be that you have to log in to view the book, so you also have to track that process; usually it works via POST messages and cookies. If it is HTTP authentication, you will see something like Authorization: Basic in the request headers.)
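As a rough Python sketch of mimicking the browser with the requests library (the URLs and credentials are placeholders; adapt them to whatever your capture shows):

import requests

session = requests.Session()   # a Session stores any cookies the server sets (Set-Cookie)

# If the site uses a login form, replicate the POST the browser sends:
session.post("http://example.com/login", data={"user": "me", "password": "secret"})

# If the site uses HTTP Basic authentication (Authorization: Basic ...):
resp = session.get("http://example.com/book/page1.jpg", auth=("me", "secret"))

# Later requests on the same session reuse the stored cookies automatically.
resp = session.get("http://example.com/book/page2.jpg")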
In your case the answer is simple:
(All the file names are relative to the main file's directory: "http://en-gage.kaplan.co.uk/LMS/content/live_content_v2/acca/exam_kits/2014-15/p6_fa2014/".)
There is a "manifest.zip" file that contains a "pages.xml" file, which lists the number of pages and links to them. We can see that each page has a thumbnail, a small picture, and a large picture, so we want just the large ones.
You just need a program that loops over those addresses (from Paper/Pages/491287/Zoom.jpg to Paper/Pages/491968/Zoom.jpg).
Finally you can merge all the JPGs into a PDF.
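A rough Python sketch of that loop, using the requests and img2pdf packages; it assumes every ID between the first and last page resolves to a Zoom.jpg and simply skips those that do not (check pages.xml if the numbering turns out not to be contiguous):

import requests
import img2pdf                                    # pip install requests img2pdf

BASE = ("http://en-gage.kaplan.co.uk/LMS/content/live_content_v2/"
        "acca/exam_kits/2014-15/p6_fa2014/")

jpegs = []
for page_id in range(491287, 491969):             # Paper/Pages/<id>/Zoom.jpg
    resp = requests.get(f"{BASE}Paper/Pages/{page_id}/Zoom.jpg")
    if resp.status_code == 200:                   # skip IDs that do not resolve to a page
        jpegs.append(resp.content)

if jpegs:
    with open("book.pdf", "wb") as out:
        out.write(img2pdf.convert(jpegs))         # merge all JPEGs into a single PDF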

Related

How to use HTTP Sniffer (Chrome Network tab) to auto download files based on file name

When I use the Chrome Network tab, there are certain files I can see that have the same names (different addresses, though) each time I go to a different page.
In the Network tab I can download the files manually, but is there a simple way to download the files based on the file name, since the names are the same on every page? I'm sure there is a way to do it with code, but I don't know any code.
OK, there are a number of things you can look at in terms of software that is out there.
If you are looking at network packets and want to analyse what is moving through your TCP sockets, use Wireshark.
If you are looking at copying an entire website to analyse it, you can use one of the website grabbers; I've used HTTrack, but there is a whole list of them: Website grabbers

get a json file that a webpage is made of

I'm not familiar with web development, but I believe the text content of this web page
https://almath123.github.io/semstyle_examples/
is made of the two JSON files mentioned in it (semstyle_results.json and semstyle_results.json), and the JSON files are completely present in RAM (if that is the correct term for it), because when I disconnect the internet I can still browse the page and see the text content.
I want to download the semstyle_results.json file. Is that possible? How can I do that?
Technically, if you visit a website you're "downloading" the content. Your browser sends a request for information and a server responds by sending you the information. You're viewing that information locally. Dynamic sites poll or make further requests as you browse to keep the data updated and relevant, but it is all sent to you.
If you want to easily download any of the content from the website, a simple way is to open the developer tools (Ctrl + Shift + I on Windows for Firefox and Chrome), go to a source file and click Save As. The Network tab shows you the requests that were made, which includes not just files such as JSON but also the details of each request.
Here is a screenshot locating one of the JSON files in a Chromium-based browser (Brave).
Webpages may not always show that they support returning data as JSON or XML. For example, if you inspect this SEC EDGAR database page using the method described above, it shows no JSON link, but if you append index.json to the end of the link it will return the same data in JSON format (or XML format, if you so please).
i.e., the same website but with a JSON endpoint
So it is always a good idea to see if the website hosts developer information. For example, SEC EDGAR provides developer tools that mention that the directory structure can be accessed via HTML, XML or JSON.
SEC developer information
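For example, once you have spotted a JSON file's URL in the Network tab (or built one by appending index.json), a small Python sketch can fetch and save it directly; the URL below is a hypothetical placeholder, so substitute the one you found:

import json
import requests

url = "https://example.github.io/semstyle_examples/semstyle_results.json"   # hypothetical URL
resp = requests.get(url, headers={"User-Agent": "my-downloader/1.0"})        # some sites require a User-Agent
resp.raise_for_status()

data = resp.json()                        # parsed into Python objects
with open("semstyle_results.json", "w") as f:
    json.dump(data, f, indent=2)          # save a local copy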

Send Email with all new OneNote entries

I use Microsoft OneNote daily to take notes. I would like to write a script to send myself an email every night with all the new notes I took that day, across notebooks, so I can review them. This would usually be straightforward in, e.g., a Word doc, where I could timestamp all saves, take the latest file, diff it with the last file from the previous day and send the diff. Unfortunately OneNote complicates this for at least two reasons:
OneNote autosaves and, as far as I can tell, does not offer the ability to rename saves or add a timestamp to the filename.
Notebooks and pages mean changes are spread across "documents" instead of a single file that can be diff'd.
So I am looking for a solution that considers the complications above. Thanks.
The basic approach via the Microsoft Graph API:
./me/onenote/pages?$filter=lastModifiedDateTime ge yyyy-MM-ddThh:mm:ssZ&$expand=parentNotebook
will yield JSON data with:
title - Page title
links/oneNoteWebUrl - allows opening the OneNote page in a web browser
links/oneNoteClientUrl - allows opening the OneNote page in the OneNote app
parentNotebook/displayName - Notebook name
self - needed to get page content.
For a small number of pages this may work, but it is likely to time out with a 504 error for a drive with many pages.
In that case a two stage approach is required.
./me/onenote/sections?$filter=lastModifiedDateTime ge yyyy-MM-ddThh:mm:ssZ
will return a list of all the sections that have been modified since the defined lastModifiedDateTime.
Next, iterate through the returned JSON data and get the pages modified since lastModifiedDateTime using the returned pagesUrls, with the format
.me/onenote/sections/1-xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxx/pages?$filter=lastModifiedDateTime ge yyyy-MM-ddThh:mm:ssZ&$expand=parentNotebook
yielding the same data as noted previously.
Once you have this data you can generate an email containing a list of the modified notebooks, page names and page links.
If you need the actual page data (content), then you need to call
./me/onenote/pages/1-1c13bcbae2fdd747a95b3e5386caddf1!1-xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxx/content?includeIDs=true&includeInkML=true&preAuthenticated=true
which will give you text/HTML, ink, and links to other resources from each page.
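A rough Python sketch of the two-stage approach using plain REST calls against the endpoints above; it assumes you already have a valid OAuth access token (e.g. via MSAL), which is not shown here, and relies on the fields listed earlier in this answer:

import requests

GRAPH = "https://graph.microsoft.com/v1.0"
TOKEN = "<access token>"                            # placeholder; obtain via MSAL or similar
HEADERS = {"Authorization": f"Bearer {TOKEN}"}
SINCE = "2024-01-01T00:00:00Z"                      # lastModifiedDateTime cut-off

# Stage 1: sections modified since the cut-off.
sections = requests.get(
    f"{GRAPH}/me/onenote/sections?$filter=lastModifiedDateTime ge {SINCE}",
    headers=HEADERS,
).json().get("value", [])

# Stage 2: for each section, pages modified since the cut-off.
modified_pages = []
for section in sections:
    pages = requests.get(
        f"{section['pagesUrl']}?$filter=lastModifiedDateTime ge {SINCE}"
        "&$expand=parentNotebook",
        headers=HEADERS,
    ).json().get("value", [])
    modified_pages.extend(pages)

# Build the e-mail body from the fields described above.
for page in modified_pages:
    print(page["parentNotebook"]["displayName"],
          page["title"],
          page["links"]["oneNoteWebUrl"]["href"])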

HTML5 read files from path

Well, using the HTML5 file handling API we can read files with the help of input type="file". What about reading files from a path like
/images/myimage.png
etc.?
Any kind of help is appreciated
Yes, if it is Chrome! Play with the FileSystem API and you will be able to do that.
The simple answer is: no. When your HTML/CSS/images/JavaScript is downloaded to the client's end, you break loose from the server.
Simplistic Flowchart
User requests URL in Browser (for example; www.mydomain.com/index.html)
Server reads and fetches the required file (www.mydomain.com/index.html)
index.html and its linked resources will be downloaded to the user's browser
The user's Browser will render the HTML page
The user's Browser will only fetch the files that came with the request (images/someimages.png and stuff like scripts/jquery.js)
Explanation
The problem you are facing here is that once the HTML is rendered locally it no longer has any link with the server, so asking which files /images/ contains is not something the client can do, as that directory resides on the server.
Work-around
What you can do, though this somewhat defeats the point of the question, is write a server-side script in JSP/PHP/ASP/etc. that traverses the directory you want. In PHP you can do this using opendir() (http://php.net/opendir).
With an XHR/AJAX call you can then request the PHP page to return the directory listing. The easiest way to do this is by using jQuery's $.post() function in combination with JSON.
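The work-around above uses PHP's opendir(); purely as an analogous sketch, here is the same idea as a small Python/Flask endpoint that returns the directory listing as JSON (the directory name, route and file filters are placeholders to adjust for your site):

import os
from flask import Flask, jsonify

app = Flask(__name__)
IMAGE_DIR = "images"                       # server-side directory to expose

@app.route("/my_image_dirlist")
def list_images():
    # Return only image files; tighten or loosen the filter as needed.
    files = [name for name in os.listdir(IMAGE_DIR)
             if name.lower().endswith((".png", ".jpg", ".gif"))]
    return jsonify(files)                  # the browser fetches this via XHR/AJAX

if __name__ == "__main__":
    app.run()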
Caution!
You need to keep in mind that if you use this work-around, the link will be visible to everyone, so anyone can see what's in the online directory you expose (for example, http://www.mydomain.com/my_image_dirlist.php would return a stringified list of everything, or less, depending on the rules in the server-side script, inside http://www.mydomain.com/images/).
Notes
http://www.html5rocks.com/en/tutorials/file/filesystem/ (seems to work only in Chrome, and would still not be exactly what you want)
If you don't need all the files from a folder, but only those that have already been downloaded to your browser's cache for the current URL, you could try searching online for how to access the browser cache (downloaded files) of the currently loaded page, or build something like a DOM walker and CSS reader (regex?) to find all the file references.

Printable Large PDF on the Web

The Problem
I have a 35 MB PDF file with 130 pages that I need to put online so that people can print off different sections from it each week.
I host the PDF file on Amazon S3 now, and have been told that users don't like having to wait for the whole file to download before they choose which pages they want to print.
I assume I am going to have to get creative and output the whole magazine to JPGs and find a neat viewer, or find another service like ISSUU that doesn't suck.
The Requirements and Situation
I am given 130 single-page PDF files each week (all together these make up the magazine).
Users can browse the Magazine
Users can print a few pages.
Can Pay
Automated Process
Things I've tried
Google Docs Viewer - I get an error: "Sorry, we are unable to retrieve the document for viewing or you don't have permission to view the document."
ISSUU.com - They make my users log in to print. No way to automate the upload/conversion.
FlexPaper - Uses SWFTools (see next)
SWFTools - "File is too complex" error.
Hosting the PDF file with an image preview of the cover - Users say having to download the whole file before viewing it is too slow. (I can't get new users. =()
Anyone have a solution to this? Or a fix for something I have tried already?
PDF documents can be optimized for downloading through the web; this process is known as PDF linearization. If you have control over the PDF files you are going to use, you could try to optimize them as linearized PDF files. There are many tools that can help you with this task, just to name a few:
Ghostscript (GPL)
Amyuni PDF Converter (Commercial, Windows only, usual disclaimer applies)
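As a hedged Python sketch, you could shell out to Ghostscript (the GPL tool above) to produce a linearized copy; file names are placeholders, and the -dFastWebView flag should be checked against the pdfwrite documentation for your Ghostscript version:

import subprocess

subprocess.run(
    [
        "gs",
        "-dBATCH", "-dNOPAUSE", "-dQUIET",
        "-sDEVICE=pdfwrite",
        "-dFastWebView=true",                 # ask pdfwrite for a linearized ("fast web view") PDF
        "-sOutputFile=magazine-linearized.pdf",
        "magazine.pdf",                       # placeholder input file
    ],
    check=True,
)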
Another option could be to split your file into sections and only deliver each section to its "owner". For the rest of the information, you can put in bookmarks linking to the other sections, so that they can also be retrieved if needed. For example:
If linearization is not enough and you do not have a natural way to split the file, you could try splitting it by page numbers and creating bookmarks like these:
-Pages 1-100
-Pages 101-200
-Pages 201-300
...
-Pages 901-1000
-All pages*
The last bookmark is for the ambitious guy that wants to have the whole thing by all means.
And of course you can combine the two approaches and deliver each section as a linearized PDF.
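A minimal Python sketch of the splitting step, using the third-party pypdf package rather than any specific commercial tool (the chunk size and file names are placeholders):

from pypdf import PdfReader, PdfWriter

reader = PdfReader("magazine.pdf")
CHUNK = 10                                     # pages per section
total = len(reader.pages)

for start in range(0, total, CHUNK):
    end = min(start + CHUNK, total)
    writer = PdfWriter()
    for i in range(start, end):                # copy this section's pages
        writer.add_page(reader.pages[i])
    with open(f"pages_{start + 1}-{end}.pdf", "wb") as out:
        writer.write(out)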
Blankasaurus,
Based on what you've tried, it looks like you are willing to prep the document(s), or I wouldn't suggest this. See if it meets your needs: download ColdFusion and install it locally on your PC/VM. You can use CF's cfpdf function to automatically create "thumbnails" (you can set the size) of each of the pages without much work. Then load them into your favorite gallery script with links to the individual PDFs. Convoluted, I know, but it shouldn't take more than 10 minutes once you get the gallery script working.
I would recommend splitting the PDF into pages and then using a web-based viewer to publish them online. FlexPaper provides many open source tools, such as pdf2json and pdftoimage, to help with the publishing. Have a look at our examples here:
http://flexpaper.devaldi.com/demo/