I use Microsoft OneNote daily to take notes. I would like to write a script that sends me an email every night with all the new notes I took that day across notebooks, so I can review them. This would usually be straightforward with, say, a Word document: timestamp each save, take the latest file, diff it against the last file from the previous day, and send the diff. Unfortunately, OneNote complicates this for at least two reasons:
OneNote autosaves and as far as I can tell does not offer the ability to rename saves or add a timestamp to the filename
Notebooks and pages mean changes are across "documents" instead of a single file that can be diff'd.
So I am looking for a solution that considers the complications above. Thanks.
The basic approach via the Microsoft Graph API
./me/onenote/pages?$filter=lastModifiedDateTime ge yyyy-MM-ddThh:mm:ssZ&$expand=parentNotebook
will yield JSON data with:
title - Page title
links/oneNoteWebUrl - allows opening of the onenote page in web browser
links/oneNoteClientUrl - allows opening of the onenote page in onenote app
parentNotebook/displayName - Notebook name
self - needed to get page content.
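For illustration, a minimal Python sketch of that call might look like the following, assuming you have already obtained an OAuth access token with the Notes.Read scope (for example via MSAL); the token placeholder and the 24-hour window are assumptions, not part of the original answer:

import requests
from datetime import datetime, timedelta, timezone

GRAPH = "https://graph.microsoft.com/v1.0"
access_token = "<token with Notes.Read scope>"  # assumption: acquired separately, e.g. via MSAL

# Pages modified in the last 24 hours (adjust the window as needed)
since = (datetime.now(timezone.utc) - timedelta(days=1)).strftime("%Y-%m-%dT%H:%M:%SZ")

resp = requests.get(
    f"{GRAPH}/me/onenote/pages",
    headers={"Authorization": f"Bearer {access_token}"},
    params={
        "$filter": f"lastModifiedDateTime ge {since}",
        "$expand": "parentNotebook",
    },
)
resp.raise_for_status()

for page in resp.json().get("value", []):
    print(page["parentNotebook"]["displayName"],
          page["title"],
          page["links"]["oneNoteWebUrl"]["href"])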
For a small number of pages this may work, but it is likely to time out with a 504 error on a drive with many pages.
In that case a two-stage approach is required.
./me/onenote/sections?$filter=lastModifiedDateTime ge yyyy-MM-ddThh:mm:ssZ
will return a list of all the sections that have been modified since the defined lastModifiedDateTime.
Next, iterate through the returned JSON data and get the pages modified since lastModifiedDateTime from the returned pagesUrls, using the format
./me/onenote/sections/1-xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxx/pages?$filter=lastModifiedDateTime ge yyyy-MM-ddThh:mm:ssZ&$expand=parentNotebook
yielding the same data as noted previously.
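A rough sketch of the two-stage variant, reusing the access_token, GRAPH base URL, and since cutoff from the sketch above; the pagesUrl property comes from the section objects the first call returns:

headers = {"Authorization": f"Bearer {access_token}"}

# Stage 1: sections modified since the cutoff
sections = requests.get(
    f"{GRAPH}/me/onenote/sections",
    headers=headers,
    params={"$filter": f"lastModifiedDateTime ge {since}"},
).json().get("value", [])

# Stage 2: modified pages within each modified section
modified = []
for section in sections:
    pages = requests.get(
        section["pagesUrl"],
        headers=headers,
        params={
            "$filter": f"lastModifiedDateTime ge {since}",
            "$expand": "parentNotebook",
        },
    ).json().get("value", [])
    for page in pages:
        modified.append((page["parentNotebook"]["displayName"],
                         page["title"],
                         page["links"]["oneNoteWebUrl"]["href"]))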
Once you have this data you can generate an email containing a list of the modified notebooks, page names, and page links.
If you need the actual page data (content), then you need to call
./me/onenote/pages/1-1c13bcbae2fdd747a95b3e5386caddf1!1-xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxx/content?includeIDs=true&includeInkML=true&preAuthenticated=true
which will give you text/HTML, ink, and links to other resources from each page.
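Continuing the sketch above, fetching a page's content could look like this; the self URL comes from the page JSON, headers is the same bearer-token dict, and the query parameters simply mirror the ones in the URL above:

content = requests.get(
    page["self"] + "/content",
    headers=headers,
    params={"includeIDs": "true", "includeInkML": "true", "preAuthenticated": "true"},
)
html = content.text  # the page body as text/html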
Related
I am trying to access old versions of Wiki pages using the date instead of "oldid". Usually, to access a version of a wiki page, I have to use the revision ID like this: https://en.wikipedia.org/w/index.php?title=Main_Page&oldid=969106986. Is there a way to access the same page using the date, without knowing the ID, if I know for example that there is a version of the page published on "12:44, 23 July 2020"?
In addition to the "main" API (called the action API by MediaWiki developers), you can also use the REST API. It may or may not be enabled at all wikis, but if you intend to query Wikipedia content.
The revision module of the action API (linked to in amirouche's answer) allows you to get the wikitext of a page. That is the source format used by MediaWiki, and it isn't easy to get HTML from it; HTML can be easier to analyze (especially if you're doing linguistic analysis, for instance).
If HTML would be better for your use case, you can use the REST API, see https://en.wikipedia.org/api/rest_v1/#/. For instance, if you're interested in English Wikipedia's Main Page as of July 2008, you can use https://en.wikipedia.org/api/rest_v1/page/html/Main_Page/223883415.
The number (223883415) is the revision ID, which you can get through the action API.
However, keep in mind that this re-parses the revision's wikitext into HTML. That means it may not be exactly what was shown as of the date the revision was saved. For instance, the wikitext can contain conditions on the current date (which is used to automatically update the main page). If you're interested in seeing that, you would need to use archive.org.
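As an illustration, fetching the rendered HTML of a specific revision via the REST API could look like this in Python; the revision ID is the example one above, and the User-Agent string is just a placeholder:

import requests

rev_id = 223883415  # revision ID obtained through the action API
url = f"https://en.wikipedia.org/api/rest_v1/page/html/Main_Page/{rev_id}"
resp = requests.get(url, headers={"User-Agent": "revision-fetcher/0.1 (example)"})
resp.raise_for_status()
html = resp.text  # rendered HTML of that revision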
You can use the MediaWiki API to get revisions; refer to the documentation at: https://www.mediawiki.org/wiki/API:Revisions.
You need to map revision IDs to dates. It will be straightforward :).
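A minimal sketch of that mapping, using the action API's prop=revisions module to find the revision that was current at the date from the question:

import requests

API = "https://en.wikipedia.org/w/api.php"

# Ask for the newest revision at or before the known timestamp.
params = {
    "action": "query",
    "prop": "revisions",
    "titles": "Main_Page",
    "rvlimit": 1,
    "rvdir": "older",                   # walk backwards in time from rvstart
    "rvstart": "2020-07-23T12:44:00Z",  # the date you know
    "rvprop": "ids|timestamp",
    "format": "json",
}
data = requests.get(API, params=params).json()
page = next(iter(data["query"]["pages"].values()))
rev = page["revisions"][0]
print(rev["revid"], rev["timestamp"])
# Use rev["revid"] as the oldid in index.php, or with the REST API above.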
When I view a website in my browser (for example https://www.homedepot.ca/en/home/p.725-inch-miter-saw-with-laser.1000748698.html), it contains information that is not in the source code.
For example, the source code of this page doesn't specify a product price:
<span itemprop="price">-</span>
<small>/
each</small>
However, when viewed in a browser, the tag does actually contain a price.
How can I retrieve the product's price from the source code?
Short answer: just by reading the source, you can't. The price is dynamically loaded from their servers (using JavaScript) after the page has loaded.
Using appropriate tools (such as the network tab in Chrome/Firefox's developer console) you can figure out where they retrieve the price from (in this case a JSON document on their servers). However, even if you used that, there is no guarantee that it will still work tomorrow: they can change their link or the format of the data at any moment.
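Purely as a hypothetical sketch: once the network tab has shown you which request carries the price, fetching it directly might look like this. The endpoint URL below is a placeholder, not Home Depot's real API, and the JSON layout would have to be inspected by hand:

import requests

# Placeholder endpoint: substitute the URL you found in the network tab.
PRICE_ENDPOINT = "https://www.example.com/api/products/1000748698/price"

resp = requests.get(PRICE_ENDPOINT, headers={"User-Agent": "Mozilla/5.0"})
resp.raise_for_status()
data = resp.json()
print(data)  # inspect the structure to locate the price field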
A good place to get started on the technologies they use is reading up on
JavaScript
AJAX
JSON
If you are interested in retrieving information from their page programmatically, a good start would be to contact them to see if they have a public interface (API) you can use. These are usually more stable to use.
Hi guys, I am trying to download a document from an SWF link in iPaper.
Please guide me on how I can download the book.
Here is the link to the book, which I want to convert to PDF or Word and save:
http://en-gage.kaplan.co.uk/LMS/content/live_content_v2/acca/exam_kits/2014-15/p6_fa2014/iPaper.swf
Your kind guidance in this regard would be appreciated.
Regards,
Muneeb
First, open the book in your browser with network capturing enabled (in the developer tools).
Open several pages at different locations, with and without zoom,
then look at the captured data.
You will see that for each new page you open, the browser asks for a new file (or files).
This means there is a file for each page, and your browser uses that file to create the image of the page. (Usually there is one file per page and it is some picture format, but I have also encountered base64-encoded pictures and a picture cut into four pieces.)
So we want to download and save all the files that contain the book's pages.
Now, usually there is a consistent pattern to the addresses of the files, with some incrementing number in it (which we can see in the captured data from the difference between consecutive files), and knowing the number of pages in the book we can guess the remaining addresses up to the end of the book (and of course download all the files programmatically in a for loop).
We could stop here.
But sometimes the addresses are a bit difficult to guess, or we want the process to be more automatic. Either way, we want to get the number of pages and all the page addresses programmatically.
So we have to check how the browser knows that. Usually the browser downloads some files at the beginning, and one of them contains the number of pages in the book (and potentially their addresses). We just have to look in the captured data, find that file, and parse it in our program.
Finally, there is the issue of security:
Some websites try to protect their data one way or another (usually using cookies or HTTP authentication). But if your browser can access the data, you just have to track how it does it and mimic it.
(If it is cookies, the server will respond at some point with a Set-Cookie: header. It could be that you have to log in to view the book, so you have to track that process too; usually it works via POST messages and cookies. If it is HTTP authentication, you will see something like Authorization: Basic in the request headers.)
In your case the answer is simple:
(All the file names are relative to the main file directory: "http://en-gage.kaplan.co.uk/LMS/content/live_content_v2/acca/exam_kits/2014-15/p6_fa2014/")
There is a "manifest.zip" file that contains a "pages.xml" file, which contains the number of files and links to them. We can see that for each page there is a thumbnail, a small picture, and a large picture, so we want just the large ones.
You just need a program that will loop over those addresses (from Paper/Pages/491287/Zoom.jpg to Paper/Pages/491968/Zoom.jpg).
Finally, you can merge all the JPGs into a PDF.
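A rough Python sketch of that loop, assuming the page IDs are contiguous from 491287 to 491968 as described, that no cookies or authentication are needed, and that Pillow is available to assemble the JPGs into a PDF:

import io
import requests
from PIL import Image  # pip install pillow

BASE = ("http://en-gage.kaplan.co.uk/LMS/content/live_content_v2/"
        "acca/exam_kits/2014-15/p6_fa2014/")

images = []
for page_id in range(491287, 491969):
    resp = requests.get(f"{BASE}Paper/Pages/{page_id}/Zoom.jpg")
    if resp.status_code != 200:
        continue  # skip any gaps in the numbering
    images.append(Image.open(io.BytesIO(resp.content)).convert("RGB"))

# Merge all page images into a single PDF.
images[0].save("book.pdf", save_all=True, append_images=images[1:])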
There are many libraries out there which purport to transform HTML to PDF. All that I've looked at have their limitations. We don't want to spend any money on this, so I wanted to know if it is possible to print to file in PDF format without all the pop-ups that Outlook would normally produce. We are using Outlook 2013 with Exchange.
This thread suggests that the answer is NO. But this thread suggests that it might be done. I'm looking for a clear path to achieve my goal.
To complicate things, I am using the Mail.Display function to allow the user to modify the email before sending. They can add attachments if they want also. Once they select the Send option, I want to capture the email that was sent and produce a PDF which will be stored in a data store for easy retrieval by anyone who accesses the customer account. Here is where I run into difficulty. The Mail object is not available after returning from the Display function. How can I get the sent email and process it?
Yes, it is possible.
Outlook uses Word as an email editor. So, you can use the Word object model to get the job done. The WordEditor property of the Inspector class returns an instance of the Document class from the Word object model which represents the message body. See Chapter 17: Working with Item Bodies for more information.
The ExportAsFixedFormat method of the Document class saves the document in PDF or XPS format.
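For example, from Python via pywin32 (a sketch only; the file path is a placeholder, and to capture the message when the user clicks Send you would typically hook the Application.ItemSend event, where the item is still available):

import win32com.client  # pip install pywin32

outlook = win32com.client.Dispatch("Outlook.Application")
mail = outlook.CreateItem(0)              # 0 = olMailItem
mail.Subject = "Example"
mail.HTMLBody = "<p>Body text</p>"

word_doc = mail.GetInspector.WordEditor   # Word Document behind the message body
wdExportFormatPDF = 17
word_doc.ExportAsFixedFormat(r"C:\temp\mail.pdf", wdExportFormatPDF)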
The Problem
I have a 35 MB PDF file with 130 pages that I need to put online so that people can print off different sections from it each week.
I host the PDF file on Amazon S3 now and have been told that the users don't like to have to wait on the whole file to download before they choose which pages they want to print.
I assume I am going to have to get creative and output the whole magazine to JPGs and get a neat viewer or find another service like ISSUU that doesn't suck.
The Requirements and Situation
I am given 130 single page PDF Files each week (All together this makes up The Magazine).
Users can browse the Magazine
Users can print a few pages.
Can Pay
Automated Process
Things I've tried
Google Docs Viewer - Get an Error, Sorry, we are unable to retrieve the document for viewing or you don't have permission to view the document.
ISSUU.com - They make my users log in to print. No way to automate the upload/conversion.
FlexPaper - Uses SWFTools (see next)
SWFTools - File is too complex error.
Hosting PDF File with an Image Preview of Cover - Users say having to download the whole file before viewing it is too slow. (I can't get new users. =()
Anyone have a solution to this? Or a fix for something I have tried already?
PDF documents can be optimized for downloading over the web; this process is known as PDF linearization. If you have control over the PDF files you are going to use, you could try to optimize them as linearized PDF files. There are many tools that can help you with this task, just to name a few (a minimal example follows the list):
Ghostscript (GPL)
Amyuni PDF Converter (Commercial, Windows only, usual disclaimer applies)
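A minimal example of linearizing with Ghostscript, driven from Python; this assumes gs is on the PATH and uses the pdfwrite device's FastWebView option, with placeholder file names:

import subprocess

subprocess.run(
    [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dFastWebView=true",   # produce a linearized ("fast web view") PDF
        "-dNOPAUSE", "-dBATCH", "-dQUIET",
        "-sOutputFile=magazine_linearized.pdf",
        "magazine.pdf",
    ],
    check=True,
)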
Another option could be to split your file into sections and only deliver each section to its "owner". For the rest of the information, you can put bookmarks linking to the other sections, so that they can also be retrieved if needed. For example:
If the linearization was not enough and you do not have a way to know how to split the file, you could try to split it by page numbers and create bookmarks like these:
-Pages 1-100
-Pages 101-200
-Pages 201-300
...
-Pages 901-1000
-All pages*
The last bookmark is for the ambitious guy that wants to have the whole thing by all means.
And of course you can combine the two approaches and deliver each section as a linearized PDF.
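If you go the splitting route, here is a small sketch using the pypdf library; the file name and section size are assumptions:

from pypdf import PdfReader, PdfWriter  # pip install pypdf

reader = PdfReader("magazine.pdf")
chunk = 10  # pages per delivered section

for start in range(0, len(reader.pages), chunk):
    end = min(start + chunk, len(reader.pages))
    writer = PdfWriter()
    for i in range(start, end):
        writer.add_page(reader.pages[i])
    with open(f"magazine_pages_{start + 1}-{end}.pdf", "wb") as out:
        writer.write(out)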
Blankasaurus,
Based on what you've tried, it looks like you are willing to prep the document(s), or I wouldn't suggest this. See if it'll meet your needs... Download ColdFusion and install it locally on your PC/VM. You can use CF's cfpdf function to automatically create "thumbnails" (you can set the size) of each of the pages without much work. Then load them into your favorite gallery script with links to the individual PDFs. Convoluted, I know, but it shouldn't take more than 10 minutes once you get the gallery script working.
I would recommend splitting the PDF into pages and then using a web-based viewer to publish them online. FlexPaper has many open-source tools, such as pdf2json and pdftoimage, to help out with the publishing. Have a look at our examples here:
http://flexpaper.devaldi.com/demo/