MS OneNote drawings

I work with *.one files using [MS-ONE]: OneNote File Format and [MS-ONESTORE]: OneNote Revision Store File Format, and I am stuck parsing drawing elements.
I found several unknown JCID codes (0x00060014, 0x0002003b, 0x00020047, 0x00120048) and PropertyIDs (0x00001daa, 0x00001d4f, 0x00003415, 0x00001d4e, 0x00003416, 0x00003409, 0x0000340b, 0x0000340a, etc.) which are not described in the official specification. I tried to find some correspondence between these codes, their content, and the Ink Serialized Format, but with hardly any good results.
Maybe someone could give me some tips to point me in the right direction for parsing OneNote drawings.
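Even though those particular JCID values are undocumented, [MS-ONE] does define the bit layout of a JCID itself (a 16-bit index plus IsBinary/IsPropertySet/IsGraphNode/IsFileData/IsReadOnly flags), so decoding them at least tells you how each object's data should be treated. A minimal sketch in Python (the flag layout is taken from the JCID structure definition in [MS-ONE]; what the unknown indexes mean remains guesswork):

```python
def decode_jcid(jcid: int) -> dict:
    """Split a 32-bit JCID into its index and flag bits ([MS-ONE] JCID structure)."""
    return {
        "index": jcid & 0xFFFF,           # object type index
        "IsBinary": bool(jcid >> 16 & 1),
        "IsPropertySet": bool(jcid >> 17 & 1),
        "IsGraphNode": bool(jcid >> 18 & 1),
        "IsFileData": bool(jcid >> 19 & 1),
        "IsReadOnly": bool(jcid >> 20 & 1),
    }

# The unknown codes from the question: all of them have IsPropertySet set,
# so their content should parse as ordinary property sets.
for code in (0x00060014, 0x0002003B, 0x00020047, 0x00120048):
    print(f"{code:#010x} -> {decode_jcid(code)}")
```

That at least narrows the problem to identifying which properties each set carries, rather than how to read the bytes.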

Related

How to extract text from a Google drawing?

I am given a Google drawing containing an application component architecture. The drawing contains text used to manually populate a fairly long parameter file. The parameter file is then used to create an AWS database instance. I'm hoping to automate this tedious and error-prone process by extracting the desired values from the drawing and populating the parameter file.
I'm using Python and just getting started with this effort. I've been able to download the file using the MIME type "image/svg+xml", but it appears the text is rendered as a vector drawing. I've also downloaded the file as a PDF, but I still can't seem to get the text.
I'm not a master of Google drawings. From what I've read, the drawings are very simple and don't support anything like a tag that one might use to find important data. I suspect I'm barking up the wrong tree.
Is it possible to extract text from a Google drawing? If so, what would be the general process flow and what mime type would I use?
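Not an authoritative answer, but one cheap thing to try before giving up: export the drawing as SVG (the Drive API's files.export with mimeType "image/svg+xml", as you already did) and scan the document for <text> elements. If Google rendered the labels as vector outlines, as described above, the scan simply comes back empty, and OCR on a PNG export is probably the remaining option. A sketch of the scanning step:

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

def extract_svg_text(svg: str) -> list:
    """Return the character data of every <text> element in an SVG document."""
    root = ET.fromstring(svg)
    found = []
    for el in root.iter(SVG_NS + "text"):
        content = "".join(el.itertext()).strip()
        if content:
            found.append(content)
    return found

# Hypothetical fragment of what an exported drawing *could* contain:
sample = '<svg xmlns="http://www.w3.org/2000/svg"><text>db-host</text><path d="M0 0"/></svg>'
print(extract_svg_text(sample))  # ['db-host'] here; [] if the text was outlined
```

If the list comes back empty on your real export, the text really was converted to paths and there is nothing to extract from the SVG itself.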

How do I download the HTML5 version of a file that is stored in Box?

I have an application that reads files from Box (the cloud storage thing) and sends them to a content analyzer. Unfortunately, the content analyzer can't handle PowerPoint files, so I need some way to convert them to another format, such as HTML. I know Box can do this; but I can't figure out a way to extract the HTML5 version of the file so I can send it to the analyzer.
Is there a REST interface (or even better, a Node.js SDK call) that will let me extract the HTML5 version of a file from Box?
What I needed was the representations API. It's explained in the SDK docs at https://github.com/box/box-node-sdk/blob/master/docs/files.md#get-representation-info.
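For anyone landing here, the raw REST shape of that call looks roughly like the following sketch (not the official client: the access token is a placeholder, and "[extracted_text]" is just one example rep hint; the exact hint to request should be taken from the docs linked above, where the Node SDK's getRepresentationInfo wraps the same request):

```python
API_BASE = "https://api.box.com/2.0"  # Box REST endpoint

def representation_request(file_id: str, rep_hints: str):
    """Build the URL and headers for a GET asking which representations exist."""
    url = f"{API_BASE}/files/{file_id}?fields=representations"
    headers = {
        "Authorization": "Bearer <ACCESS_TOKEN>",  # placeholder token
        "x-rep-hints": rep_hints,                  # e.g. "[extracted_text]"
    }
    return url, headers

url, headers = representation_request("12345", "[extracted_text]")
print(url)
# With e.g. the requests library (network call, not run here):
# info = requests.get(url, headers=headers).json()
# then follow info["representations"]["entries"][...]["content"]["url_template"]
```

The response's url_template is what you then download and forward to the analyzer.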

Forge Viewer crash with non-ASCII chars in materials

I recently came across a problem with the Autodesk Forge viewer (or should I call it the A360 viewer? Still not clear to me).
I used the Model Derivative API to translate an RVT file to SVF, the format suitable for visualization, then retrieved all the files locally (a lot like extract.autodesk.io actually) so I can feed them to the viewer.
For one of my RVT files, I had a problem when loading a 3D view:
SyntaxError: JSON.parse: bad control character in string literal at line 1296 column 33 of the JSON data
Doing my investigation, I found out the problem comes from ProteinMaterials.json.gz, which for this translation contains non-ASCII characters (in the material names and descriptions), including one at line 1296. Removing the character just moves the error to the next non-ASCII character, and so on.
Is there a workaround for this problem, other than asking users to remove non-ASCII chars from their RVT files?
Call it the Forge Viewer :-)
Question: does the problem appear when you feed the unmodified files directly into the viewer?
I hope not. Otherwise, many others would be raising a similar complaint.
Conclusion: you need to escape the non-ASCII characters in the JSON yourself.
When you feed the files directly into the viewer, some step in the workflow does it for you.
When you store them locally, you need to explicitly perform this step yourself.
Does that make sense?
Can you confirm?
Thank you!
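In case it helps anyone hitting the same JSON.parse error: a minimal sketch of that escaping step in Python, assuming you are free to post-process the downloaded files. Re-serializing with ensure_ascii=True turns every non-ASCII and control character into a \uXXXX escape, which is what the viewer apparently expects:

```python
import json

def sanitize_json(raw: str) -> str:
    """Re-serialize JSON so non-ASCII and control characters become \\uXXXX escapes."""
    data = json.loads(raw, strict=False)  # strict=False tolerates raw control chars
    return json.dumps(data, ensure_ascii=True)

# A made-up material name with an accent and a raw control character:
dirty = '{"name": "B\u00e9ton arm\u00e9\u0001"}'
clean = sanitize_json(dirty)
print(clean)  # pure ASCII, safe for JSON.parse
```

For the gzipped file, read it with gzip.open(path, "rt", encoding="utf-8"), run it through this, and write it back with gzip.open(path, "wt").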

Trying to load data from a URL in R

So I want to load all the formatted data from this URL: https://data.mo.gov/Government-Administration/2011-State-Expenditures/nyk8-k9ti
into R so I can filter some of it out. I know how to filter it properly once I get it, but I can't get it "injected" into R properly.
I've seen many ways to pull the data if the URL ends in ".txt" or ".csv", but since this URL doesn't end in a file type, the only way I know to get it is to pull the HTML, but then I get... all the HTML.
There are several options to download the file as a .csv and inject it that way, but if I ever get good enough to do real work, I feel like I should know how to get this directly from the source.
The closest I've gotten is trying to parse the page as XML, but I get an error that says
XML content does not seem to be XML: 'https://data.mo.gov/Government-Administration/2011-State-Expenditures/nyk8-k9ti'
so that doesn't work either :(.
If anyone could help me out or at least point me in the right direction, I'd appreciate it greatly.
It is quite complicated to scrape the data from the table, but this website provides a convenient .json link which you can access quite easily from R. The link https://data.mo.gov/resource/nyk8-k9ti.json can be found under Export -> SODA API.
library(jsonlite)  # jsonlite::fromJSON can read straight from a URL; rjson::fromJSON expects a JSON string
data <- fromJSON("https://data.mo.gov/resource/nyk8-k9ti.json")
I believe your question could be more precisely described as "how to scrape data from a website" rather than simply loading data from a URL in R. Web scraping is another technique altogether. If you know some Python, I recommend taking this free course teaching you how to get access to data on websites via Python. Or, you can try this website to get what you want, though some advanced tools are not free. Hope it helps.
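To illustrate the Python route the answer alludes to, here is a minimal sketch. The field names in the sample are hypothetical, invented only to show the filtering step; the live fetch targets the same SODA URL as the R code:

```python
import json
# from urllib.request import urlopen

URL = "https://data.mo.gov/resource/nyk8-k9ti.json"

# Live fetch (needs network access):
# records = json.load(urlopen(URL))

# A tiny stand-in sample in the same list-of-objects shape (field names invented):
records = json.loads("""[
  {"agency_name": "Agriculture", "amount": "125.00"},
  {"agency_name": "Revenue",     "amount": "980.50"}
]""")

# The filtering then works on plain dicts:
over_500 = [r for r in records if float(r["amount"]) > 500]
print(over_500)
```

The SODA endpoint returns a JSON array of flat objects, so once it is loaded, ordinary list/dict operations are all the "filtering" you need.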

Most open and standards-compliant format for a JSON news feed

I can find a huge number of converters for Atom or RSS to JSON. I can see App.net and the Google Feed API pushing feeds in JSON, and these seem to have traction. What I am struggling to figure out is: what's the 'open standard' way of serving up a feed without tying people to transforming XML or to using my own (or someone else's) proprietary JSON format?
At this point, I don't think there is anything like this. JSON is schema-less by design, which means you'll have a hard time finding the RSS or Atom of JSON.