How to extract text from a Google drawing? - google-drive-api

I am given a Google drawing containing an application component architecture. The drawing contains text used to manually populate a fairly long parameter file. The parameter file is then used to create an AWS database instance. I'm hoping to automate this tedious and error-prone process by extracting the desired values from the drawing and populating the parameter file.
I'm using Python and just getting started with this effort. I've been able to download the file using the mime type "image/svg+xml" but it appears the text is rendered as a vector drawing. I've also downloaded the file as a PDF, but I still can't seem to get the text.
I'm not a master of Google drawings. From what I've read the drawings are very simple and don't support anything like a tag that one might use to find important data. I suspect I'm barking up the wrong tree.
Is it possible to extract text from a Google drawing? If so, what would be the general process flow and what mime type would I use?

Related

Autodesk Design Automation for Revit: text file as input

The Revit API application I developed takes a text file as input.
The text file looks like this:
1.002, 20.502, 21.706
12.502, 5.502, 7.706
21.002, 15.502, 14.706
.....................
.....................
(The values are not real, just made up. I am only showing what my text file looks like.)
I am basically reading the text data as input.
Now, if I want to convert the same application to a Design Automation application, I guess I will not be able to use a text file as input.
My question is: what should the input file type be if it consists of 3D point coordinates as described above?
Should it be JSON? If it needs to be JSON, how should I write the point coordinates? Any other suggestion for the file type would be a big help.
Any example code would also be a big help.
The list of supported input file formats does not include txt files.
If I write a JSON file, please give me some clue as to how I should arrange it and read it in Revit.
Many thanks in advance.
Thank you for your query.
The slightly more complex question is how to generate multiple output files.
That is answered by the article on How to generate dynamic number of output with Design Automation for Revit V3.
In passing, it also mentions multiple input files, saying:
"... For the zipped input file, it's well documented at https://forge.autodesk.com/en/docs/design-automation/v3/tutorials/revit/step6-post-workitem/, but for the output zipped result, it's not so clear..."
Trying to follow that link, I note that it is out of date.
The updated link is:
https://forge.autodesk.com/en/docs/design-automation/v3/tutorials/revit/step7-post-workitem/
Looking at the additional notes on input arguments, I see the instructions on how to pass JSON input data directly in the workitem itself.
I would assume that you can also use a different prefix instead of data:application/json such as data:application/text to pass in the data in its current form.
Please try that out and let us know how it works for you.
Alternatively, you can just stay on the safe side and convert your text data to JSON format.
There are innumerable ways of doing so.
The most minimalistic and simple would look like this:
[1.002, 20.502, 21.706,
12.502, 5.502, 7.706,
21.002, 15.502, 14.706,
...]
That represents one single array of doubles.
A slightly more structured approach might be to pass in an array of triples of doubles like this:
[[1.002, 20.502, 21.706],
[12.502, 5.502, 7.706],
[21.002, 15.502, 14.706],
...]
As you see, it is not hard.
I hope this helps.
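As a rough illustration of the array-of-triples suggestion, here is a minimal Python sketch that converts the comma-separated point lines into that JSON shape. It assumes every non-empty line holds exactly three numbers written with decimal points; the function name points_to_json and the sample values are hypothetical. The last line only hints at how such a payload might be prefixed with data:application/json for inline use in a workitem, so check the exact argument format against the Design Automation documentation.

```python
import json


def points_to_json(text):
    """Convert lines of 'x, y, z' into a JSON array of [x, y, z] triples."""
    points = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Unpacking raises ValueError if a line does not hold exactly 3 values.
        x, y, z = (float(value) for value in line.split(","))
        points.append([x, y, z])
    return json.dumps(points)


# Hypothetical sample data in the same layout as the question's text file.
sample = "1.002, 20.502, 21.706\n12.502, 5.502, 7.706\n21.002, 15.502, 14.706\n"
payload = points_to_json(sample)

# Assumed inline form: the JSON text appended to the data:application/json prefix.
inline_url = "data:application/json," + payload
```

Failing loudly on a malformed line is a deliberate choice here: a stray comma in place of a decimal point would otherwise silently produce four values instead of a triple.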

Angular 5 : How to integrate html data (which is a formatted text) in a .docx file?

I'm still a bit of a newbie in the code game, and I would like some advice from senpai.
Context:
I'm making an Angular 5 app with a form that also uses QuillJS, a rich text editor, for just one question (the previous questions are simple input fields for strings or numbers). My goal is to let my users download the form, together with the text they entered in QuillJS, as a .docx file (Word). And of course I'm doing this because I want to keep the formatted text from QuillJS; otherwise I would just have taken a good ol' string.
Issue:
The point is, I'm already building a docx file for the first questions of the form, and the only method I have found so far to put my HTML string from QuillJS into a Word-readable data type is to use the html-docx-js library.
This post even explains how. But, BUT, I don't want to use the saveAs function (see the post), which creates a file and puts the content in it. I want to put the content in the docx file I'm already creating.
So here is my question: how would you, senpai, do it?
The thing is that I've got a Blob file (cf. the post), but I don't know how to put it in my docx file. I tried to see if the FileReader function could do the job, but well... I don't get how to integrate this special Blob file type (which is application/vnd.openxmlformats-officedocument.wordprocessingml.document) into the docx file.
Maybe there is another way. I'm open to any suggestions, and I don't mind changing my approach at all.
Thank you. Save internet, give me a tip.
The official documentation for html-docx-js does not mention any option other than the asBlob method. I suggest two options:
Decoding the DOCX:
The Blob file type is not special. The blob is just a binary representation of the docx. I found in an SE question that a docx is in fact a zipped XML document. You could unzip it using JSZip or another JS solution, then read it using FileReader and try to deal with it in a DOM manner. I'm not qualified to go into detail on how that could work.
Adding HTML to the user input first and then outputting it as a whole:
This changes the way you want to do it. Here, I would first create formatted HTML from the data you collected in the other parts of the questionnaire. Then you append the rich data from the rich editor. Finally, you take this HTML and save it into a single file using the asBlob function.
The second solution may strip some customization from your original approach, but it seems much faster to implement.
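To make the "a docx is a zipped XML document" point concrete, here is a small sketch of opening such an archive and reading one part from it. It is written in Python purely for illustration; in the browser the same idea would go through JSZip. The helper names and the tiny stand-in archive are hypothetical, and the stand-in is not a valid Word document, only a zip with the same kind of layout.

```python
import io
import zipfile


def list_docx_parts(docx_bytes):
    """Return the names of the files packed inside a .docx archive."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return archive.namelist()


def read_docx_part(docx_bytes, part_name="word/document.xml"):
    """Return one part of the archive as text (the main body by default)."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return archive.read(part_name).decode("utf-8")


# Stand-in archive for demonstration only: NOT a complete, valid .docx.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("word/document.xml", "<w:document/>")
demo_bytes = buffer.getvalue()

parts = list_docx_parts(demo_bytes)
body_xml = read_docx_part(demo_bytes)
```

Merging content from one docx into another would mean editing these XML parts and re-zipping them, which is why the second option is likely the quicker route.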

Alfresco simple OCR. Extract text from PDF file and use it to start workflow

I'm using alfresco-simple-ocr with pdfsandwich and Tesseract OCR. I want to get the text from a document added to a folder and then use that text and the PDF file in a new workflow. I've managed to do the OCR extraction and to start a workflow with a file added to the folder,
but I can't get the text from the file and use it in the workflow. Is it possible to do this? Where can I start implementing that function? Greetings, Rafał
You don't need any extension for that. Alfresco already integrates PDFBox, which will do that for you. Beyond that, it depends on whether your PDF contains images (i.e. scanned documents) or already contains text.
If you want to OCR some images, there is also this module:
https://github.com/bchevallereau/alfresco-tesseract
Once you know what you want to transform, you can look at this page, which has a JavaScript sample showing how to call transformers:
http://docs.alfresco.com/5.2/references/dev-extension-points-content-transformer.html
You can do that in Java as well if you need to.

Can Google Apps Script blobs be returned with their content type set to Spreadsheet?

As part of a suite of tools I am developing for the company I work for, I have an add-on in development that when first installed generates all the relevant files and folders for the suite.
Due to the complexity of some of the files I discovered that using the following code was the quickest way to generate the files:
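function createTemplate(branchId) {
  // stringId: ID of the master file (assumed to be defined elsewhere in the add-on)
  var home = DriveApp.getFolderById(branchId);
  var master = DriveApp.getFileById(stringId).getBlob();
  home.createFile(master); // this is the step that produces PDFs
}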
The problem, however, is that all the files are generated as PDFs. I am aware that this is because the default blob content type is PDF and that getAs(contentType) can be used to specify the desired blob content type, but I have struggled to find any documentation on how to specify the content type as a Spreadsheet, for example.
Is this possible, and have I just missed where Google's documentation explains how to specify the content type as a spreadsheet? Or is it not possible, and can blobs only be returned with PDF or image content types?
UPDATE: I discovered accidentally that the content type for spreadsheets appears to be application/vnd.google-apps.spreadsheet. However, I now get the error "Converting from application/pdf to application/vnd.google-apps.spreadsheet is not supported."
FURTHER UPDATE: I found a potential solution via a different route, using the script below:
function createTemplate(branchId) {
  var home = DriveApp.getFolderById(branchId);
  // makeCopy duplicates the file directly instead of going through a blob
  var master = DriveApp.getFileById("fileId").makeCopy("PP Template", home);
}
However, this had a run time of around 8 seconds, so it is not the quickest thing in the world. A blob-related answer is still welcome.
As far as I am aware, a blob will not do what you want at this time. There is a reason that DocumentApp.create, etc. exist. They are the function calls that Google wants you to make for what you want to do. They don't want 13 ways to perform the same function.

CK Editor JSON file creation for inserting images

I am using CK Editor on my web site and have built a simple uploader script in PHP to add images to my server. I am using the image browser plugin to enable the inclusion of images in my text. The image browser seems to work using a JSON file, which must list all images in the image folder.
I am completely new to JSON, and to avoid learning how to build a JSON file, I created a PHP file that simply reads the names of my images from a database and includes them in the JSON output.
As a workaround this seems to be perfectly functional. However, it relies on my upload script adding the image names to the database (an unnecessary step), and it is also bad practice.
I am looking for a good tutorial or explanation on how to make a very simple JSON file that lists the images in my uploads folder in the correct format for CK Editor, so that I can free myself from my image names database and program like a big boy.
Any and all help would be appreciated.
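One possible direction, sketched in Python rather than PHP: scan the uploads folder directly and emit a JSON array with one object per image, so no database is involved. The key name "image", the URL prefix, and the function name build_image_list are assumptions made for illustration; the exact fields the image browser plugin expects should be checked against its own documentation.

```python
import json
from pathlib import Path

# File extensions treated as images (an assumption; adjust as needed).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def build_image_list(upload_dir, url_prefix):
    """Return a JSON array with one {"image": url} object per image file."""
    entries = []
    for path in sorted(Path(upload_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            # "image" is an assumed key name, not confirmed by the question.
            entries.append({"image": f"{url_prefix}/{path.name}"})
    return json.dumps(entries, indent=2)
```

The same approach carries over to PHP: list the directory, build an array of entries, and encode it as JSON on each request.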