Is there a way for Textract to render PDF images to Excel data tables? - ocr

I have a PDF of a data table, but it is only an image, meaning I can't copy and paste values from it and OCR isn't available. Is there a way to use Textract (or some other service) to get the data table into Excel?

You can use Textract for parts of this, but there's no way to structure the results well. For example, you can get one long list of the text blocks of type LINE, but you would need a way to structure them the way they were arranged in the original data table in the PDF. That last part in particular makes this problem quite challenging.
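For illustration, here's a minimal boto3 sketch of what that flat list of LINE blocks looks like (the file name is a placeholder; the synchronous API takes image bytes, so a multi-page PDF may need Textract's asynchronous S3-based API or a page-to-image conversion first):

import boto3

client = boto3.client("textract")

# The synchronous API accepts image bytes (e.g. a page exported as PNG).
with open("table.png", "rb") as f:
    response = client.detect_document_text(Document={"Bytes": f.read()})

# Textract returns PAGE, LINE and WORD blocks; this keeps only the LINEs.
lines = [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]
print("\n".join(lines))

The output is exactly the problem described above: a flat, top-to-bottom list of text lines with no table structure attached.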

Related

JSON editor to clean up dataset

I have a JSON data set with around 10,000 objects that each hold around 20 data items.
Since some data items are empty, I'd like to filter and clean up the data set, i.e. delete all data items that are empty.
Is there a good JSON editor that someone can recommend? Or is there another way to clean up the file? I am on a Mac and would rather not use an online editor.
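If a short script is an acceptable alternative to an editor, a few lines of Python will do this. A minimal sketch, assuming the file holds a top-level array of objects and that "empty" means None, an empty string, or an empty list/object (the file names are placeholders):

import json

with open("data.json") as f:
    objects = json.load(f)  # assumes a top-level list of objects

# Drop every data item whose value is empty (None, empty string, empty list or object).
cleaned = [
    {k: v for k, v in obj.items() if v not in (None, "", [], {})}
    for obj in objects
]

with open("cleaned.json", "w") as f:
    json.dump(cleaned, f, indent=2)

10,000 objects of 20 items is a trivial amount of data for a script like this, and nothing leaves your machine.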

How do I retain my subcolumns when converting from Excel to CSV to JSON for Firebase Database

Here is my raw Excel data:
Here is my PivotTable to give you an idea of how I would like my JSON structure to be:
But when I convert from .XLSX to .CSV to .JSON and then load this file into Firebase, here is what my data looks like (from what I can see, a completely different structure to my PivotTable):
It looks like it has structured the data according to its row number. Any ideas please?
The Firebase Realtime Database stores JSON data. It doesn't store tables, nor rows and columns, nor spreadsheets.
So during the conversion the data gets converted from the table you have in Excel to the closest corresponding JSON structure, which means that each row in your table becomes a top-level node in the JSON, and each cell in that row becomes a property, with the column heading as the property name and the value from the cell as the value of that property.
If there was any code involved in this conversion, you will have to modify that code to generate the structure you want. If you've tried to do that but got stuck, edit your question to include the minimal, complete, standalone code with which we can easily reproduce the problem.
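To illustrate the kind of restructuring involved, here is a minimal Python sketch that groups rows under a parent key instead of emitting one top-level node per row. The column names category, item, and value are placeholders for whatever your PivotTable groups by:

import csv
import json
from collections import defaultdict

nested = defaultdict(dict)
with open("data.csv", newline="") as f:
    for row in csv.DictReader(f):
        # Nest each row under its grouping column rather than its row number.
        nested[row["category"]][row["item"]] = row["value"]

with open("firebase.json", "w") as f:
    json.dump(nested, f, indent=2)

The resulting file can then be imported into the Firebase Realtime Database with the nesting you designed, rather than one node per row.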

How do I refresh a CSV data set in QuickSight and not replace the data set, as this loses my calcs

I am looking to refresh a data set in QuickSight; it is in SPICE. The data set comes from a CSV file that has been updated and now has more data than the original file I uploaded.
I can't seem to find a way to simply repoint to the same file with the same format. I know how to replace the file, but whenever I do this it states that it can't create some of my calculated fields and so drops multiple rows of data!
I assume I'm missing something obvious, but I can't seem to find the right method or any help on the issue.
Thanks
Unfortunately, QuickSight doesn't support refreshing file data sets, to my knowledge. One solution, however, is to put your CSV in S3 and refresh from there.
The one gotcha with this approach is that you'll need to create a manifest file pointing to your CSV. This isn't too difficult, and the QuickSight documentation is pretty helpful.
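For reference, the manifest is just a small JSON file; here's a minimal sketch (bucket and key names are placeholders) that builds one and writes it to S3 with boto3:

import json
import boto3

# fileLocations points QuickSight at the CSV; globalUploadSettings describes its format.
manifest = {
    "fileLocations": [{"URIs": ["s3://your-bucket/data/myfile.csv"]}],
    "globalUploadSettings": {"format": "CSV", "containsHeader": "true"},
}

s3 = boto3.client("s3")
s3.put_object(
    Bucket="your-bucket",
    Key="manifests/myfile.json",
    Body=json.dumps(manifest),
)

You then create the QuickSight data set from the manifest file, and refreshing the data set re-reads the CSV from S3.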
You can replace the data source by going into the Analysis and clicking on the pencil icon next to the data set. By replacing the dataset there, you will not lose any calculated fields that were already created on the old dataset.
If you instead try to replace the data source by going into the Datasets page, you'll lose all calculated fields, modifications, etc.
I don't know when this was introduced, but you can now do this exact thing through "Edit Dataset", starting either from the Dataset page or from the pencil icon -> Edit dataset inside an Analysis. It is called "Update file" and results in an updated dataset (with additional or different data) without losing anything from your analysis, including calculated fields, filters, etc.
The normal caveat applies in that the newer uploaded file MUST contain the same column names and datatypes as the original - although it can also contain additional columns if needed.

Formatted text in a report - Access 2013

I am using Access 2013.
I am generating a specifications writing database which involves the user inputting a number of items as data which is then incorporated into a report. The report structure is largely similar but there are something like 30 variants with small changes based on the data entered at the start.
Each report extends to around 7 pages, whereas the data is only in the first 2 pages. The remaining pages contain standard clauses common to all the reports; effectively they are instructions on using the products to which the specification report refers.
I now have a rather frustrating problem.
Is it possible, please, to set up a standard report using a rich text format? I have thought of including the bulk of the report as an image, but that uses up lots of storage space. I have looked through the forum but can't find a scenario that really fits what I am looking for.
I almost want a really big text box that I can format in the same way as you would a Word document. Not sure if this makes sense, so if further clarification is required please ask. Many thanks in anticipation.
You can use Rich Text in Access, with the data stored in a Memo (Long Text) field: Link
I would set it up like this:
Have a table with one field per clause. This table has only one record. Fill out the texts directly in the table (or build a simple form for it, if you like).
In your record source of the report, add this table without any join (= cross join).
Then add all the fields to the report footer, height = 1 line, Can grow = True.
This will keep each clause together on a page. If you don't care about this, you can also use one huge field with all the text.

Process CSV file with multiple tables in SSIS

I'm trying to figure out if it's possible to pre-process a CSV file in SSIS before importing the data into SQL.
I currently receive a file that contains 8 tables with different structures in one flat file.
The tables are identified by a row containing the table name encapsulated in square brackets, e.g. [DOL_PROD].
The data is underneath in standard CSV format: headers first and then the data.
The tables are separated by a blank line, and the pattern repeats for the next 7 tables.
[DOL_CONSUME]
TP Ref,Item Code,Description,Qty,Serial,Consume_Ref
12345,abc,xxxxxxxxx,4,123456789,abc
[DOL_ENGPD]
TP Ref,EquipLoc,BackClyLoc,EngineerCom,Changed,NewName
Is it possible to split it out into separate CSV files, or to process it in a loop?
I would really like to be able to do all of this automatically with SSIS.
Kind Regards,
Adam
You can't do that with a Flat File Source and connection manager alone.
There are two ways to achieve your goal:
You can use a Script Component as the source of the rows and process the file there, doing whatever you want with it programmatically.
The other way is to read your flat file treating every row as a single column (i.e. without specifying a delimiter) and then, via Data Flow transformations, split rows, recognize table names, split flows, and so on.
I'd strongly advise you to use a Script Component, even if you have to learn .NET first, because the second option would be a nightmare :). I'd use a Flat File Source to extract the lines from the file as a single column, and then work on them in a Script Component, rather than reading a "raw" file directly.
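To illustrate the splitting logic itself, independent of SSIS, here is a minimal Python sketch of the same approach a Script Component would implement (the input file name is a placeholder, and blank separator lines are skipped):

current_file = None

with open("input.csv") as src:
    for raw in src:
        line = raw.rstrip("\n")
        if line.startswith("[") and line.endswith("]"):
            # A [TABLE_NAME] row starts a new section: open a new output file.
            if current_file:
                current_file.close()
            current_file = open(line.strip("[]") + ".csv", "w")
        elif line.strip() and current_file is not None:
            # Header and data rows are copied through to the current table's file.
            current_file.write(line + "\n")

if current_file:
    current_file.close()

Each [DOL_...] section ends up in its own CSV file, which SSIS can then import with an ordinary Flat File Source per table, or with a Foreach Loop over the generated files.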
Here's a resource that should get you started: http://furrukhbaig.wordpress.com/2012/02/28/processing-large-poorly-formatted-text-file-with-ssis-9/