Scraping an AJAX-generated table to download PDFs in bulk - json

I'm trying to download (or alternatively open and save) approximately 30,000 PDF documents. The documents are only accessible through a 3rd-party service provider's website/platform (there are no ethical dilemmas here).
The website is secure and needs to be logged into (I have access), and the table is generated via AJAX. The report I intend to read from has a URL of the form https://sub.website.com/au/report/index?id=1001# that doesn't change when dates or other filters change. In total there are 180,000+ table entries; not all have an associated invoice, and not all invoices are required.
Using Chrome DevTools I can see the elements; the table's id is #reportResults, and the invoice details are in an HTML element.
There also appears to be an API, but I don't know where to start there either.
How do I scrape data from this using VBA? I have downloaded the JSON.bas module recommended in other solutions for scraping JSON and AJAX, but for this situation I don't know how to use it or where to go from here.
I'm handy with VBA but have no experience with any other languages.
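
The usual pattern for a page like this is to ignore the rendered #reportResults table and replay the XHR/API call that fills it: open DevTools, go to the Network tab, filter on XHR, change a date filter on the report, and copy the request that returns the table data as JSON. Below is a minimal sketch of that flow, shown in Python purely for brevity; the same steps translate to VBA with an XMLHTTP request plus the JSON.bas parser already downloaded. Every endpoint, parameter, and field name in the sketch is an assumption to be replaced with whatever DevTools actually shows.

```python
# Minimal sketch only: every URL, query parameter and JSON field name here
# is invented and must be read off the real XHR request in DevTools.
import json
from urllib.request import Request, urlopen

BASE = "https://sub.website.com/au"
COOKIE = "SESSIONID=paste-your-logged-in-session-cookie-here"  # assumption
REPORT_API = f"{BASE}/report/data?id=1001&page=1"              # hypothetical endpoint

req = Request(REPORT_API, headers={"Cookie": COOKIE})
rows = json.loads(urlopen(req).read())      # assumes the response is a JSON list

for row in rows:
    if not row.get("invoiceId"):            # skip entries with no invoice
        continue
    pdf_url = f"{BASE}/invoice/{row['invoiceId']}/pdf"         # hypothetical
    pdf_req = Request(pdf_url, headers={"Cookie": COOKIE})
    with open(f"invoice_{row['invoiceId']}.pdf", "wb") as out:
        out.write(urlopen(pdf_req).read())
```

The paging, date filters, and login cookie are the parts to lift from the real request; once the JSON shape is known, the VBA version is the same loop with JSON.bas doing the parsing.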

Related

RShiny Output as HTML or PDF

Originally, I built a Shiny application that served as a dashboard for clients. However, my team later determined that, rather than having a live dashboard that clients can log into, we would simply send PDF reports. I have been trying to determine if there is a way to convert a Shiny page into a static HTML file (or, even better, directly into a PDF) in a fast and convenient programmatic way, as there are hundreds of pages that need to be saved (one for each client).

Dynamic PDF Generation from Salesforce Data

I'm looking for some recommendations on a solution for building dynamic PDFs from Salesforce object data. We currently have a layout designed in Photoshop that we're looking to import into Salesforce and fill with various snippets/images based on data that lives within an object. The final product should come out as a PDF.
I started building this using Adobe XFDF: I exported the PSD as a PDF, created a fillable form from it, and populated that from an XFDF file generated from Salesforce. This does work, but the design issues with fillable forms, the requirement for Acrobat Pro on every system that uses it, and the lack of support for referencing file templates that are not local have killed it. Any one of these issues alone wouldn't be a deal breaker, but all three combined are too much to overcome.
While this is mostly sorted out on the Salesforce side, I'm not sure of the best way to proceed when it comes to PDF generation. Here are a couple of ideas that might work, but I don't have enough experience to be sure:
Generate an HTML/CSS file from the PSD, upload it to Salesforce, modify the HTML within Salesforce, and send it to a PDF generation API. The Adobe API looks promising for this, but can I send HTML and CSS files together to generate a single PDF?
Use Salesforce's own PDF tools to generate the PDF; this would mean modifying a Visualforce page to match the reference design in the PSD.
Use some sort of third-party PDF generator tool that lets me reference my current design as a template.
I'm open to any suggestions, Thanks!
In Salesforce, a PDF can be generated without any app: a Visualforce page can be rendered straight to PDF with renderAs="pdf". Check out the official documentation from Salesforce:
https://developer.salesforce.com/docs/atlas.en-us.pages.meta/pages/pages_output_pdf_renderas.htm
A quick-start guide: How to generate PDF in Salesforce.
This does not require any purchase or separate license. If you are specifically looking for an app, one can be found on the AppExchange:
https://appexchange.salesforce.com/appxSearchKeywordResults?keywords=pdf%20generator

Python Web Crawler with stored Web History

I'm creating a Python web crawler with the ability to browse web history, parse through the information, and store the important parts in a database for forensic/academic purposes. I understand how to crawl websites, but the part I'm struggling with is crawling through web history. I'll give a scenario:
During a forensic investigation:
You have been given a full forensic image of a suspect's computer. You then locate the AppData folder for Google Chrome, which stores all the information about the suspect, including form data, credentials, and web history.
How would I set up the web crawler to only search through data in the suspect's web history?
As a start, I am also having issues accessing the information stored within Google Chrome's User Data folder to view my own personal information, which is stored there. I am currently attempting to use DB Browser to open the files and see my own web history, but I'm not having much luck with this. Any suggestions?
For those interested in this project of mine, I can update this thread as I go so you can see the progress of my web crawler. The end result will be able to take web history and data from public and private websites and sort important information (i.e. name, address, D.O.B.) into a database to be used later as a biographic dictionary.
I WILL STRESS THIS AGAIN: THIS IS ALL FOR ACADEMIC PURPOSES, IN A CONTROLLED ENVIRONMENT, AND USED ON A TEST/FAKE ACCOUNT.
Hindsight (https://github.com/obsidianforensics/hindsight) is an open-source tool written in Python that can parse a ton of information from the files in the /Google/Chrome/User Data/ directory.
You could look at its source for inspiration, or just run the tool and parse its output (it can produce XLSX, JSON, or SQLite) in your crawler.
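
If the immediate sticking point is just reading the history itself: Chrome's browsing history lives in a SQLite database, a file literally named History inside the profile folder, and Chrome locks it while the browser is running, which is the usual reason DB Browser refuses to open it. Here is a minimal sketch; the profile path is the default Windows location and is an assumption, so point it at wherever the forensic image's AppData folder is mounted.

```python
# Minimal sketch: read a copy of Chrome's "History" SQLite database.
# The profile path is an assumption (default Windows profile); adjust it
# to wherever the forensic image's AppData folder is mounted.
import shutil
import sqlite3
from datetime import datetime, timedelta
from pathlib import Path

profile = Path.home() / "AppData/Local/Google/Chrome/User Data/Default"
copy_path = "History_copy.sqlite"
shutil.copy(profile / "History", copy_path)  # Chrome locks the original while running

conn = sqlite3.connect(copy_path)
rows = conn.execute(
    "SELECT url, title, visit_count, last_visit_time "
    "FROM urls ORDER BY last_visit_time DESC LIMIT 20"
)
for url, title, visit_count, last_visit_time in rows:
    # last_visit_time is in WebKit format: microseconds since 1601-01-01 UTC.
    visited = datetime(1601, 1, 1) + timedelta(microseconds=last_visit_time)
    print(visited, visit_count, title, url)
conn.close()
```

The urls table alone gives URL, title, visit count, and last visit time; the visits table in the same file holds the individual visit timeline, which is the sort of thing Hindsight already parses for you.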

Offline Form Submission - Data Sync

We have a Loan Management System, and as everybody knows it involves field investigation such as residence, office, and business verification.
So we have a requirement to also support offline data entry.
Meaning, the field investigation officer may download the "template" to his mobile and save data offline. Later, when he is connected to the app, he can sync that data.
As of now, our web application has JSP pages to render the specific forms above.
1.) How do we programmatically download the template or HTML content?
2.) How do we save the form data in a local DB, say the browser's DB?
3.) How do we later sync that JSON data with the relational DB?
The best approach is to download the JSP-rendered content via an AJAX request, then process its HTML and, through HttpClient, fetch the response for each and every URL (JavaScript, CSS) included in the package.
Zip it all up and then make it downloadable through the browser.
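
To make that packaging step concrete, here is a rough sketch of the idea, written in Python rather than the Java/JSP stack in the question, with a hypothetical form URL and a deliberately naive regex for finding the referenced assets:

```python
# Illustrative sketch of the suggested flow: fetch the rendered form, pull in
# the CSS/JS files it references, and bundle everything into one zip that can
# be served to the browser. The form URL below is hypothetical.
import re
import zipfile
from urllib.parse import urljoin
from urllib.request import urlopen

FORM_URL = "https://example.com/app/fieldInvestigationForm.jsp"  # hypothetical

html = urlopen(FORM_URL).read().decode("utf-8")

# Naive asset discovery: grab href/src values that point at .css/.js files.
assets = re.findall(r'(?:href|src)="([^"]+\.(?:css|js))"', html)

with zipfile.ZipFile("offline_form_package.zip", "w") as package:
    package.writestr("form.html", html)
    for asset in assets:
        asset_url = urljoin(FORM_URL, asset)
        # Flatten to the bare file name; name collisions aren't handled here.
        package.writestr(asset.split("/")[-1], urlopen(asset_url).read())
```

The saved form data itself can live in the browser's local storage (IndexedDB or localStorage) until connectivity returns, at which point syncing is an ordinary POST of that JSON back to the server.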

How to save a local file programmatically given an amount of data and a file name with HTML5

I want to implement this use case found on the W3C File API document:
User agents should provide the ability to save a local file programmatically given an amount of data and a file name.
Example: A Spreadsheet App. User interacts with a form, and generates some input. The form then generates a CSV (Comma Separated Variables) output for the user to import into a spreadsheet, and uses "Save...". The generated output can also be directly integrated into a web-based spreadsheet, and uploaded asynchronously.
Source: http://www.w3.org/TR/file-upload/#requirements
From my understanding, it should be possible to create a completely offline spreadsheet app with this, but I could not find a single example, either in the W3C document or on the web, that has this kind of use case implemented. At least not a completely offline one. Of course it doesn't need to be a spreadsheet application; a simple text editor or TODO manager would suffice. Am I missing something?
Also, would this make it possible to create one of the previously mentioned applications (text editor, TODO manager, or even spreadsheet app) from a single HTML5 file (with embedded JS and CSS)?
It seems I didn't search hard enough; here's an example of what I want: http://html5-demos.appspot.com/static/a.download.html