Offline Form Submission - Data Sync - JSON

We have a Loan Management System which, as is common, involves field investigations such as residence, office, and business verification.
We have a requirement to support offline data entry as well.
That is, the field investigation officer may download the "template" to his mobile device and save data offline. Later, when he is connected to the app, he can sync that data.
Currently, our web application uses JSP pages to render these specific forms.
1.) How do we programmatically download the template or HTML content?
2.) How do we save the form data in a local database, say a browser database?
3.) How do we later sync that JSON data with the relational database?

The best approach is to download the JSP-rendered content via an AJAX request, parse its HTML content, and use HttpClient to fetch the response for each URL (JavaScript, CSS) referenced in the page. Zip everything and make the archive downloadable through the browser, as in the sketch below.
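A minimal sketch of that packaging step, using Java 11's built-in java.net.http client in place of Apache HttpClient. The form URL is hypothetical, and a production version would use a real HTML parser such as jsoup rather than a regex:

```java
import java.io.FileOutputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class TemplatePackager {

    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    public static void main(String[] args) throws Exception {
        // Hypothetical URL of the JSP-rendered verification form.
        String formUrl = "https://example.com/loans/residenceVerification.jsp";
        String html = fetch(formUrl);

        try (ZipOutputStream zip = new ZipOutputStream(new FileOutputStream("template.zip"))) {
            addEntry(zip, "form.html", html.getBytes());

            // Naive extraction of script/css URLs; a real implementation
            // would use an HTML parser instead of a regex.
            Matcher m = Pattern.compile("(?:src|href)=\"([^\"]+\\.(?:js|css))\"").matcher(html);
            while (m.find()) {
                String assetUrl = URI.create(formUrl).resolve(m.group(1)).toString();
                String name = assetUrl.substring(assetUrl.lastIndexOf('/') + 1);
                addEntry(zip, name, fetch(assetUrl).getBytes());
            }
        }
    }

    private static String fetch(String url) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return CLIENT.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }

    private static void addEntry(ZipOutputStream zip, String name, byte[] data) throws Exception {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(data);
        zip.closeEntry();
    }
}
```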

Related

Scraping AJAX generated table to download PDFs in bulk

I'm trying to download (or alternatively open and save) approximately 30,000 PDF documents. The documents are only accessible through a 3rd-party service provider's website/platform (there are no ethical dilemmas here).
The website is secure and needs to be logged into (I have access), and the table is generated via AJAX. The report I intend to read from has a URL of the form https://sub.website.com/au/report/index?id=1001# that doesn't change when dates or other filters change. In total there are 180,000+ table entries; not all have an associated invoice, and not all invoices are required.
Using Chrome DevTools I can see the elements; the table name is #reportResults, and the invoice details are in an HTML element.
There also appears to be an API, but I don't know where to start there either.
How do I scrape data from this using VBA? I have downloaded the JSON.bas module recommended in other solutions for scraping JSON and AJAX, but for this situation I don't know how to use it or where to go from here.
I'm handy with VBA but have no experience with any other languages.
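One way to approach it: instead of scraping the rendered table, watch the Network tab in DevTools while the #reportResults table loads, and call the underlying AJAX endpoint directly. A sketch of that idea, written in Java for consistency with the rest of this page (the same flow translates to VBA with MSXML2.XMLHTTP and JSON.bas). The endpoint URL, the session cookie, and the invoiceUrl field name are purely hypothetical; the real ones have to be read from DevTools:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InvoiceDownloader {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Hypothetical endpoint and session cookie; find the real ones in
        // the DevTools Network tab while the table loads.
        String listUrl = "https://sub.website.com/au/report/data?id=1001";
        String cookie = "JSESSIONID=...";  // copy from an authenticated browser session

        HttpRequest listReq = HttpRequest.newBuilder(URI.create(listUrl))
                .header("Cookie", cookie)
                .build();
        String json = client.send(listReq, HttpResponse.BodyHandlers.ofString()).body();

        // Hypothetical field name; a crude regex stands in for a JSON parser.
        Matcher m = Pattern.compile("\"invoiceUrl\"\\s*:\\s*\"([^\"]+)\"").matcher(json);
        int n = 0;
        while (m.find()) {
            HttpRequest pdfReq = HttpRequest.newBuilder(URI.create(m.group(1)))
                    .header("Cookie", cookie)
                    .build();
            // Stream each PDF straight to disk.
            client.send(pdfReq, HttpResponse.BodyHandlers.ofFile(Path.of("invoice-" + (n++) + ".pdf")));
        }
    }
}
```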

Python Web Crawler with stored Web History

I'm creating a Python web crawler with the ability to browse web history, parse the information, and store the important parts in a database for forensic/academic purposes. I understand how to crawl websites, but the part I'm struggling with is crawling through web history. Here's a scenario:
During a forensic investigation, you have been given a full forensic image of a suspect's computer. You locate the AppData folder for Google Chrome, which stores information about the suspect including form data, credentials, and web history.
How would I set up the web crawler to only search through data in the suspect's web history?
I am also having issues accessing the information stored within Google Chrome's User Data folder. As a start I am trying to view my own personal information stored there; I am currently using DB Browser to open the files and view my own web history, but I'm not having much luck. Any suggestions?
For those interested in this project, I can update this thread as I go so you can see the progress of my web crawler. The end result will have the ability to take web history and data from public and private websites and sort important information (i.e. name, address, D.O.B.) into a database to be used later as a biographic dictionary.
I WILL STRESS THIS AGAIN: THIS IS ALL FOR ACADEMIC PURPOSES, IN A CONTROLLED ENVIRONMENT, AND USED ON A TEST/FAKE ACCOUNT.
Hindsight (https://github.com/obsidianforensics/hindsight) is an open-source tool written in Python that can parse a ton of information from the files in the /Google/Chrome/User Data/ directory.
You could look at its source for inspiration, or just run the tool and parse its output (it can produce XLSX, JSON, or SQLite) in your crawler, as in the sketch below.
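If you go the run-and-parse route with the SQLite output, a schema-agnostic first step is to list the tables before querying them. A sketch using the org.xerial sqlite-jdbc driver, in Java to match the rest of this page (the same query works from Python's built-in sqlite3 module); the output file name is an assumption, so check Hindsight's --help for the exact flags that produce it:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HindsightReader {

    public static void main(String[] args) throws Exception {
        // Path to the SQLite file produced by a prior Hindsight run with
        // SQLite output selected (file name assumed).
        String dbPath = "hindsight-output.sqlite";

        // Requires the org.xerial sqlite-jdbc driver on the classpath.
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:" + dbPath);
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT name FROM sqlite_master WHERE type = 'table'")) {
            // List the tables rather than assuming a schema; inspect them
            // first, then query the ones holding URLs and timestamps.
            while (rs.next()) {
                System.out.println(rs.getString("name"));
            }
        }
    }
}
```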

Where to store application data for webapp?

I have some data for a webapp that I would like to store on the server. What would be a good location to put those files?
I have a couple of static HTML pages that contain instance specific information. They need to survive a re-deploy of the webapp. They need to be editable by the server's administrator. They are included in other HTML pages using the html object tag.
I want to store preferences on the server, but cannot use a database. I am using JSP to write and read the preferences. There is no sensitive data in the preferences. Currently I am using the log directory, but obviously that is not a great choice.
I am using Tomcat. I thought of creating an appdata/myapp directory under the webapp directory. Is that good or bad?
If the server's administrator can also deploy the app, I would add the data file itself into the source control for the app, and deploy it all together. This way you get revision control of the data, and you get the ability to revert to known good data if the server fails.
If the administrator can't deploy the app, but can only edit the file, then you need a plan to back up that file in case the server or its filesystem dies.
A third solution would be a hybrid: put the app in one source code repository. Put the data in a second source code repository. The administrator can edit the data and deploy the data. The developer can edit the app source code, and deploy the source code. This way, both are revision controlled, but you've separated responsibility for who maintains what.
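Whichever split you choose, keep the live copy of the data outside the webapp directory itself, since a redeploy typically replaces everything under it. A minimal sketch, assuming a hypothetical myapp.data.dir system property (set, e.g., in Tomcat's setenv script) with a fallback under CATALINA_BASE:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class PreferencesStore {

    // Hypothetical property, e.g. -Dmyapp.data.dir=/var/lib/myapp in setenv.sh;
    // falls back to a directory under CATALINA_BASE so it survives redeploys.
    private static Path dataDir() {
        String dir = System.getProperty("myapp.data.dir");
        if (dir == null) {
            dir = System.getProperty("catalina.base") + "/appdata/myapp";
        }
        return Paths.get(dir);
    }

    public static Properties load() throws IOException {
        Properties prefs = new Properties();
        Path file = dataDir().resolve("preferences.properties");
        if (Files.exists(file)) {
            try (Reader in = Files.newBufferedReader(file)) {
                prefs.load(in);
            }
        }
        return prefs;
    }

    public static void save(Properties prefs) throws IOException {
        Files.createDirectories(dataDir());
        try (Writer out = Files.newBufferedWriter(dataDir().resolve("preferences.properties"))) {
            prefs.store(out, "myapp preferences");
        }
    }
}
```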

Google Drive Live API: Server Export of Collaboration Document

I have a requirement to build an application with the following features:
Statistical and Source data is presented on simple HTML pages
Some missing Source data can be added from that HTML page (data will be both exact numerical values and descriptive text)
Some new Source data can be added from those pages
Confirmed and verified data will NOT be editable via the HTML interface
Data is stored and made continuously available via the HTML interface
Periodically the data added/changed from the interface needs to be pulled back into the source data - but in a VERY controlled way. All data changes and submissions will need verification and checking - and some will trigger re-runs of models ( some of which take hours to run ).
In terms of overview architecture I have:
Large DB that stores and manages the data - this is designed for import processes and analysis. It is not ideal for web presentation or interfaces
Code servers that manipulate the data for imports and analysis
Frontend server that works as a proxy to add layer of security to S3
Collection of generated html files on S3 presenting the data required
Before reading about the Google Drive Realtime API my rough plan was to simply serialize data from the HTML interface and post to S3. The import server scripts would then check for new information, grab it, check it, log it and process it into the main data set.
That basic process, however, would mean that once changes were submitted from the web page, they would be lost from the user's view until they had been processed by the backend.
With the Google Drive Realtime API it would appear I could get the best of both worlds.
However for the above to work I would need to be able to access the Collaboration Document in code from the code servers and export the data.
The Realtime API gives JavaScript access to export and hand off to a function - however, in my use case I want to automate the export from the Collaboration Document.
The Google Drive SDK does not, as far as I can see, give any hints on downloading/exporting a file of type "Collaboration File".
What "non-browser-user" triggered methods are there for interfacing with the Collaboration Documents and exporting them?
David
Server-side export is not supported right now. What you can do is save the realtime model to a regular Drive file, and read from that using the standard Drive API. See https://developers.google.com/drive/realtime/models-files for some discussion of different ways to set up interactions between realtime models and Drive files.
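A sketch of that server-side read, assuming the client app has already mirrored the realtime model into an ordinary Drive file. The file ID is a placeholder, and the construction of the Drive client (service-account credentials, HTTP transport, JSON factory) is omitted for brevity:

```java
import com.google.api.services.drive.Drive;
import java.io.FileOutputStream;
import java.io.OutputStream;

public class RealtimeExportJob {

    // "drive" is a com.google.api.services.drive.Drive instance built with
    // service-account credentials; construction omitted for brevity.
    public static void export(Drive drive, String fileId) throws Exception {
        // Download the mirrored file's content so the import scripts on the
        // code servers can check, log, and process it.
        try (OutputStream out = new FileOutputStream("model-export.json")) {
            drive.files().get(fileId).executeMediaAndDownloadTo(out);
        }
    }
}
```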

HTML5 webstorage

Is it possible to give the end user an offline HTML5 form, like an Excel file, using a storage mechanism, so that he can fill in the form and upload it to a J2EE-based site?
With Excel this is possible: the user downloads the Excel file template, fills it in, and uploads it to the website. Later, in a servlet, we read the contents and store them in the DB.
Is the same thing possible with HTML5?
Yes, it is. You have to build the application as an offline application. While the user is entering data, you serialize it to web storage. As soon as you have internet connectivity, you send the serialized data to the server to be parsed, as in the sketch below.
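On the client side that is a localStorage write followed by a POST once online. On the J2EE side, here is a sketch of the receiving servlet; the /sync path, the JDBC URL, and the staging table are hypothetical, and a real implementation would validate the JSON and map its fields onto the relational schema:

```java
import java.io.BufferedReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.stream.Collectors;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical sync endpoint: the offline page POSTs the JSON it kept in
// web storage to /sync once connectivity returns.
@WebServlet("/sync")
public class SyncServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) {
        try (BufferedReader reader = req.getReader()) {
            String json = reader.lines().collect(Collectors.joining());

            // Staging table is an assumption; parking the raw payload lets the
            // backend verify and process it into the main schema later.
            try (Connection conn = DriverManager.getConnection("jdbc:yourdb://...");
                 PreparedStatement ps = conn.prepareStatement(
                         "INSERT INTO form_submissions_staging (payload) VALUES (?)")) {
                ps.setString(1, json);
                ps.executeUpdate();
            }
            resp.setStatus(HttpServletResponse.SC_NO_CONTENT);
        } catch (Exception e) {
            resp.setStatus(HttpServletResponse.SC_INTERNAL_SERVER_ERROR);
        }
    }
}
```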