Track external site with socket.io? - json

Is there any way to track an external site with socket.io? So whenever the website changes then it would automatically send an event to my server?
For example:
socket.on("newconnectionfromxxxxxxxxxxxxxxx", function (websiteChange) { //websiteChange would be the contents of the updated website.
});
Sorry if this question is obvious; I couldn't find anything online.
EDIT: The site that I want to be tracking is a JSON file.

socket.io works between two cooperating sites. So, it would only help you here if the site you want to monitor specifically supports an incoming socket.io connection AND it supports sending a notification message over that connection whenever the specific file you are interested in changes.
If the site you want to monitor does not have such specific monitoring features, then the best you could do is to regularly download the file of interest and see for yourself if it has changed.
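A minimal sketch of that "download regularly and compare" approach, assuming Node 18+ (for the built-in fetch). The URL, the interval and the function names are placeholders made up for illustration, and the first download is treated as a baseline rather than a change:

```javascript
// Sketch: poll a remote JSON file and report when its content changes.
// TARGET_URL and POLL_INTERVAL_MS are hypothetical placeholders.
const TARGET_URL = "https://example.com/data.json";
const POLL_INTERVAL_MS = 60 * 1000;

// Returns a function that remembers the last body it saw and
// reports whether a newly downloaded body differs from it.
function createChangeDetector() {
  let lastBody = null;
  return function hasChanged(body) {
    const changed = lastBody !== null && body !== lastBody;
    lastBody = body;
    return changed;
  };
}

// Downloads the file on a timer and calls onChange with the parsed JSON
// whenever the raw text differs from the previous download.
function startPolling(url, intervalMs, onChange) {
  const hasChanged = createChangeDetector();
  return setInterval(async () => {
    try {
      const response = await fetch(url);
      const body = await response.text();
      if (hasChanged(body)) {
        onChange(JSON.parse(body));
      }
    } catch (err) {
      console.error("Poll failed:", err.message);
    }
  }, intervalMs);
}

// Usage (not started automatically). `io` is assumed to be your own
// socket.io server instance:
// startPolling(TARGET_URL, POLL_INTERVAL_MS, (data) => io.emit("websiteChange", data));
```

Once your own server has detected the change, it can forward it to your own clients over socket.io, which is the part socket.io is actually suited for.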

Since not all websites are fully event-driven/reactive at this time, you'll have to poll your target website periodically and check that JSON file for differences. There are many "filewatcher" programs (C#, Java, etc.) out there that hook into the native OS filesystem event mechanisms; you could alter one of these programs to trigger a WS or HTTP event to your server. There is a NodeJS wrapper that might be helpful for you: https://github.com/paulmillr/chokidar
It's a bummer that the Unix "watch" utility doesn't have an option to trigger events when a diff is detected instead of visually highlighting the change. It would be really cool if it had a "-difftrigger 'some-command'" option...

Related

http push django comet

I want a Django server to refresh the page content as new data reaches the database. The idea is that the user first sees the current contents of the database and, as new content arrives, it is placed above the previous content without reloading the page. In another part of the site, the current content should be replaced with the new content as it gets to the database.
EvServer is clearly my choice, but I really do not know how to do it, or what would be the simplest and most efficient approach.
I think you should avoid HTTP Polling. Here's why:
The frequency of the setInterval calls combined with the number of users on your web app is going to lead to a big resource drain. If you go through slides 9 to 19 in this presentation you'll see some quite dramatic figures for using Push (Note: this example uses a hosted service, but hosting your own realtime server and using Push has similar benefits).
Between setInterval calls the data displayed in your app is potentially out of date. Using a Push technology means the instant that new data is available it can be pushed and displayed in your app. You don't want users looking at an app and thinking they are seeing correct information when they are not.
You should take a look at the following StackOverflow questions:
Django / Comet (Push): Least of all evils?
Need help understanding Comet in Python (with Django)
For Python/Comet see:
Python Comet Server
The latest recommendation for Comet in Python?
I'd recommend you also start considering "WebSockets" as well as "Comet". Most Comet servers now prefer to use a WebSocket connection when possible.
If you'd prefer to avoid installing and managing your own Comet/WebSocket solution, then you could use a hosted realtime service, which will allow you to push data through it using a REST API. Your clients can receive events by embedding a JavaScript library and writing a small amount of code to subscribe and receive the event.
The steps are quite straightforward:
Write a model to store data in DB
Write a view that will generate JSON-serialized data upon POST request.
Write a template that will contain JavaScript with setInterval() that will make AJAX requests to the view and render the received data. (I'd suggest using jQuery as it's well documented and widespread.)
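The client side of those three steps could be sketched roughly as follows, using plain fetch rather than jQuery. The URL, the interval and the assumption that every item in the JSON has a unique `id` are all made up for illustration; the view is assumed to return a JSON array:

```javascript
// Hypothetical endpoint and interval; adjust to your URLconf and needs.
const UPDATES_URL = "/updates/";
const POLL_MS = 5000;

// Pure helper: returns the list with not-yet-shown items placed on top.
// Each item is assumed to carry a unique `id`.
function mergeNewItems(shown, incoming) {
  const seen = new Set(shown.map((item) => item.id));
  const fresh = incoming.filter((item) => !seen.has(item.id));
  return fresh.concat(shown);
}

// Polls the view and hands the merged list to a render callback.
// Note: Django's CSRF protection normally expects an X-CSRFToken header
// on POST requests; it is omitted here to keep the sketch short.
function startUpdates(render) {
  let shown = [];
  return setInterval(async () => {
    const response = await fetch(UPDATES_URL, { method: "POST" });
    shown = mergeNewItems(shown, await response.json());
    render(shown);
  }, POLL_MS);
}

// Usage: startUpdates((items) => { /* rebuild the list in the DOM */ });
```

Keeping the merge logic in its own function means the same code could be reused later if the polling is swapped for a push connection.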

HTML5 - Web sql setting up offline storage

How do I set up the basic switching of offline storage modes (offline/online) in Web SQL? I know there's JavaScript's window.navigator.onLine. I can check the mode and then go through a process...
//All GET/POST performed with AJAX
//On Startup pulldown entire accessible database into offline storage (Doesn't seem secure IMO)
//if(read) pull from offline
//if(create, update, delete and online) pull from standard db, mark changes with offline expiration flag
//if(create, update, delete and offline) perform operation on offline storage, persist with POST when next online (change flag)
I'm asking if there is any OOB integration for these standard tasks?
The navigator.onLine property generally isn't very useful - in a desktop browser all it does is hook into the File -> Work Offline menu. It may be more useful on an iPad; I don't know because I don't have one, and I'm guessing there's no File menu, but I would recommend you test.
A common approach to this issue is to set up two easily distinguishable files in the fallback section of your manifest. Every time you want to connect back to the server attempt to fetch the file with AJAX and, in the callback, check it to see if you got the online file or the fallback, then branch accordingly.
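A rough sketch of that probe, assuming the manifest's FALLBACK section maps a file that contains "online" on the server to a cached copy that contains "offline". The file name and both marker strings are made up for illustration:

```javascript
// Hypothetical names: /online.txt holds "online" on the server, and the
// manifest's FALLBACK section points it at a cached file holding "offline".
const PROBE_URL = "/online.txt";
const ONLINE_MARKER = "online";

// True only when the response body is the server's copy of the file.
function isOnlineResponse(text) {
  return text.trim() === ONLINE_MARKER;
}

// Fetches the probe file and reports true/false through the callback.
function checkConnection(onResult) {
  const xhr = new XMLHttpRequest();
  // Timestamp query string so an old HTTP-cached response is not reused.
  xhr.open("GET", PROBE_URL + "?t=" + Date.now());
  xhr.onload = () => onResult(isOnlineResponse(xhr.responseText));
  xhr.onerror = () => onResult(false);
  xhr.send();
}

// Usage: checkConnection((online) => { /* branch to server or local storage */ });
```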
You shouldn't be using Web SQL, as that spec was nixed a few months ago. You should be using localStorage, unless you are specifically coding for something like the iPhone, but even then you don't know how long the spec will be supported in WebKit.

Getting same information firebug can get?

This all goes back to some of my original questions of trying to "index" a webpage. I was originally trying to do it specifically in java but now I'm opening it up to any language.
Previously I tried using HtmlUnit and other methods in Java to get the information I needed, but wasn't successful.
The information I need from the webpage is very easy to find with Firebug, and I was wondering if there was any way to duplicate what Firebug is doing, specifically for my needs. When I open Firebug I go to the Net tab, then to the XHR tab, and it shows a constantly updating list of the requests the server is answering. When I click on a request and look at the response, it has the information I need, and this is all without ever refreshing the webpage, which is what I am trying to do (not to mention that the variables it outputs do not show up in the HTML of the webpage).
So can anyone point me in the right direction of how they would go about this?
(I will be putting this information into a MySQL database, which is why I added it as a tag; I still don't know what language would be best to use, though.)
Edit: These requests on the server are somewhat random and although it shows the url that they come from when I try to visit the url in firefox it comes up trying to open something called application/jos
Jon, I am fairly certain that you are confusing several technologies here, and the simple answer is that it doesn't work like that. Firebug works specifically because it runs as part of the browser, and (as far as I am aware) runs under a more permissive set of instructions than a JavaScript script embedded in a page.
JavaScript is, for the record, different from Java.
If you are trying to log AJAX calls, your best bet is for the serverside application to log the invoking IP, useragent, cookies, and complete URI to your database on receipt. It will be far better than any clientside solution.
On a note more related to your question, it is not good practice to assume that everyone has read other questions you have posted. Generally speaking, "we" have not. "We" is in quotes because, well, you know. :) It also wouldn't hurt for you to go back and accept a few answers to questions you've asked.
So, the problem is?:
With someone else's web-page, hosted on someone else's server, you want to extract select information?
Using cURL, Python, Java, etc. is too painful because the data is continually updating via AJAX (requires a JS interpreter)?
Plain jQuery or iFrame intercepts will not work because of XSS security.
Ditto, a bookmarklet -- which has the added disadvantage of needing to be manually triggered every time.
If that's all correct, then there are 3 other approaches:
Develop a browser plugin... More difficult, but has the power to do everything in one package.
Develop a userscript. This is much easier to do and technologies such as Greasemonkey deal with the XSS problem.
Use a browser macro technology such as Chickenfoot. These all have plusses and minuses -- which I won't get into.
Using Greasemonkey:
Depending on the site, this can be quite easy.   The big drawback, if you want to record data, is that you need your own web-server and web-application. But this server can be locally hosted on an XAMPP stack, or whatever web-application technology you're comfortable with.
Sample code that intercepts a page's AJAX data is at: Using Greasemonkey and jQuery to intercept JSON/AJAX data from a page, and process it.
Note that if the target page does NOT use jQuery, the library in use (if any) usually has similar intercept capabilities. Or, listening for DOMSubtreeModified always works, too.
If you're using a library such as jQuery, you may have an option such as the jQuery ajaxSend and ajaxComplete callbacks. These could post requests to your server to log these events (being careful not to end up in an infinite loop).
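As a sketch of that last suggestion, assuming the page already loads jQuery: a global ajaxComplete handler forwards each finished request to a logging endpoint, and skips its own logging requests to avoid the infinite loop. `LOG_URL` and the field names are hypothetical:

```javascript
// LOG_URL is a hypothetical endpoint on your own server.
const LOG_URL = "/log-ajax";

// Skip our own logging requests; otherwise each log POST would fire
// another ajaxComplete event and loop forever.
function shouldLog(requestUrl, logUrl) {
  return requestUrl.indexOf(logUrl) !== 0;
}

// Registers a global handler that reports every completed AJAX call.
function installAjaxLogger($, doc, logUrl) {
  $(doc).ajaxComplete(function (event, xhr, settings) {
    if (!shouldLog(settings.url, logUrl)) return;
    $.post(logUrl, {
      url: settings.url,
      status: xhr.status,
      body: xhr.responseText,
    });
  });
}

// In the page: installAjaxLogger(jQuery, document, LOG_URL);
```

jQuery and the document are passed in as parameters only so the function can be exercised outside a browser.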

Browser, upload large file

I'm looking for a way to allow a user to upload a large file (~1gb) to my unix server using a web page and browser.
There are a lot of examples that illustrate how to do this with a traditional post request, however this doesn't seem like a good idea when the file is this large.
I'm looking for recommendations on the best approach.
Bonus points if the method includes a way of providing progress information to the user.
For now security is not a major concern, as most users who will be using the service can be trusted. We can also assume that the connection between client and host will not be interrupted (or if it is they have to start over).
We can also assume the user is running a browser supporting most modern features (JavaScript, Flash, etc.)
edit
No language requirements. Just looking for the best solution.
There are several ways to handle this,
1. Flash Uploader
There are plenty of Flash uploaders that improve the user's GUI so that they can follow the upload and see details such as time left, KB done, etc.
This is very good if you understand how to modify the Flash source code for later development.
2. Ajax
There are a few ways using Ajax and PHP. PHP does not support upload progress out of the box, but you can use a PECL extension to accomplish it: http://pecl.php.net/package/uploadprogress. This is only needed if you wish to show percentage information etc.
3. Basic JavaScript
This method would be just the regular form, but with some Ajax styling, so that when the form is submitted you can show a basic loader saying "please wait while you send us the file..."
If you're using ASP, you can take a look at: http://neatupload.codeplex.com/
Hope there's some good information here to get you on your way.
Regards
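For the progress part, one option the answer above does not name is the `progress` event on `XMLHttpRequest.upload`, available in current browsers. A minimal sketch, where `UPLOAD_URL` and the form field name are placeholders and the server is assumed to accept a multipart POST:

```javascript
// UPLOAD_URL is a hypothetical endpoint that accepts a multipart POST.
const UPLOAD_URL = "/upload";

// Whole-number percentage of bytes sent so far.
function percentDone(loaded, total) {
  return total > 0 ? Math.floor((loaded / total) * 100) : 0;
}

// Sends a File (e.g. from an <input type="file">) and reports progress.
function uploadFile(file, onProgress, onDone) {
  const form = new FormData();
  form.append("file", file);
  const xhr = new XMLHttpRequest();
  xhr.open("POST", UPLOAD_URL);
  // Progress events on xhr.upload describe the request body being sent.
  xhr.upload.onprogress = (e) => {
    if (e.lengthComputable) onProgress(percentDone(e.loaded, e.total));
  };
  xhr.onload = () => onDone(xhr.status);
  xhr.send(form);
}

// Usage: uploadFile(input.files[0], (pct) => { bar.value = pct; }, (status) => {});
```

The server still has to be configured to accept a request body of that size.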
Not sure about your language requirements, but you can look e.g. into
http://pypi.python.org/pypi/gp.fileupload/
Supports progress information also, btw.
I have used the dojo FileUploader widget to reliably upload audio files greater than a gigabyte with a progress bar. Though you said security was not an issue, I'd like to say that I got HTTPS uploads w/cookie based authentication hooked up flawlessly.
See: http://www.sitepen.com/blog/2008/09/02/the-dojo-toolkit-multi-file-uploader/ and
http://api.dojotoolkit.org/jsdoc/1.3/dojox.form.FileUploader

Interfacing with the end-user's scanner from a webapp (web/scanner integration)

Consider the following scanning procedure in a typical document handling webapp:
The user scans a document using a scanner connected to his/her computer
The scanned image is saved locally on the user's computer as a BMP/JPG/TIF/PNG file
The user hits a file upload "Browse.." button in the web application
The user is presented with a file dialog which he/she uses to locate the scanned image
The user hits "Upload image" and the scanned image is uploaded to the server where it is stored
This process is quite complicated and I'd like to reduce the number of steps in order to make it more user-friendly/foolproof. Under ideal circumstances the above steps would be replaced with a single step in which the whole procedure (initiate document scanning, complete document scanning, and upload the resulting image) is automatically triggered from the webapp when clicking, say, "Scan and upload". Unfortunately it seems like the state of "web/scanner integration" is quite poor, so this might be utopia.
How would you tackle this problem? More specifically, how would you go about reducing the number of steps involved in the use case described?
Well, two years have passed, so here's an update on the state of the art for those just joining us.
Both Dynamsoft and Atalasoft have multi-browser web-scanning toolkits which are compatible with any server-side stack. Both require the user to install an ActiveX control (in IE) or an NPAPI plugin (Chrome, Firefox, etc.) to get access to the scanner via the TWAIN API.
Obviously if you have the time or a limited budget, you can create your own plugin. I heartily recommend the FireBreath plugin framework, and any TWAIN library rather than writing your own TWAIN code.
Once the ActiveX or plugin is installed, the rest of the work is a combination of javascript & HTML on the client, and some kind of handler on the server to accept and process the incoming image, which can be made to look just like a multipart form submit with an attached file.
I recommend doing the image upload in JavaScript using AJAX, because it is then part of the same browser 'session' as the web page, and it inherits the browser's proxy settings, session cookies and server-side authentication. I don't know about Dynamsoft's control, but the Atalasoft toolkit includes such AJAX uploading. The image(s) are handed from the plugin to the JavaScript as a base64-encoded string, so no local file is actually created.
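To illustrate the general idea (this is not the code of either toolkit), here is a sketch that turns a base64 string into a file part of a multipart POST. The field name, file name, MIME type and URL are all assumptions:

```javascript
// Decode a base64 string into raw bytes.
function base64ToBytes(b64) {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Uploads the image so that the server sees an ordinary multipart form
// submit with an attached file. "scan", "scan.png" and the PNG type
// are hypothetical choices.
function uploadScan(b64Image, url) {
  const blob = new Blob([base64ToBytes(b64Image)], { type: "image/png" });
  const form = new FormData();
  form.append("scan", blob, "scan.png");
  // A same-origin request carries the page's session cookies with it.
  return fetch(url, { method: "POST", body: form, credentials: "same-origin" });
}

// Usage: uploadScan(base64FromPlugin, "/scans").then((res) => console.log(res.status));
```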
Disclaimer: I work on Atalasoft's WingScan web-scanning toolkit.
If your target audience is running Windows and IE, and you don't mind spending a few $$, Atalasoft has some components that will do just what you're looking for.
I actually saw someone at the bank do this while setting up my account and I was totally amazed. The bank in question was using Windows and IE; I assume you're in an equally controlled environment. I think the bank used a combination of a custom/predictable scanner driver and an ActiveX control.
A page loaded which said "Open the scanner". The staff member popped the document in and hit Scan on the webpage, then the page changed to say Scanning, then it showed the scanned document on the web page for the staff member to approve. I can only assume that the scanner driver sent the image to a certain location and the ActiveX control was polling for it to appear; once it appeared, it showed the image on screen, and once the staff member had approved it, the ActiveX control uploaded it in the background. She opened the next page and carried on with the rest of the process.
God knows how they made all that tech work but it can be done.
Silverlight 4 is coming out soon. It is supposed to have the ability to interact with COM objects on the user's computer (provided they are running Windows). In theory you call WIA methods from your Silverlight web page.
We built a solution to implement Remote Deposit for a bank. It works only in IE. A WinForms DLL was created that interfaces with the LeadTools TWAIN DLL, which abstracts all the TWAIN minutiae. This approach is slightly better than using an ActiveX control. The .NET Framework is needed on the client. The scanned images are posted back in a hidden field on the page and are processed on the server.
Hmm, I've always wanted to look at a scanned file before I did anything with it, but I suppose that depends on your scanner and how much quality you need.
If the goal is to "automate the scanning and uploading process" as opposed to "write a web app", I'd write an AutoIt script to control the existing scanner software and a simple ftp program.
The option most likely to remove the most steps, would probably be writing a customized scan utility that the user would download and run on their local machine.
SANE or TWAIN would handle getting the scanned image. cURL could then handle uploading the image to your web app. To make things even easier for the end user, I would use something like a Comet connection to update the web page when the file was available.
If that isn't an option, you might look into what options your users are likely to have in their scanner's software. I believe many programs now support scanning to email or FTP.
The solution I have used for an intranet app, using multifunction scanner/copiers was to scan to an SMB share that the web server had access to. The user just goes to the copier scans to the share and when they get back to their desk, they go to the new scans page which shows a list of all the new unprocessed files.
Since your audience is in a controlled environment, you can write your own browser extension or program based on WIA/TWAIN that does the scanning. If you choose a browser extension such as a BHO, ActiveX or XPCOM component, you need to get the user's permission to install it. If you choose to write a program, you may need a web deployment technology like ClickOnce or Java Web Start so that it can be launched from the web.
Interfacing TWAIN is a pain on Windows. Complexity aside, you have to display some GUI written by different scanner driver developers. It may be the only way to support old scanners or features not exposed via other interfaces like full-speed multipage scans from a document feeder.
Microsoft's WIA makes interfacing with a scanner much easier through a scripting object model; however, scanner-specific features are not available and some old scanners do not support the interface.
After scanning you can call a web service to notify the server and the web page can refresh periodically to check new images.
We have done something similar. We used a command-line TWAIN program (http://www.burrotech.com/quickscan.php), which costs $49.
1) We developed a small .Net application to run the QuickScan program as a shell command.
2) The command was assigned to the Scan button.
3) Once the user presses on the scan button, a prompt will appear to enter the file name. The user saves the transaction Id as the file name.
4) Another .Net application (or maybe the same mentioned before) will read this file and upload it into database considering that the filename is the transaction ID.
Worked like a warm knife in butter!
You can try displaying the transaction ID in IE; the user selects the ID and then presses Scan. Your application will read the SELECTED text and save the file using the SELECTED text as the file name. We haven't tried it but it should work.
It is only utopia if you think that web applications are limited to web browsers. In fact, web applications can include a lot of different technologies besides HTML and JavaScript.
The cool way of solving that problem -- in fact, I already used that for some usbserial devices -- is to implement your application using SOAP+XMPP. You can do that in Perl by using XML::CompileX::Transport::SOAPXMPP, Catalyst::Engine::XMPP2, Catalyst::Controller::SOAP and Catalyst::Model::SOAP.
The interesting thing about using XMPP is that it simplifies the management of addressing, since you use the JID (Jabber ID) to look for the software agent, not some host+port addressing schema. The second interesting part of using XMPP is to more easily support the server pushing information to the client.
But if you don't want to handle XMPP you still can do the same thing with a lightweight embedded http server -- HTTP::Server::Simple, in Perl -- and somehow register the current scanner address in the server so it can call back.
And a last option, which is not so cute, is to have the software agent poll the server to see when there is a "scan document and upload" order for that specific machine, and carry out that operation when one is present.
In summary, having a local software agent to interact with the local hardware doesn't make your webapp less "web", as long as you use web standards -- like XML, SOAP and others -- to perform that communication.
You can put a Java applet in your website. This can access the scanner and send the data via REST to your web server.