How do I redirect Crowdflower users to my website? - crowdflower

I have a specific problem: I would like to create a Crowdflower job, in which the participant will be redirected to my website (let's say http://www.xxx.yy), where he will complete the task and after he finishes, he'll be redirected back to Crowdflower and paid. Is it possible to do something like that?
I imagined they would have an API where the user would get a token that would then be sent to my website, and after the job is completed I could simply make an API call to mark the task as finished. However, I can't find anything in their documentation that would do such a thing (http://success.crowdflower.com/customer/portal/articles/1288323).
The reason I need to redirect users to my website is that I need more freedom than CML (Crowdflower Markup Language which is used for creating tasks) offers:
I need to be able to embed an swf file
the swf file should be chosen randomly from an array of files (approx. 10)
I need to be able to measure how long the participant spends on the website and act based on that time
I need to store some data in a database
All these things can be done pretty easily using some Javascript and PHP, but I don't think they can be done in CML. That's why I need to redirect participants to my website.
Can you please give me some advice on how to do this?

Related

Monitoring a chatroom with Chrome extensions

I want to monitor a chatroom with a self-written Chrome extension. Because I don't know anything about the scripts behind the chatroom system itself, I thought about a simple timer and export script.
My idea is a periodic timer (let's say every second, because it has to react as fast as possible) calling a function which reads the complete HTML of the current tab (with chrome.pageCapture.saveAsMHTML) and sends the whole HTML to an external REST service (via XMLHttpRequest()).
I know that this approach is very resource-intensive, but that doesn't matter as all this will run on a dedicated computer. Of course I thought about using chrome.webRequest.onCompleted to trigger the export but, as already mentioned, I have no idea about the technical internals of the chatroom.
Unfortunately I can't find any API to create a timer based on seconds, only on minutes (chrome.alarms.create). Or is there a more elegant way to do this job?
Any hints appreciated.
A more elegant way would be to use a MutationObserver, at least as a source of a "there are some changes" event. But maybe the chat is implemented in such a way that getting the changes (and then sending only the changes, not the whole page) will be convenient too.

Counting views of any element on website

I am using the following MySQL query to count views:
UPDATE content SET views=views+1 WHERE id='$id'
For example, if I want to check how many times a single page has been viewed, I just put it at the top of the page code. Unfortunately I always get a count about 5-10x bigger than the results in Google Analytics.
If I am correct, one refresh should increase the value in my database by 1. Don't "Views" in Google Analytics work the same way?
Say Google Analytics tells me that a single page has been viewed 100 times and my database says it was viewed 450 times. How could such a simple query generate an additional 350 views? And I don't mean visits or unique visits, just regular views.
Is it possible that Google Analytics interprets such data in a slightly different way and my database result is correct?
There are quite a few reasons why this could be occurring. The most usual culprit is bots and spiders. As soon as you use a third-party API like Google Analytics, or Facebook's API, you'll get their bots making hits to your page.
You need to examine each request in more detail. The user agent is a good place to start, although I do recommend researching this area further - discriminating between human and bot traffic is quite a deep subject.
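As a starting point for the user-agent check suggested above, here is a minimal sketch in Python. The pattern list and the function names (is_probable_bot, count_human_views) are assumptions made for illustration, not a complete or authoritative crawler list, and a user agent can be spoofed, so treat the result as a rough filter only.

```python
import re

# Illustrative substrings only (an assumption); real crawler lists are far longer.
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp|facebookexternalhit", re.IGNORECASE)


def is_probable_bot(user_agent):
    """Return True when the User-Agent header looks like an automated client."""
    if not user_agent:
        # Scripts and simple crawlers often send no User-Agent at all.
        return True
    return bool(BOT_PATTERN.search(user_agent))


def count_human_views(user_agents):
    """Count only the requests whose User-Agent does not look like a bot."""
    return sum(1 for ua in user_agents if not is_probable_bot(ua))
```

The same check could run before the UPDATE query so that only requests that look human increment the counter.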
In Google Analytics the data is provided by the user, for example:
A user views a page on your domain. Their browser is now in charge of communicating the pageview to Google, and if something fails along the way, the data will not be included in the reports.
In the other case, the SQL system that you have is log-based analytics: the data is collected by your own system, which reduces data collection failures.
Seen that way, some data can be missed by Google Analytics for users on slow connections, for clients that don't execute JavaScript (ad blockers or bots), or when the HTML page is not properly printed***.
That said, 5x is a huge discrepancy. In my experience it should be closer to 8-25% (tested at the transaction level; for pageviews it may be more).
What I recommend is:
Save the device and browser information, the IP, and any other metadata that might be useful, and don't forget the timestamp. That way you can isolate the problem: maybe it is robots or users with ad blockers, or in the worst case your code is not properly implemented (located in the footer, for example).
*** I added this because I once had a huge discrepancy that turned out to be a server error: the HTML code was not properly printed, so the user was shown an empty HTTP response. MySQL was not fast enough to save the information and process the HTML code. I noticed it when a stress test (via Screaming Frog) showed a lot of 500 errors (a WordPress blog with no cache).

Syncing File Name for Drive Realtime Document

My real-time document allows the user to edit the file name within the editor (much like Google's own apps). I represent this as a collaborative string so all collaborators see the file renames as soon as possible.
I'm trying to determine the best and most efficient way to keep this collaborative string in sync with the actual file name. There are two scenarios to consider:
In Editor Changes
If a user edits the document name within the editor, we need to use the Drive API to push that change out to the file on Google Drive. To avoid race conditions, it is best if only one of the collaborators pushes the change out. The easiest way to do this seems to be to check whether the rename event was local.
I also found it best to add a delay so we are not pushing the rename out to the Drive API with every character change. If a few seconds pass with no more name changes, the change is pushed out at that point. This all seems to work well.
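The delay described above is a debounce. Here is a minimal sketch of the logic in Python, assuming a hypothetical RenameDebouncer class and a 3-second quiet period (both made up for illustration). The caller is assumed to invoke poll() from a timer and to perform the actual Drive API rename itself.

```python
class RenameDebouncer:
    """Hold back a rename until the name has been stable for quiet_seconds."""

    def __init__(self, quiet_seconds=3.0):
        self.quiet_seconds = quiet_seconds
        self.pending_name = None   # latest name not yet pushed
        self.last_change = None    # time of the latest local edit

    def on_local_change(self, name, now):
        """Record a local edit; every new edit restarts the quiet period."""
        self.pending_name = name
        self.last_change = now

    def poll(self, now):
        """Return the name to push, or None when nothing is due yet."""
        if self.pending_name is None:
            return None
        if now - self.last_change < self.quiet_seconds:
            return None
        name, self.pending_name = self.pending_name, None
        return name
```

Passing the current time in explicitly keeps the logic easy to test; in a real client the timestamps would come from the clock.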
External Changes
The harder case, and the one I am interested in getting advice on, is when the file name is changed externally, for example if the user renames the file within the Drive interface itself. We want this change to update our collaborative string to match.
My application is entirely client-side so I can't use webhook push notifications. So my only solution is to poll the file name every X seconds (currently set to 10). But this presents the following problems:
It is API intensive. If you have 4 collaborators that keep the screen open for 8 hours, that is 11520 API calls. If my app has lots of users with lots of documents, I could see how this might push me past my API limits.
To avoid race conditions (and reduce API calls) we only want one collaborator to check for changes and update the collaborative string if the file name has changed. But how do we pick one when collaborators might join or exit at any time? Currently, whenever the set of collaborators changes, each collaborator checks whether they are the "leader". The "leader" is the collaborator whose session id is the highest. This seems to work, but it all seems fairly hacky. Also, if collaborators join close together, I wonder if a race condition could cause multiple collaborators to think they are the leader.
Is there an easier way? A real-time API function I am missing?
It would be ideal if the real-time API just provided a method that stored the document name. Anytime the real-time API checks for mutations it could grab the latest document name.
I think you've identified the options. There isn't any built in functionality currently to sync it via the Realtime API specifically.
Personally I'd probably back off the poll time a lot. It's probably not critical that the title is always exactly up to date, so asking every few minutes is probably sufficient and would greatly reduce your qps.
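To put rough numbers on that, here is a small worked calculation in Python. The function name polling_calls is made up for illustration, and the 5-minute interval is an assumed stand-in for "every few minutes".

```python
def polling_calls(collaborators, hours, interval_seconds):
    """Number of API calls when every polling client polls at a fixed interval."""
    polls_per_client = hours * 3600 // interval_seconds
    return collaborators * polls_per_client


# The figure from the question: 4 clients, 8 hours, one poll every 10 seconds.
print(polling_calls(4, 8, 10))    # 11520
# Only the leader polls, still every 10 seconds.
print(polling_calls(1, 8, 10))    # 2880
# Only the leader polls, every 5 minutes (assumed interval).
print(polling_calls(1, 8, 300))   # 96
```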
In terms of identifying a "leader", I can't think of anything better than something deterministic based on the session id. So long as each rechecks on each session join/leave event, I don't think there should be any issues.
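A minimal sketch of that deterministic rule in Python, assuming the session ids are plain comparable strings and that every client sees the same collaborator list once the join and leave events settle. The names elect_leader and is_leader are hypothetical.

```python
def elect_leader(session_ids):
    """Pick the leader deterministically: the highest session id wins."""
    return max(session_ids) if session_ids else None


def is_leader(my_session_id, session_ids):
    """Each client re-runs this on every collaborator join/leave event."""
    return my_session_id == elect_leader(session_ids)
```

Because every client applies the same rule to the same list, they agree on the leader without exchanging messages. Two clients could both believe they are the leader only briefly, while their views of the collaborator list differ.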

Continuously Updating Data Algorithm

There is information on a website that is neatly listed, and every so often (not on a set schedule) it is updated with new information. I want to write a quick script that will automatically let me know when something new has been added, instead of letting me know every x hours/minutes.
My initial thought was that I would have the script pull all the information I was looking for and store it in a list and on the next scan create a new list and get rid of all duplicates which would leave me with what the new information is.
If you know of an effective way to go about such things and could lead me in the right direction, it would be appreciated.
I'm not really looking for source code, just how to go about it and I'm sure I can put it together after some guidance on an efficient way to do it.
You can poll the website; there is no real way to register a change listener.
If you are ready to poll, then document.lastModified (JS) can help.
Of course, if the website has RSS feeds, then you can listen for changes there.
There are also some free services that will send you an email notification if you register with them (e.g. http://www.followthatpage.com/).
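If you do end up polling, the list comparison described in the question can be sketched like this in Python. The function name new_items is made up for illustration, and how the items are scraped from the page is left out.

```python
def new_items(previous, current):
    """Return the entries of current that were not in the previous scan, in order."""
    seen = set(previous)
    return [item for item in current if item not in seen]
```

After each scan you would report new_items(previous, current) and then keep current as the previous list for the next round.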

News Aggregator of sorts

There is a website that my company uses that updates information about 3 specific things throughout the day. We use the information from 1 of them, and what we want to do is pull this information as it is added to their site and add it to a page of our own so it is easier to view. Is this even possible? Can anyone point me in the direction of setting this up? It is all text that we want to pull.
Pick a language (e.g. Perl). Find an HTTP library for it (e.g. LWP). Fetch the page and run it through an HTML parser (e.g. HTML::TreeBuilder). Pull out the bits you want and shove them into a template (e.g. TT), then dump to a file. Stick the program in cron or Windows Scheduler.
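The same pipeline can be sketched with only the Python standard library instead of the Perl modules named above. This is a sketch under assumptions: the wanted text is assumed to sit in elements marked with a known CSS class, and the class, function, and parameter names are all made up for illustration.

```python
from html import escape
from html.parser import HTMLParser
from string import Template
from urllib.request import urlopen

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

PAGE = Template("<html><body><ul>\n$rows\n</ul></body></html>\n")


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element carrying a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0    # > 0 while inside a matching element
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif self.wanted_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.items.append("")

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.items[-1] += data


def extract_items(html, wanted_class):
    """Pull out the bits you want: the normalized text of matching elements."""
    parser = ClassTextExtractor(wanted_class)
    parser.feed(html)
    parser.close()
    return [" ".join(text.split()) for text in parser.items if text.strip()]


def render_page(items):
    """Shove the items into a template."""
    rows = "\n".join("<li>%s</li>" % escape(item) for item in items)
    return PAGE.substitute(rows=rows)


def update_page(url, wanted_class, out_path):
    """Fetch, parse, render, and dump to a file; schedule this with cron."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(render_page(extract_items(html, wanted_class)))
```

A scheduled job would call update_page() with the real URL, class name, and output path. A dedicated HTML parsing library would cope with messy real-world markup better than this hand-rolled extractor.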