How to retrieve information about shown ads - google-chrome

I would like to retrieve information about the ads shown on any website, like for example the one you're reading. I would like to retrieve information like:
Where does the ad come from (AdSense, DFP, etc.)?
What is the targeted (destination) URL?
What is the path of redirects until the destination (if any)?
Is there any software or extension for Chrome or Firefox out there? I'm basically looking for something that does the opposite of the Chrome extension "Adblock Plus".

You could iterate over all frame objects, but this is not very accurate: what if the ad is inline in the HTML, added server-side by the PHP? Then it never gets its own frame.
I think your answer is to use an observer service for the http-on-modify-request topic. Change http-on-examine-response to http-on-modify-request in this code here:
https://github.com/Noitidart/loadcontext-and-http-modify-observer
When you switch to http-on-modify-request you will see every step of a redirect; if you keep http-on-examine-response you will see just the final location of the redirect.
You can then watch every single thing that loads, and if it matches a list of ad URLs, you know it's an ad.

Related

What is the difference between these URL syntaxes?

I was sent a hyperlink to a Tableau Public link by a client. When I tried opening it, I got a 404 error. I wrote back to the client but was told the link was working fine. I visited his profile page and was able to open the presentation there, but the URL that ended up working was slightly different from the one behind the original, non-functioning link.
Here's the anonymized URL behind the original link
https://public.tableau.com/profile/[client_name]%23!/vizhome/Project-AirportDelay/FlightPerformancesinUSA?publish=yes
And here's the URL via the profile page:
https://public.tableau.com/profile/[client_name]#!/vizhome/Project-AirportDelay/FlightPerformancesinUSA
The only differences I see are ?publish=yes and %23!. I tried appending the former, ?publish=yes, to the working URL, and it remained functional. So I suspect the issue has to do with the other difference: %23! vs. #!. Could the first URL work for him because he opens it from his own computer, where he is likely logged in to Tableau Public? What's the difference between these syntaxes? Any idea why the original hyperlink might not be functional?
For obvious privacy reasons, I can't provide the whole URL.
It looks like ?publish=yes follows the basic URL pattern for passing parameters, and %23 is the URL-encoded representation of #.
The first # after the authority component starts the fragment component. If the # should be part of the path component or the query component, it has to be percent-encoded as %23.
As # is a reserved character, these URIs aren’t equivalent:
http://example.com/foo#bar
http://example.com/foo%23bar
There are countless ways a URI reference can become erroneous. The culprit is often software, such as a word processor, into which someone pastes the correct URI and which then incorrectly percent-encodes it (perhaps assuming that the user didn't paste the real/correct URI).
Copy-pasting the URI from the browser address bar into a plain text document should always work correctly.
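You can see the difference with Python's urllib.parse (used here just for illustration; any URL library splits the same way): a literal # starts the fragment, while an encoded %23 stays in the path.
from urllib.parse import urlparse, quote, unquote

print(quote('#'))      # '%23' -- '#' is reserved, so quote() escapes it
print(unquote('%23'))  # '#'

# With a literal '#', everything after it is the fragment
# (the fragment is never sent to the server):
print(urlparse('https://example.com/foo#bar').path)       # '/foo'
print(urlparse('https://example.com/foo#bar').fragment)   # 'bar'

# With '%23', the encoded '#' remains part of the path,
# so these two URIs identify different resources:
print(urlparse('https://example.com/foo%23bar').path)     # '/foo%23bar'
print(urlparse('https://example.com/foo%23bar').fragment) # ''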

What additional data to send in URL request?

Short version: How do I know how to phrase additional data (like specific options on the page that display different HTML but belong to the same URL) when fetching a URL with urllib?
Long version:
I am having trouble figuring out how to handle properties of a URL request that are determined not by the link URL itself but, presumably, by other information that the browser usually sends.
To be more precise:
This page contains a table that I want to read with Python, but the length of the table depends on the number of items per page you choose at the bottom left (i.e. the HTML I get from urllib.request.urlopen contains only the default 50 or so items, not the complete table). Clicking the button for e.g. 400 items per page doesn't change the URL, so I expect that this information is sent some other way. I understand that urllib can send additional data besides the URL, but it is unclear to me how I should phrase "give me the whole table" (or "give me 400 items per page") in that data.
Studying the .html file I get from saving the webpage in my browser didn't give me any hints, and I lack the vocabulary to search for answers on the web (googling "urllib request parameter" is too vague).
Hence I'd be completely satisfied if someone would point me to a duplicate of this question.
Thanks in advance :)
For everyone else finding this question, I'll elaborate on the answer @deceze gave in the comments:
1. Open the webpage you want to read in your browser.
2. Open your browser's developer tools (in Chromium this is [Ctrl+Shift+I] or right-click > Inspect).
3. Go to the "Network" tab (at least in Chromium).
4. Do whatever you want your program to do, and the empty network panel list will fill with a lot of data.
5. Find your request in the list of events (one of the very first ones should be the right one, I would guess), click it and select "Headers". The request URL, method (GET or POST) and parameters are all listed there; a sketch of how to replicate the request follows below.
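Once you've found the request, you can replicate it with urllib. A minimal sketch, assuming a single parameter controls the page size; the URL and parameter name below are placeholders, so copy the real ones from the "Headers" tab:
from urllib.parse import urlencode
from urllib.request import Request, urlopen

url = 'https://example.com/table'  # placeholder: use the request URL from the panel
params = {'itemsPerPage': 400}     # placeholder: the real parameter name may differ

# If the panel shows a GET request, the data goes into the query string:
html = urlopen(url + '?' + urlencode(params)).read().decode('utf-8')

# If it shows a POST request, send the data as the (bytes) request body instead:
request = Request(url, data=urlencode(params).encode('ascii'))
html = urlopen(request).read().decode('utf-8')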

Disable google search in Chrome for custom TLDs

We are working with a custom top-level domain per developer, which points to that developer's localhost, so internally we have URLs like this
companywebsite.com.john
companywebsite.com.mike
subdomain1.companywebsite.com.john
..
subdomain99.companywebsite.com.john
The problem is that every time we need to access a subdomain for the first time (and we have a lot of them), Chrome redirects to a Google search, and only after a few attempts asks if we actually meant
http://subdomain99.companywebsite.com.john/
So, my question is: how can I set up Chrome to always treat that kind of URL as a URL, not a search?
Thanks!
In the end this is what we ended up doing:
1. Go to **Settings**
2. in Search section click on **Manage search engines**
3. Scroll down to the bottom until you find an empty entry
4. Fill in these values: (noSearch, null, http://%s)
5. Scroll back up in the list, find it and click **Make default**
In my case I found an easy solution by chance: add a forward slash / at the end of your URL and Chrome will treat it as a URL. In your example: companywebsite.com.john/

HTML5 speech input and Google Translate text-to-speech, problem in Chrome

I'm creating a voice/text-memo web application.
Here: http://gustavstromberg.se/sandbox/html5/localstorage/ (look at its source; it's very short, most of it is CSS).
It uses:
Voice recognition, which works only in Chrome as far as I know.
Local Storage, to store notes as text.
Google Translate text-to-speech.
Everything works, but in different browsers. The voice input works perfectly, but only in Chrome. The text-to-speech works in Safari.
To dynamically load the memo into the audio > source element I use:
$("#spokenmemory").html("<source src='http://translate.google.com/translate_tts?tl=en&q="+localStorage['memory']+"' />");
(the localStorage['memory'] contains my stored text memo)
To play my recently saved memo with googles text-to-speech-function I use:
$("#listenplay").click(function(){
$("#spokenmemory")[0].play();
});
(spokenmemory is the id-attribute of my audio-tag)
This doesn't play in Chrome, but if I visit the translation link (example: http://translate.google.com/translate_tts?tl=en&q=Japan, when my text memo is "Japan") in a separate browser tab and then return to my site and reload the page (with the same text memo "Japan" saved), the playback works. How strange, and annoying!
Does anyone have any idea about this strange behaviour?
It is because Google restricts certain types of requests to prevent the service from being overloaded, so the audio file is not fetched when your browser tries to fetch it. Once you visit the translation link, the audio file is fetched and cached, which is why the playback then works (provided the text memo is the same). This has been my observation, but I'm not completely sure.
When I used cURL to fetch the file, this is what I got in response:
403. That's an error. Your client does not have permission to get URL /translate_tts?q=hello from this server.
I tried hard, Gustav, and this is what I found after a bit of research and testing.
It seems Chrome has trouble streaming MP3 (the format Google returns). The only solution I can imagine is fetching the file (cURL?) to your server and then presenting it to the user yourself. I assume that when Google releases the official API, there will also be some sort of format option.
http://code.google.com/p/chromium/issues/detail?id=45152
http://www.trygve-lie.com/blog/entry/html_5_audio_element_and (yes, the play button is the same color as the background, funky)
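If you do try the fetch-it-to-your-server route, a minimal sketch in Python might look like the following. Note the assumptions: sending a browser-like User-Agent is only a guess at working around the 403 above, and Google may still refuse or rate-limit automated requests.
from urllib.parse import quote
from urllib.request import Request, urlopen

text = 'Japan'
url = 'http://translate.google.com/translate_tts?tl=en&q=' + quote(text)

# Pretend to be a browser; the endpoint returned 403 to a bare cURL request
request = Request(url, headers={'User-Agent': 'Mozilla/5.0'})

# Save the MP3 locally so your own page can serve it to any browser
with open('memo.mp3', 'wb') as f:
    f.write(urlopen(request).read())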

REST/Ajax deep linking compatibility - Anchor tags vs query string

So I'm working on a web app, and I want to filter search results.
A nice restful implementation might look like this:
1. mysite.com/clothes/men/hats+scarfs
But let's say we want to ajax up the filtering, like the cool kids, and we want to retain deep linking; we might use the anchor tag and parse that with JavaScript to show the correct listings:
2. mysite.com/clothes#/men/hats+scarfs
However, if someone clicks the first link with JS enabled, and then changes filters, we might get:
3. mysite.com/clothes/men/hats+scarfs#/women/shoes
Urk.
Similarly, if someone does not have JS enabled, and clicks link 2 - JS will not parse the options and the correct listings will not be shown.
Are Ajax deep links and non-Ajax links incompatible? It would seem so, as servers cannot parse the # part of a URL: it is never sent to the server.
There's a monkeywrench being thrown into this issue by Google: A proposal for making Ajax crawlable. Google is including recommendations for url structure there that may give you ideas for your own application.
Here's the wrapup:
In summary, starting with a stateful URL such as http://example.com/dictionary.html#AJAX, it could be available to both crawlers and users as http://example.com/dictionary.html#!AJAX, which could be crawled as http://example.com/dictionary.html?_escaped_fragment_=AJAX, which in turn would be shown to users and accessed as http://example.com/dictionary.html#!AJAX
View Google's Presentation here (note: google docs presentation)
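To make the mapping concrete, here is a small sketch (Python, just for illustration) of the rewrite a crawler performs under that scheme:
from urllib.parse import quote

def escaped_fragment_url(url):
    # Map a #! URL to the ?_escaped_fragment_= form a crawler requests.
    # Assumes the base URL has no existing query string.
    base, sep, fragment = url.partition('#!')
    if not sep:
        return url  # no hash-bang fragment to rewrite
    return base + '?_escaped_fragment_=' + quote(fragment)

print(escaped_fragment_url('http://example.com/dictionary.html#!AJAX'))
# -> http://example.com/dictionary.html?_escaped_fragment_=AJAX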
In general I think it's useful to simply turn off JavaScript and CSS entirely and browse your website and web application and see what ends up getting exposed. Once you get a sense of what's visible, you will understand what most search engines see and that in turn will show you what is and is not getting spidered.
If you go to mysite.com/clothes/men/hats+scarfs with JavaScript enabled, then your JavaScript should automatically rewrite that to mysite.com/clothes#men/hats+scarfs. When you click on a filter, the clicks should be handled by JavaScript, meaning you'll only change the hash rather than the entire URL (you're going to have to return false anyway).
The problem you have is non-JS users arriving at your JS-enabled deep links, as the server can't see the hash at all. Unfortunately, the only thing you can do is take them to mysite.com/clothes and make them start their journey again (as far as I'm aware). You'll need to try to ensure that when people link to the site, they use the hardcoded deep link rather than the hashed deep link.
I don't recommend ever using the query string, as you are sending data back to the server without direct relevance to the previously specified destination. That is a corruptible security hole, as malicious code can be manually added to the query string to attempt an XSS or buffer overflow attack against your web server.
I believe REST was intended to work with absolute URIs without a query string, because then you're specifying only the location of a resource, and it is that location that is descriptive and semantically relevant, in addition to the resource itself possibly being equally relevant. Even if there is no resource at the specified path, you have still instantiated a potentially unique and descriptive location that can be processed accordingly.
Users entering the site via deep links
Nonsensical links (like /clothes/men/hats#women/shoes) can be avoided if you construct your Ajax initialisation code in such a way that users who enter the site on filtered pages (e.g. /clothes/women/shoes) are taken to the /clothes page before any Ajax filtering happens. For example, you might do something like this (using jQuery):
$("a.filter")
.each(function() {
var href = $(this).attr("href").replace("/clothes/", "/clothes#");
$(this).attr("href", href);
})
.click(function() {
update_filter($(this).attr("href").split("#")[1]);
});
Users without JavaScript
As you said in the question, there's no way for the server to know about the URL fragment so filtering would not be applied for users without JavaScript enabled if they were given a link to /clothes#filter.
However, even without filtering, these links could be made more meaningful for non-JS users by using the filter strings as IDs in your /clothes page. To prevent this from messing with the Ajax experience, the IDs would need to be changed (or the elements removed) with JavaScript before the Ajax links were initialised.
How practical this is depends on how many categories you have and what your /clothes page contains.