How to integrate Chrome browser built-in tools (specifically "Find") when developing an extension? - google-chrome

I am trying to develop a Chrome extension, part of which needs a global keyword-find feature just like the built-in "Find" (Ctrl+F) that comes with the browser. (EDIT: It needs to invoke "Find" multiple times, concurrently, on the same tab.)
My first thought was to find an API that exposes Chrome's "find" functionality. However, after going through the API list, I don't see what I am looking for. Also, the keywords for my question ("Chrome extension", "Chrome API", "find", "search") are so generic that I cannot find similar examples or information about such an API even after extensive googling.
To provide a consistent user experience I would love to offer a similar, if not identical, "Find" tool in my extension, and to avoid reinventing the wheel it would be best if I could somehow invoke the built-in one. Existing extensions mostly ship their own JavaScript implementations, with limitations (cannot search inside iframes, no global highlighting, etc.), so that will be my last resort.
Does anyone know of such an API (one that invokes the browser's built-in "Find" tool), or an example similar to my question? If not, please let me know the best way to implement it in JavaScript, as I am new to lexical analysis and parsing.
Many Thanks!!
-Gavin
P.S: This is my first post here, if I haven't given enough information on my question (or you don't think this is a question at all), feel free to let me know!
EDIT2: I am trying to build an extension that improves on "Find" to solve this scenario:
In a text-heavy page, I want to locate a region that mentions keywordA and keywordB, where the two keywords are not immediately adjacent and both appear many times in the document. In that case I can search neither for "keywordA keywordB" (they are not next to each other) nor for either keyword individually (too many occurrences).
For example, in an HTML-based math textbook, you might want to locate the chapter that mentions "linear algebra" and "matrix" together the most times.

The built-in search does not support multiple concurrent invocations on the same tab. So even if it becomes accessible via an API some day, it is unlikely to support concurrency: concurrent searches are not natural for the general use case of a single interactive user, and they involve the UI. One improvement I can imagine for the existing built-in search is support for a query language that allows searching for alternatives (e.g. car | auto for OR-ing), which would partially address the "multiple" part of your requirement.
Your option is to use a content script for searching text, for example (with jQuery):
// Selects every element whose text contains the string; note that this also matches all ancestor elements, and :contains is case-sensitive.
var search_i = $('*:contains("text to find")');
This way you can perform and combine as many searches as you want, but you will need to design a proper (understandable) UI that presents the results of each search without interference from the others.
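For the two-keyword scenario in the question, a plain-DOM content script can also do its own highlighting and simply run once per keyword. Here is a minimal sketch of that idea; the function and class names are illustrative, and for brevity it highlights only the first occurrence of the keyword in each text node:

function highlightAll(keyword, className) {
  // Collect text nodes first: splitting nodes while walking would disturb the walker.
  var walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
  var textNodes = [];
  while (walker.nextNode()) textNodes.push(walker.currentNode);
  textNodes.forEach(function (node) {
    var index = node.nodeValue.toLowerCase().indexOf(keyword.toLowerCase());
    if (index === -1) return;
    // Split the text node around the match and wrap the match in a <mark>.
    var match = node.splitText(index);
    match.splitText(keyword.length);
    var mark = document.createElement('mark');
    mark.className = className; // style each search differently via CSS
    mark.textContent = match.nodeValue;
    match.parentNode.replaceChild(mark, match);
  });
}
// Two concurrent "find" passes on the same tab:
highlightAll('linear algebra', 'find-a');
highlightAll('matrix', 'find-b');

Like any script-based approach, this cannot reach into cross-origin iframes, so it shares that limitation with the existing extensions mentioned in the question.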

Are Google's search results influenced by our data?

I have always wondered that.
For example, if I search for the term "composer" or "what is composer", it shows the PHP package manager. Why does it show programmer-related results? Obviously it makes sense that it does, since those results are much more relevant to me.
What if an aspiring composer googles that? What results will they get?
Another example: if I enter the word "spring" into the search engine, it shows the Spring framework instead of, say, the season.
So, my question(s):
Does Google actually use the data it collects to show relevant search results? (I am not talking about ads, but search results.)
If yes, why doesn't incognito mode work?
How can I avoid Google using other parameters, besides the very term I typed in, to affect the search results?
Yes. This is the very core of Google's business model. The same data that influences search results is also applied to ad placement (see their real-time bidding system); when you do searches, it's likely you will see ads about the same subjects fairly soon afterwards.
Incognito mode is a very limited form of anonymisation; it's really not very anonymous at all. If you visit a page that contains some Google-controlled element (e.g. Google Analytics, a CDN-hosted JS library, or a web font) and then shortly afterwards perform a Google search, there will be many points in common that allow Google to match you as very likely the same person (e.g. your IP address, the time of day, recent similar requests, your user-agent string, window size, available fonts), even if it blocks the cookies that would identify you explicitly. This form of fingerprinting is quite hard to avoid, though Safari is a lot better at resisting it than Chrome. Tor provides much more robust anonymisation by normalising many fingerprintable elements, as well as hiding your IP.
That's difficult, because making use of all this information does indeed lead to generally more relevant search results, so it's in Google's interest to use whatever it can (within technical and, mostly, legal limits). Tor will disconnect the search results from you, but it may instead give you results linked to whoever else was recently using the same Tor exit node, which might not be pleasant! The same applies to VPN services.

REST and "can delete/update" style actions

I am writing a REST API. However, one of the requirements is to allow the caller to determine whether an action may be performed (so that, for example, a button can be enabled or disabled).
The action might not be allowed for several reasons: user permissions, perhaps, but possibly also because, for example, you can't delete a shared object, or you can't create an item with the same name as another item, or any of an array of other business rules.
All the logic to determine if something can be deleted should be determined in the back end, but the front end must show this in the GUI.
I am trying to find the right REST pattern for this and am coming up a bit short. I could create a parallel API, so that for every entity endpoint there was an EntityPermissions endpoint, but that seems like overkill. I could also add an HTTP header indicating that the request should only check permissions, not perform the action, but that seems a bit dubious and likely to mess up HTTP caching.
Can anyone point me to a common pattern for doing something like this? Does it have a name? Or a web page that discusses it? I'm sure everyone has their own ideas (like my dumb ones), but this seems a common enough requirement that I figure there must be a well-known pattern for it. Google didn't help much.
There are going to be multiple opinionated answers to this. I'll share mine. It might not be the best fit for your problem, but it is a valid solution.
If you followed the real definition of REST, you would be building a hypermedia/HATEOAS-style web service. URLs would not be hardcoded; they, and the actions they support, would be discovered through the presence of links.
If an action may not be performed, you simply omit its link. When a client fetches a resource, it sees exactly the actions that are available, right there in the representation.
A popular format for hypermedia APIs is HAL. You can decorate the links further with more information from HTTP link hints.
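As a concrete illustration (the resource and the "update" relation name are made up for this example, not mandated by HAL), a response might advertise only the permitted actions:

{
  "id": 42,
  "name": "shared-report",
  "_links": {
    "self":   { "href": "/items/42" },
    "update": { "href": "/items/42" }
  }
}

There is no "delete" link because this object is shared, so the client simply doesn't render a Delete button; no parallel permissions endpoint is needed.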
If this is the first time you've heard of hypermedia APIs, there might be a bit of a learning curve, but the results of learning this can be very beneficial.

Scribd API search showing irrelevant answers

When I use the search functionality of the Scribd docs API to search for a phrase, like
http://api.scribd.com/api?method=docs.search&api_key=API_KEY&query=hello+world
It returns irrelevant results that differ from those of the site's own search. This request, for example, returns results about Guitar Hero, World of Warcraft, Virtual Worlds, etc., whereas the site search on https://www.scribd.com/search-documents?query=hello+world gives documents titled "Hello World", as you would expect. Is there a parameter I can add to the API call to make it return relevant results?
You may try playing with the simple parameter to see if it makes any difference to your queries. According to the API reference (half of it is inaccessible at the moment) it makes the results the same as for the website:
(optional) This option specifies whether or not to allow advanced search queries (more information). When set to false, the API search behaves the same as the search on Scribd.com. When set to true, the API search allows advanced queries that contain filters such as title:"A Tale of Two Cities". Set to "true" by default.
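Assuming the parameter is passed like any other query-string parameter (the quoted docs don't show an example call), the request from the question would become:
http://api.scribd.com/api?method=docs.search&api_key=API_KEY&query=hello+world&simple=false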
I tried your query myself with simple set to false; it changes things a bit, but the results are still not adequate. Even running their own sample queries verbatim gives 90% irrelevant results.
I then found a similar issue discussed in a Google Groups thread back in 2011, at the end of which Jared Friedman (the CTO of Scribd) himself admits that the API search and the website search work differently, and that fixing this is not among their priorities. In 2014 another developer complained. It seems that, four years later, this is still the case.
I'd suggest contacting Scribd support directly and asking what the current status of the docs.search API is, and whether there is some preliminary approval process in place (for example, they may do a background check on accounts and only then serve relevant results, returning just test results for any other query), although I doubt it.

Google searches with permanent filters

I'm wondering if there's a way to do Google searches with filters that stay in effect permanently, like a filter profile. So, for instance, every search you perform could exclude results from, say, Yahoo Answers, without your having to type -yahoo -answers every time.
A feature like this would be invaluable because it's very common to perform a search and want to filter out a lot of popular sites that would normally top the rankings. For example, suppose you're searching for a news topic and don't want to read mainstream media articles. You could add the words reuters, cnn, huffington post, daily mail, and so on to your filter profile and never see those sites turn up in any of your searches ever again.
I'm asking because I'm interested in making an extension that would do precisely this, but there's no point if such a feature already exists.
You can create a Custom Search in minutes. It's called Google CSE (Custom Search Engine)
This is a sample public link that I've created based on your example above: https://www.google.com/cse/publicurl?cx=006201654654568968489:1kv4asuwfvs
In the settings, I can choose to exclude sites by URL, by URL pattern, or even by picking URLs from my search results.
If you need more ways, here's a good and relevant link.
Search filters can be specified as part of the URL (e.g. append site:example.com/section1 to a Google query to get only results whose locations start with that prefix, or -site:example.com to exclude a site entirely). So you can make a search plugin that substitutes your query into such a URL template and install it in your browser.
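As a small JavaScript sketch of that substitution step (the domain list and function name are illustrative):

// A permanent "filter profile": every query gets these -site: exclusions appended.
var excluded = ['answers.yahoo.com', 'dailymail.co.uk'];
function buildQuery(q) {
  var filters = excluded.map(function (d) { return '-site:' + d; }).join(' ');
  return 'https://www.google.com/search?q=' + encodeURIComponent(q + ' ' + filters);
}
// buildQuery('spring framework') =>
// https://www.google.com/search?q=spring%20framework%20-site%3Aanswers.yahoo.com%20-site%3Adailymail.co.uk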
Search plugins are generally XML files following a standardized schema; OpenSearch is one such standard, and it is supported by Chrome.
There are sites that host collections of user-submitted plugins, as well as tools to generate your own. One example I use is the Mycroft project (originally created for Apple's Sherlock software, which pioneered the concept, and later accepted into the Mozilla project when Firefox took on the feature).

How can I get started on programmatically analyzing web site content?

I've been looking for a new hobby programming project, and I think it would be interesting to dabble in ways to programmatically gather information from websites and then analyze that data to aggregate or filter it. For example, I might write an application that takes Craigslist listings and displays only the ones matching a specific city, not just a geographical area. That's a simple example, but you could go as advanced and sophisticated as the way Google analyzes a site's content in order to rank it.
I know next to nothing about that subject and I think it would be fun to learn more about it, or hopefully do a very modest programming project in that topic. My problem is, I know so little that I don't even know how to find more information about the subject.
What are these types of programs called? What are some useful keywords to use when searching on Google? Where can I get some introductory reading material? Are there interesting papers I should read?
All I need is someone to disabuse me of my ignorance, so that I can do some research on my own.
cURL (http://en.wikipedia.org/wiki/CURL) is a good tool to fetch a website's contents and hand them off to a processor.
If you are proficient with a particular language, see if it supports cURL. If not, PHP (php.net) may be a good place to start.
Once you have retrieved a website's content via cURL, you can use the language's text-processing functionality to parse the data. You can use regular expressions (http://www.regular-expressions.info/) or functions such as PHP's strstr() to find and extract the particular data you seek.
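As a minimal sketch of that fetch-then-parse flow (shown here in JavaScript on Node 18+, whose built-in fetch plays the role cURL does above; the URL is a placeholder):

(async function () {
  var res = await fetch('https://example.com/');
  var html = await res.text();
  // Pull out every link target with a regular expression.
  var links = [...html.matchAll(/href="([^"]+)"/g)].map(function (m) { return m[1]; });
  console.log(links);
})();

Regular expressions are fine for quick experiments like this; for anything serious you would want a real HTML parser, since HTML is not a regular language.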
Programs that "scan" other sites are usually called web crawlers or spiders.
I recently completed a project using the Google Search Appliance that basically crawls a web server's entire .com domain. The GSA is a very powerful tool: it indexes pretty much every URL it encounters and serves the results.
http://code.google.com/apis/searchappliance/documentation/60/xml_reference.html