In the Google Drive search API, how to group words into a phrase? - google-drive-api

I'm using the Google Drive search API via Files.list to search for files.
I have a query like: fullText contains 'battle of hastings'.
The results I'm getting suggest that it searches for the individual words rather than for the phrase as a whole. I'm not completely sure, though, and I'm comparing the API's behaviour to what can be done on the Google Search website, so please correct me if that comparison doesn't hold.
Anyway, I really only want results for the whole phrase, i.e. like surrounding a phrase with double quotes on Google's search website. For example, if you use Google's search website to search for "no one will have written this before" with the quotes, it says 'No results found for "no one will have written this before".', but without the quotes you get all sorts of results.
To summarise:
Does the query API search for individual words and only return files containing all of those words, even if they don't appear together as a phrase, or in that order?
Is there a way to make it consider the words as a single phrase?

Using the Try it section of Files.list and consulting the search-parameters documentation:
fullText - Full text of the file including title, description, and content.
contains - The content of one string is present in the other.
I tested using this
fullText contains 'daimto twitter'
It returned all of the files that contain that exact match.

Using the Try it facility, I found that the behaviour matches the search UI in Google Drive: you need to surround a phrase that should be treated as a unit with double quotes. The quotes should be URL-encoded, like this:
https://www.googleapis.com/drive/v2/files?maxResults=100&
q=fullText+contains+'%22flippity+floppity%22'
I'm not sure if the spaces need to be encoded like that, but I tried to emulate it as much as possible.
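As a sketch of the encoding step, here is how one might build that q parameter in Python (the phrase is illustrative; authentication and the actual HTTP call are omitted):

```python
from urllib.parse import urlencode

def drive_phrase_query(phrase: str) -> str:
    """Build a Drive 'q' parameter that searches fullText for an exact phrase.

    The phrase is wrapped in double quotes inside the single-quoted
    fullText term, then URL-encoded (space -> +, " -> %22, ' -> %27).
    """
    q = f"fullText contains '\"{phrase}\"'"
    return urlencode({"q": q})

# Produces the same shape as the URL in the answer above:
print(drive_phrase_query("flippity floppity"))
# q=fullText+contains+%27%22flippity+floppity%22%27
```

The encoded string would then be appended to the files endpoint URL along with the other parameters (maxResults, etc.).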

Related

How do I extract an HTTP link from a text-based column in MySQL using SQL

NOTE: I changed SQLite to MySQL.
I need to extract HTML links from a MySQL database text field.
The link is not in a fixed position; it can be anywhere in the text field.
I'd like the link to be put in a separate column as part of the output.
Example text.
I’m listening to "some nice music". You can listen to it here: https://example.com/?l=sdfsafyNjE
I’m listening to https://example.com/?l=Njksdfa1 with the app.
Thanks so much in advance.
You can use REGEXP_SUBSTR(column_name, 'https?://[^[:space:]]+') AS url. (A lookahead such as (?=\s) would miss a URL at the very end of the text, as in the first example, so matching a run of non-whitespace characters is safer.)
Read more about using regular expressions in the documentation:
https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-substr
You can also search for "url regex" to find an expression suited to your needs; URLs may require more elaborate patterns than this example.
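For a quick sanity check outside the database, here is the same idea in Python's re module (an illustrative sketch, using a non-whitespace run so a URL at the end of the text is also matched):

```python
import re

# Pattern analogous to the one passed to REGEXP_SUBSTR: a scheme followed by
# any run of non-whitespace characters, so it also matches at end of text.
URL_RE = re.compile(r"https?://\S+")

samples = [
    'I\'m listening to "some nice music". You can listen to it here: https://example.com/?l=sdfsafyNjE',
    "I'm listening to https://example.com/?l=Njksdfa1 with the app.",
]

for text in samples:
    match = URL_RE.search(text)
    print(match.group(0) if match else None)
# https://example.com/?l=sdfsafyNjE
# https://example.com/?l=Njksdfa1
```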

MediaWiki API: search for pages in specific namespace, containing a substring (NOT a prefix)

I want to scrape pages from a list of Wikipedia categories for which there isn't a 'mother category'. In this case, dishes: I want to get a list of all the categories like Category:Vegetable Dishes and Category:Italian Dishes, then scrape and tag the pages in them. I know how to list the pages in a known category, but there are hundreds of categories containing the substring dishes, and it feels like it should be easy to list them.
However, the MediaWiki allcategories module seems to only allow search by prefix (i.e. the from and to parameters), and while old opensearch documentation mentions search by substring, this is no longer supported (see the updated API docs; it also doesn't work when I try it).
This is easy to do in Wikipedia's web search, to the point where I think it might be quicker to just scrape the search results pages, but I wonder if I'm missing something?
Thanks to @Tgr for pointing out that I'd missed the regular search API, which allows a full-text search restricted to a specified namespace, among other things.
The correct query for my instance is:
curl "https://en.wikipedia.org/w/api.php?action=query&list=search&srnamespace=14&srsearch=Dishes&format=json"
thanks!
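For anyone scripting this, the curl request above can be mirrored in Python (a sketch; actually fetching the results and paging with sroffset are left out):

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def category_search_url(substring: str) -> str:
    """Build the same full-text search request as the curl command above.

    srnamespace=14 restricts results to Category: pages, so any category
    whose name contains the substring is returned.
    """
    params = {
        "action": "query",
        "list": "search",
        "srnamespace": 14,   # the Category namespace
        "srsearch": substring,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"

print(category_search_url("Dishes"))
# https://en.wikipedia.org/w/api.php?action=query&list=search&srnamespace=14&srsearch=Dishes&format=json
```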

DNN database search and replace tool

I have a DNN (9.3.x) website with CKEditor, 2sxc etc installed.
Now old URLs need to be changed into new URLs because the domain name changed. Does anyone know a tool for searching & replacing URLs in a database of DNN?
I tried the "DNN Search and Replace Tool" by Evotiva, but it only goes through native DNN database tables, leaving the tables of 2sxc and other plugins/modules untouched.
Besides that, 2sxc stores data in JSON format in its database tables, and that data also contains old URLs.
I'm pretty sure that the Evotiva tool can be configured to search and replace in ANY table in the DNN database.
"Easy configuration of the search targets (table/column pairs). Just point and click to add/remove items. The 'Available Targets' can be sorted, filtered, and by default all 'textual' columns of 250 characters or more are included as possible targets."
It's still a text search.
As a comment, you should be using relative URLs and letting DNN handle the domain-name part.
I believe the Engage F3 module will search Text/HTML modules for replacement strings, but it's open-source, so you could potentially extend it to inspect additional tables.

Extract HTML Tables from Multiple webpages Using R

Hi, I have done thorough research and have got this far. All I am trying to do is extract an HTML table that spans many webpages.
I have to query sec.gov's database, and the query returns the appropriate number of results (the size and number of pages vary with every query). For example:
Link: http://www.sec.gov/cgi-bin/srch-edgar
Inputs to be given:
Enter a Search string box: form-type=(8-k) AND filing-date=20140523
Start: 2014
End: 2014
How can I do this entirely in R, without even opening the browser?
Here is what I have done so far.
I tried many packages, and the closest I came was with the RCurl package. But for the getURL function I opened the browser, ran the query there, and pasted the resulting URL into getURL. It returned a very long character string containing the URLs that could be looped over to produce the output I want. All this information is in the "center" tag of the output.
Now I do not know how to get those URLs out from the middle of that string.
Also, this is not what I wanted: I wanted to run the web query directly from R and get the various HTML table outputs directly into R. Is this possible at all?
Thanks
Meena
Yes, it is possible. You will want to use a combination of the RCurl and XML packages. You will need to programmatically generate the query parameters in the URL (based on the HTML form) and then use getURL() or getURLContent(). Sometimes, the server will expect an HTTP POST, so there is postForm().
To parse the result, look up the XPath language, which the XML package supports via getNodeSet(). The XML package also provides readHTMLTable() for parsing an HTML table into a data.frame.
You might want to invest in this book.
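The R packages named above are the natural fit here, but the table-extraction step itself is easy to illustrate. As a stand-in sketch in Python (standard library only; the sample table is made up), pulling rows out of an HTML table looks like this:

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the cell text of every <tr> in an HTML fragment."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], None, False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

parser = TableRows()
parser.feed("<table><tr><th>Form</th><th>Date</th></tr>"
            "<tr><td>8-K</td><td>20140523</td></tr></table>")
print(parser.rows)
# [['Form', 'Date'], ['8-K', '20140523']]
```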

How to search a word in a html file without any java coding?

I'm doing a project in Java which creates a user manual for a piece of software (HTML files that are linked together, like the Windows "Help and Support Centre"). Once the user manual is created, I have only HTML files left. Now I want to find the HTML files that contain a specified keyword (like a search engine). How can I do this without Java code?
grep, find, a Python script, or open any file in a text editor and try Edit → Search.
(On Windows, use Windows' search-in-files.)
If all of your other code is written in Java, then it would be sensible (without knowing your use case) to use Java for the searching as well. You could of course use command-line programs such as grep or find, or the built-in search of a web browser, but if the search should be part of a Java application anyway, why not go with Java and e.g. Lucene?
If this 'help' is going to be online, then you can embed Google search in it (limiting the results to the specified site:). Alternatively, if you're hosting the pages yourself, you can use htdig for indexing the pages.
However, if it's going to be offline, you'll be better off generating a static index page with links to topics. To create a more help-system-like user experience, you can hide the contents of the index in invisible HTML div tags and add some JavaScript that takes the searched phrase as input and unhides the matched words with their links.
Maybe I'm missing something, but have you looked at JavaHelp? It has indexing and searching built in, and can be used online or offline.
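As a sketch of the non-Java route suggested above, here is a small Python illustration (standard library only; the sample page and keyword are made up) that strips tags and checks a page's visible text for a keyword:

```python
import re
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Accumulate only the visible text of an HTML document."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def html_contains(html: str, keyword: str) -> bool:
    """True if the keyword appears in the page text (tags ignored)."""
    p = TextOnly()
    p.feed(html)
    text = " ".join(p.chunks)
    return re.search(re.escape(keyword), text, re.IGNORECASE) is not None

page = "<html><body><h1>Printing</h1><p>How to print a page.</p></body></html>"
print(html_contains(page, "print"))   # True
print(html_contains(page, "export"))  # False
```

To search a whole manual, loop over the files, e.g. `[f for f in pathlib.Path("manual").rglob("*.html") if html_contains(f.read_text(), "print")]` (directory name hypothetical).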