Is it possible to return passages.passage_text in the Search Skill for Watson Assistant?

I'm creating an assistant and want to be able to return relevant passages of text from documents uploaded to Watson Discovery via the Search Skill.
When configuring the search response, you are only given the option to return the full text and various enriched_text.x, extracted_metadata.x, etc. fields.
None of this is really that useful (for my use case, anyway); what I really need is to point the user to the relevant text within the documents, i.e. return passages.passage_text.
Am I missing something in the tooling, or can you only do this by coding the API call from scratch?
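For reference, if the tooling really doesn't expose passages, "coding the API call from scratch" is not much code: the Discovery query API can return passages directly when you set passages=true. Below is a rough PHP/cURL sketch; the endpoint shape follows the Discovery v1 REST API, and the environment ID, collection ID, version date, and API key are placeholders you would replace with your instance's own values.

// Rough sketch: query Discovery directly and read passages.passage_text.
// YOUR_ENV_ID, YOUR_COLLECTION_ID, and YOUR_API_KEY are placeholders; the
// version date is an example and should match your service's documentation.
$url = 'https://gateway.watsonplatform.net/discovery/api/v1/environments/'
     . 'YOUR_ENV_ID/collections/YOUR_COLLECTION_ID/query'
     . '?version=2019-04-30'
     . '&passages=true&passages.count=3'
     . '&natural_language_query=' . urlencode('the user question');

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERPWD, 'apikey:YOUR_API_KEY'); // IAM basic auth

$response = json_decode(curl_exec($ch), true);
curl_close($ch);

// With passages=true the response carries a top-level passages array,
// each entry holding passage_text plus the id of the source document.
foreach ($response['passages'] as $passage) {
    echo $passage['passage_text'], "\n";
}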

Related

intents.json vs. training with a text file in GPT-2 for a chatbot

Forgive me 'cause I'm fairly new to AI and NN stuff. I'm trying to create a chatbot to have conversations with my friends in our Discord channel.
I know that an intents.json file can help the chatbot identify the intent of the user's message and reply with an appropriate response, but that seems very static (i.e. you have a tag like 'greetings', then maybe 5-6 pre-written responses ready to go, such as "Hi", "hey", "hello there", etc.). I want my chatbot to have dynamic responses (based on the conversation so far).
I've fiddled with Max Woolf's Google Colaboratory for GPT2 (https://colab.research.google.com/github/sarthakmalik/GPT2.Training.Google.Colaboratory/blob/master/Train_a_GPT_2_Text_Generating_Model_w_GPU.ipynb) and have used simple text files to train a model.
I'm confused about the difference between using an intents.json file to train an AI model vs. using a regular text file in GPT-2. Where would you use one vs. the other, or do they accomplish the same thing? I hope this makes sense. Any help or clarification would be appreciated, as well as resources to read up on!
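To make the contrast concrete: intent matching is essentially a lookup table with a classifier in front of it. Here is an illustrative sketch (not any particular library; the tags and responses are invented) of what happens after an intent tag has been predicted:

// Illustrative only: the essence of intents.json-style response selection.
// A classifier first maps the user's message to a tag such as 'greetings';
// the bot then picks one of the pre-written responses for that tag.
$intents = [
    'greetings' => ['Hi', 'hey', 'hello there'],
    'goodbye'   => ['See you!', 'Bye for now'],
];

function respond($tag, $intents) {
    $choices = $intents[$tag];
    return $choices[array_rand($choices)]; // random canned response
}

echo respond('greetings', $intents), "\n";

GPT-2, by contrast, generates a reply token by token conditioned on the conversation so far, so there is no fixed response list; the training text file just teaches it what your conversations look like. That is why intents.json feels static and a fine-tuned GPT-2 model feels dynamic (though also less predictable).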

How to pass text from Alexa/Echo to a server

I want to create a simple skill which uses Alexa's voice-to-text translation and then passes the text to a different service.
This seems close to what I am looking for: https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/alexa-skills-kit-interface-reference,
but nowhere in the example request/response does it have the text form of the user's request to Alexa. Has anyone managed to extract the text of user voice commands? That would be very helpful for me.
It is not easy, unfortunately, and that is by design.
User responses are stored in word 'slots' in the interaction model. This is how Alexa parses speech and executes commands based on user utterances. Without knowing every variation of a possible word slot and writing it into your interaction model, you won't be able to store free-form text in a variable and 'pass it on'.
It's not possible. The closest you can get is by using the built-in slot type AMAZON.Literal (US only and deprecated) or AMAZON.SearchQuery (available in all locales). I say "closest" because SearchQuery, for example, requires a carrier phrase in the utterance besides the slot (the slot can't stand alone and capture everything).
Note that the free-form capture provided by these types is less accurate than what you get with a custom slot type (i.e. one where you know more or less what you want to capture).
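If you do go the AMAZON.SearchQuery route, pulling the captured text out on your side is simple, because Alexa POSTs a JSON request to your endpoint with the recognized text under request.intent.slots.<name>.value. A minimal PHP sketch, assuming a custom intent with a SearchQuery slot named 'query' (the slot name is illustrative):

// Sketch of an HTTPS endpoint receiving Alexa's request JSON.
// Assumes an IntentRequest with an AMAZON.SearchQuery slot named 'query'.
$request = json_decode(file_get_contents('php://input'), true);

$spokenText = null;
if (($request['request']['type'] ?? '') === 'IntentRequest') {
    $spokenText = $request['request']['intent']['slots']['query']['value'] ?? null;
}

if ($spokenText !== null) {
    // Pass the text on to whatever service you like from here.
    error_log('User said: ' . $spokenText);
}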

Scraping data after filling out form?

I'm doing a little project for my class and I'm just a beginner, so please forgive me if I mix up some of my terminology.
Basically, I'm creating an interactive journey planner for my city's public transit system. Unfortunately, they haven't made all the data I need publicly available. So instead of putting all my time into gathering the data for personal use, I've opted to do some screen scraping - letting their servers calculate the journey info from a START and STOP variable and then displaying the selected info on my page.
So is it possible to fill out a form's fields remotely and then scrape the data on the page that subsequently loads? And if so, what would be the quickest, most convenient way? This happens to be a case where the data can't be manipulated via the URL, so I have to get at the data by submitting the form first.
The website in question:
http://jp.translink.com.au/travel-information/journey-planner
Here is what you can do:
1.) Send a POST request to the journey planner with form data like the following (be aware that CORS might get in the way if you do this from the browser; in that case use cURL via PHP or the like; see the sketch at the end of this answer):
Start:Wickham Tce, Spring Hill
End:Upper Edward St, Spring Hill
SearchDate:10/05/2013 12:00:00 AM
TimeSearchMode:LeaveAfter
SearchHour:7
SearchMinute:40
TimeMeridiem:AM
TransportModes:Bus
TransportModes:Train
TransportModes:Ferry
MaximumWalkingDistance:1500
WalkingSpeed:Normal
ServiceTypes:Regular
ServiceTypes:Express
ServiceTypes:NightLink
FareTypes:Standard
FareTypes:Prepaid
FareTypes:Free
2.) You will get back a new response location. This seems to be a REST link; the important part for you is the id at the end. You will have to call that page, parse the HTML, and look for a div with the HTML id option-summaries, where you will find more information within the divs travel-option-1 to travel-option-n. You have to look at it carefully to find out which information is stored where and how you can use it.
To find such things, you should learn how to use Firebug or Chrome's developer tools.
This is one way to solve your problem. Probably not the best, but still better than screen-scraping the rendered page. It will demand a fair amount of skill and effort, though. Furthermore, if the data provider changes anything even slightly, your solution will stop working. They might also prevent your access via CORS or other means (blocking your IP, etc.).
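For concreteness, here is a minimal sketch of step 1 with PHP and cURL. The endpoint path is an assumption based on the page URL, and the repeated keys (TransportModes, ServiceTypes, FareTypes) are appended by hand because http_build_query can't emit duplicate keys; verify the real form action and field names with your browser's developer tools first.

// Step 1 as a server-side POST; endpoint and field names assumed from above.
$fields = [
    'Start'                  => 'Wickham Tce, Spring Hill',
    'End'                    => 'Upper Edward St, Spring Hill',
    'SearchDate'             => '10/05/2013 12:00:00 AM',
    'TimeSearchMode'         => 'LeaveAfter',
    'SearchHour'             => '7',
    'SearchMinute'           => '40',
    'TimeMeridiem'           => 'AM',
    'MaximumWalkingDistance' => '1500',
    'WalkingSpeed'           => 'Normal',
];
$body = http_build_query($fields)
      . '&TransportModes=Bus&TransportModes=Train&TransportModes=Ferry'
      . '&ServiceTypes=Regular&ServiceTypes=Express&ServiceTypes=NightLink'
      . '&FareTypes=Standard&FareTypes=Prepaid&FareTypes=Free';

$ch = curl_init('http://jp.translink.com.au/travel-information/journey-planner');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);          // keep headers in the output
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // we want the Location header itself
$response = curl_exec($ch);
curl_close($ch);

// Step 2 starts from the Location header, which carries the journey id.
if (preg_match('/^Location:\s*(.+)$/mi', $response, $m)) {
    echo 'Result page: ', trim($m[1]), "\n";
}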

Tweet counter for identi.ca

Is there a way to retrieve the number of times a certain URL was "dented" (shared on identi.ca, status.net, and the like)?
For Twitter there are several services that give this information.
Twitter itself: http://urls.api.twitter.com/1/urls/count.json?url=http://example.com&callback=twttr.receiveCount
Tweetmeme: http://api.tweetmeme.com/url_info.jsonc?url=http://example.com
Topsy: http://otter.topsy.com/stats.js?url=http://example.com&callback=?
I don't need the fancy extra information that Tweetmeme or Topsy deliver, only the count.
I am aware that this is problematic, given the "distributed" nature of status.net: it will only give a count from one single silo, e.g. identi.ca. However, for me, for now, that would be enough.
Is there an endpoint that gives me such JSON?
I don't think so. There's a file table in StatusNet databases that holds references to dented URLs (so it wouldn't be hard to count them if you had access to the database or could write a plugin -- i.e., you wouldn't have to parse all the notices, just look up the file table), but it's not exposed through the API.
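If you do have database access, the count is a short query away. A rough sketch, assuming the file table stores the shared URL in a url column and file_to_post links files to the notices that dented them (these table and column names are assumptions from memory; check your instance's schema):

// Rough sketch with direct database access; table and column names are
// assumptions about the StatusNet schema and should be verified first.
$pdo = new PDO('mysql:host=localhost;dbname=statusnet', 'user', 'pass');

$stmt = $pdo->prepare(
    'SELECT COUNT(*) FROM file_to_post ftp
     JOIN file f ON f.id = ftp.file_id
     WHERE f.url = :url'
);
$stmt->execute([':url' => 'http://example.com']);

echo 'Dent count: ', $stmt->fetchColumn(), "\n";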
The list of possible API calls for StatusNet is here: http://status.net/wiki/TwitterCompatibleAPI
In addition, there's a proposed Google Summer of Code project on this subject: Social Analytics plugin

Crawling and Scraping iTunes App Store

I noticed that iTunes Preview allows you to crawl and scrape pages via the http:// protocol. However, many of the links try to open in iTunes rather than the browser. For example, when you go to the iBooks page, it immediately tries opening a URL with the itms:// protocol.
Are there any other methods of crawling the App Store or is this the only way?
Can the itms:// protocol links themselves be crawled somehow?
I would take a good look at the iTunes Search API and the iTunes Enterprise Partner API.
Search API -
http://www.apple.com/itunes/affiliates/resources/blog/introduction---search-api.html
Enterprise Partner API -
http://www.apple.com/itunes/affiliates/resources/documentation/itunes-enterprise-partner-feed.html
You might get most or all of the information you need in a nice JSON format.
If you can't get the information you need with the API, I would be interested to know what it is :)
As phillipp mentioned, the iTunes search API is an easy way to retrieve data about your App Store listings in JSON format.
Simply query it with your app id (you can find the app id by viewing the web listing for your app at itunes.apple.com), e.g.:
http://itunes.apple.com/lookup?id=INSERT_YOUR_APP_ID_HERE
then, parse the resulting JSON to your heart's content.
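For example, a few lines of PHP are enough to pull the listing data out. The field names below (resultCount, trackName, averageUserRating) reflect what the lookup endpoint typically returns, but inspect the raw JSON for your own app to be sure:

// Minimal sketch: fetch the lookup JSON and read a couple of fields.
$appId = 'INSERT_YOUR_APP_ID_HERE';
$json  = file_get_contents('http://itunes.apple.com/lookup?id=' . urlencode($appId));
$data  = json_decode($json, true);

if ($data['resultCount'] > 0) {
    $app = $data['results'][0];
    echo $app['trackName'], ' - rating: ', $app['averageUserRating'] ?? 'n/a', "\n";
}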
The only difference between http:// links and itms:// links is that you need to set your User-Agent to an iTunes user-agent, and depending on the version you may also have to include a verification code based on some not-so-secret algorithm.
For example, this is the code for iTunes 9:
# Some magic. Generates a seed we use for X-Apple-Validation. Adapted from LWP::UserAgent::iTMS_Client.
function comp_seed($url, $user_agent) {
    # Two random 16-bit values as four hex digits each. PHP's rand() upper
    # bound is inclusive, so cap at 0xFFFF to keep the width at four digits.
    $random = sprintf("%04X%04X", rand(0, 0xFFFF), rand(0, 0xFFFF));
    # Fixed salt shipped with iTunes.
    $static = base64_decode("ROkjAaKid4EUF5kGtTNn3Q==");
    # Everything from the last path segment of the URL, or '?' as a fallback.
    $url_end = (preg_match("|.*/.*/.*(/.+)$|", $url, $matches)) ? $matches[1] : '?';
    # MD5 over URL tail + user agent + salt + random prefix.
    $digest = md5(join("", array($url_end, $user_agent, $static, $random)));
    return $random . '-' . strtoupper($digest);
}
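Presumably the seed then travels in an X-Apple-Validation header next to an iTunes User-Agent, as the comment above suggests. A hypothetical usage sketch (the header name comes from the snippet's own comment; the user-agent string and URL are made up):

// Hypothetical usage of comp_seed(); header and user-agent are assumptions.
$url = 'http://itunes.apple.com/us/app/some-app/id000000000';
$ua  = 'iTunes/9.2 (Macintosh; U; Intel Mac OS X 10.6)';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, [
    'User-Agent: ' . $ua,
    'X-Apple-Validation: ' . comp_seed($url, $ua),
]);
$page = curl_exec($ch);
curl_close($ch);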
However, if you are only scraping, iTunes Preview should work for your purposes; the link you gave us to the iBooks page had more than enough information to scrape.
We tried scraping ourselves about a year ago, too, and it just became too much of a headache. Philipp's comment is a good one, as the enterprise feed from Apple (you need to apply for it with a legitimate use case) does have a good amount of the useful info that you might be after in scraping.
There are a few companies that offer the data as a service, too - abto and AppMonsta are two I heard of when I was looking. I can't seem to find abto anymore, but http://appmonsta.com still seems to be around. The search API looks OK (I never experimented with it) but limited.
Good luck!