Chatbot with NLTK with type of request and parameters - nltk

I am new to NLTK and am trying to build a chatbot that identifies the type of a request and its parameters.
For example:
corpus = [
{
name: "appt-count",
text: "How many appointments I have for today?"
},
{
name: "appt-count",
text: "What is my total appointments for today?"
},
{
name: "appt-list",
text: "What are all my appointments today?"
},
{
name: "appt-list",
text: "Can you tell me my appointments today?"
},
{
name: "appt-view",
text: "What is my next appointment?"
},
{
name: "appt-view",
text: "What is my appointment after lunch?"
},
{
name: "appt-view",
text: "What is my first appointment today?"
},
{
name: "appt-view",
text: "What is my last appointment today?"
}
]
Here, when the user enters text, the chatbot is supposed to return the corresponding name, so that the system can invoke the corresponding API and return the result to the user.
When the user says "How many appointments I have for today?" it is supposed to return "appt-count" along with the parameter "today", so that the system checks today's appointments; the user could also enter a specific date here.
I am trying to use LogisticRegression from sklearn.linear_model, but I am getting an error in the np.array line. Am I on the right track?
import nltk
import pickle
import pandas as pd
import numpy as np
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.classify import ClassifierI
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.feature_extraction.text import CountVectorizer,TfidfVectorizer
training = []
ps = PorterStemmer()
df=pd.read_json("./train_data.json", orient="records")
for name, text in zip(df['name'], df['text']):
    words = word_tokenize(text)
    stemWord = [ps.stem(w.lower()) for w in words]
    training.append([stemWord, name])
# Error raised by the next line: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
training = np.array(training)
train_x = list(training[:,0])
train_y = list(training[:,1])
print("Training data created")
model = LogisticRegression()
model=model.fit(train_x,train_y)
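One likely cause of the error, assuming it is raised by np.array(training): each row pairs a token list of a different length with a label, so NumPy cannot build a rectangular array from it. A classifier such as LogisticRegression expects one fixed-length numeric vector per sentence, which is what the CountVectorizer already imported above is designed to produce from raw strings. The sketch below only illustrates that idea in plain Python; tokenize and vectorize are hypothetical helpers, and the three-sentence corpus is a shortened copy of the one in the question.

```python
# Sketch: turn variable-length token lists into fixed-length count vectors.
corpus = [
    ("appt-count", "How many appointments I have for today?"),
    ("appt-list", "What are all my appointments today?"),
    ("appt-view", "What is my next appointment?"),
]

def tokenize(text):
    # Simplified stand-in for word_tokenize plus stemming (assumption).
    return [w.strip("?.,!").lower() for w in text.split() if w.strip("?.,!")]

# Vocabulary: every distinct token gets a fixed column index.
vocab = sorted({tok for _, text in corpus for tok in tokenize(text)})
index = {tok: i for i, tok in enumerate(vocab)}

def vectorize(text):
    # One count per vocabulary entry, so every row has len(vocab) columns.
    row = [0] * len(vocab)
    for tok in tokenize(text):
        if tok in index:
            row[index[tok]] += 1
    return row

# Features and labels are kept in two separate lists instead of one mixed array.
train_x = [vectorize(text) for _, text in corpus]
train_y = [name for name, _ in corpus]

print({len(row) for row in train_x})  # a single length: the rows are no longer ragged
```

Rows built this way all have the same length, so they can be passed to a model's fit method together with train_y. Extracting a parameter such as "today" is a separate step that this sketch does not cover.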

kotlinx.serialization.MissingFieldException: Field 'X' is required for type with serial name but it was missing Error in kotlin

I am making a dictionary application and I have my own JSON dataset in an asset file. The first time, I read from this dataset and save it to Room; after that, I read from Room. However, I am encountering this error.
E/AndroidRuntime: FATAL EXCEPTION: main
Process: com.enestigli.dictionaryapp, PID: 1868
kotlinx.serialization.MissingFieldException: Field 'example' is required for type with serial name 'com.enestigli.dictionaryapp.data.locale.datamodel.Idioms', but it was missing
at kotlinx.serialization.internal.PluginExceptionsKt.throwMissingFieldException(PluginExceptions.kt:20)
at com.enestigli.dictionaryapp.data.locale.datamodel.Idioms.<init>(Idiom.kt:28)
at com.enestigli.dictionaryapp.data.locale.datamodel.Idioms$$serializer.deserialize(Idiom.kt:28)
at com.enestigli.dictionaryapp.data.locale.datamodel.Idioms$$serializer.deserialize(Idiom.kt:28)
at kotlinx.serialization.json.internal.PolymorphicKt.decodeSerializableValuePolymorphic(Polymorphic.kt:59)
at kotlinx.serialization.json.internal.StreamingJsonDecoder.decodeSerializableValue(StreamingJsonDecoder.kt:36)
at kotlinx.serialization.encoding.AbstractDecoder.decodeSerializableValue(AbstractDecoder.kt:43)
at kotlinx.serialization.encoding.AbstractDecoder.decodeSerializableElement(AbstractDecoder.kt:70)
at kotlinx.serialization.encoding.CompositeDecoder$DefaultImpls.decodeSerializableElement$default(Decoding.kt:535)
at kotlinx.serialization.internal.ListLikeSerializer.readElement(CollectionSerializers.kt:80)
at kotlinx.serialization.internal.AbstractCollectionSerializer.readElement$default(CollectionSerializers.kt:51)
at kotlinx.serialization.internal.AbstractCollectionSerializer.merge(CollectionSerializers.kt:36)
at kotlinx.serialization.internal.AbstractCollectionSerializer.deserialize(CollectionSerializers.kt:43)
at kotlinx.serialization.json.internal.PolymorphicKt.decodeSerializableValuePolymorphic(Polymorphic.kt:59)
at kotlinx.serialization.json.internal.StreamingJsonDecoder.decodeSerializableValue(StreamingJsonDecoder.kt:36)
at kotlinx.serialization.encoding.AbstractDecoder.decodeSerializableValue(AbstractDecoder.kt:43)
at kotlinx.serialization.encoding.AbstractDecoder.decodeSerializableElement(AbstractDecoder.kt:70)
at com.enestigli.dictionaryapp.data.locale.datamodel.IdiomItem$$serializer.deserialize(Idiom.kt:15)
at com.enestigli.dictionaryapp.data.locale.datamodel.IdiomItem$$serializer.deserialize(Idiom.kt:15)
at kotlinx.serialization.json.internal.PolymorphicKt.decodeSerializableValuePolymorphic(Polymorphic.kt:59)
at kotlinx.serialization.json.internal.StreamingJsonDecoder.decodeSerializableValue(StreamingJsonDecoder.kt:36)
at kotlinx.serialization.encoding.AbstractDecoder.decodeSerializableValue(AbstractDecoder.kt:43)
at kotlinx.serialization.encoding.AbstractDecoder.decodeSerializableElement(AbstractDecoder.kt:70)
at kotlinx.serialization.encoding.CompositeDecoder$DefaultImpls.decodeSerializableElement$default(Decoding.kt:535)
at kotlinx.serialization.internal.ListLikeSerializer.readElement(CollectionSerializers.kt:80)
at kotlinx.serialization.internal.AbstractCollectionSerializer.readElement$default(CollectionSerializers.kt:51)
at kotlinx.serialization.internal.AbstractCollectionSerializer.merge(CollectionSerializers.kt:36)
at kotlinx.serialization.internal.AbstractCollectionSerializer.deserialize(CollectionSerializers.kt:43)
at kotlinx.serialization.json.internal.PolymorphicKt.decodeSerializableValuePolymorphic(Polymorphic.kt:59)
at kotlinx.serialization.json.internal.StreamingJsonDecoder.decodeSerializableValue(StreamingJsonDecoder.kt:36)
at com.enestigli.dictionaryapp.data.locale.datamodel.IdiomDataModel$$serializer.deserialize-zhuhEdE(Idiom.kt:8)
at com.enestigli.dictionaryapp.data.locale.datamodel.IdiomDataModel$$serializer.deserialize(Idiom.kt:8)
at kotlinx.serialization.json.internal.PolymorphicKt.decodeSerializableValuePolymorphic(Polymorphic.kt:59)
at kotlinx.serialization.json.internal.StreamingJsonDecoder.decodeSerializableValue(StreamingJsonDecoder.kt:36)
at kotlinx.serialization.json.Json.decodeFromString(Json.kt:100)
at com.enestigli.dictionaryapp.data.locale.datasource.AssetDataSource.getIdioms-zhuhEdE(AssetsDataSource.kt:84)
E/AndroidRuntime: at com.enestigli.dictionaryapp.data.repository.IdiomsRepositoryImpl.initData(IdiomsRepositoryImpl.kt:21)
at com.enestigli.dictionaryapp.domain.use_case.idiom.insert.InsertIdiomsUseCase.initIdiomData(InsertIdiomsUseCase.kt:12)
at com.enestigli.dictionaryapp.presentation.SplashScreen.SplashScreenViewModel$insertAllDataToRoomDb$1.invokeSuspend(SplashScreenViewModel.kt:38)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
at android.os.Handler.handleCallback(Handler.java:942)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loopOnce(Looper.java:201)
at android.os.Looper.loop(Looper.java:288)
at android.app.ActivityThread.main(ActivityThread.java:7898)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:936)
Suppressed: kotlinx.coroutines.DiagnosticCoroutineContextException: [StandaloneCoroutine{Cancelling}#5aeb15c, Dispatchers.Main.immediate]
My JSON dataset looks like this:
[
{
"letter": "A",
"idioms": [
{
"idiom": "above board",
"meaning": "If something is above board, it's been done in a legal and honest way.",
"examples": [
"I'm sure the deal was completely above board as I know James well and he'd never do anything illegal or corrupt.",
"The minister claimed all the appointments were above board and denied claims that some positions had been given to his friends."
]
},
{
"idiom": "above the law",
"meaning": "If someone is above the law, they are not subject to the laws of a society.",
"examples": [
"Just because his father is a rich and powerful man, he seems to think he's above the law and he can do whatever he likes.",
"In a democracy, no-one is above the law - not even a president or a prime-minister."
]
},
{
"idiom": "Achilles' heel",
"meaning": "An Achilles' heel is a weakness that could result in failure.",
"examples": [
"He's a good golfer, but his Achilles' heel is his putting and it's often made him lose matches.",
"The country's dependence on imported oil could prove to be its Achilles' heel if prices keep on rising."
]
},
My Data Model
@JvmInline
@Serializable
value class IdiomDataModel(
    val allData: List<IdiomItem>
)
@Serializable
data class IdiomItem(
    val letter: String,
    val idioms: List<Idioms>
) {
    fun toEntity() = IdiomEntity(
        letter = letter,
        idioms = idioms
    )
}
@Serializable
data class Idioms(
    val idiom: String,
    val meaning: String,
    val example: List<String>
)
Entity
@Entity(tableName = "idioms")
data class IdiomEntity(
    @ColumnInfo(name = "letter") val letter:String,
    @ColumnInfo(name = "idioms") val idioms:List<Idioms>,
    @PrimaryKey(autoGenerate = true) val uid:Int? = null
)
I think the problem comes from val example: List<String> in the Idioms data class. Could it be that this does not exactly match the examples list in the JSON dataset?
In your Idioms data class, rename example to examples so that the property name matches the "examples" key in the JSON, and it should work.
Old:
@Serializable
data class Idioms(
    val idiom: String,
    val meaning: String,
    val example: List<String>
)
New:
@Serializable
data class Idioms(
    val idiom: String,
    val meaning: String,
    val examples: List<String>
)

Pulling specific Parent/Child JSON data with Python

I'm having a difficult time figuring out how to pull specific information from a json file.
So far I have this:
# Import json library
import json
# Open json database file
with open('jsondatabase.json', 'r') as f:
    data = json.load(f)
# assign variables from json data and convert to usable information
identifier = data['ID']
identifier = str(identifier)
name = data['name']
name = str(name)
# Collect data from user to compare with data in json file
print("Please enter your numerical identifier and name: ")
user_id = input("Numerical identifier: ")
user_name = input("Name: ")
if user_id == identifier and user_name == name:
    print("Your inputs matched. Congrats.")
else:
    print("Your inputs did not match our data. Please try again.")
And that works great for a simple JSON file like this:
{
"ID": "123",
"name": "Bobby"
}
But ideally I need to create a more complex JSON file and can't find deeper information on how to pull specific information from something like this:
{
"Parent": [
{
"Parent_1": [
{
"Name": "Bobby",
"ID": "123"
}
],
"Parent_2": [
{
"Name": "Linda",
"ID": "321"
}
]
}
]
}
Here is an example that you might be able to pick apart.
You could either:
Make a custom de-jsonify object_hook as shown below and do something with it. There is a good tutorial here.
Just gobble up the whole dictionary that you get without a custom de-jsonify and drill down into it and make a list or set of the results. (not shown)
Example:
import json
from collections import namedtuple
data = '''
{
"Parents":
[
{
"Name": "Bobby",
"ID": "123"
},
{
"Name": "Linda",
"ID": "321"
}
]
}
'''
Parent = namedtuple('Parent', ['name', 'id'])
def dejsonify(json_str: dict):
    # object_hook passes in each decoded JSON object as a dict
    if json_str.get("Name"):
        parent = Parent(json_str.get('Name'), int(json_str.get('ID')))
        return parent
    return json_str
res = json.loads(data, object_hook=dejsonify)
print(res)
# then we can do whatever... if you need lookups by name/id,
# we could put the result into a dictionary
all_parents = {(p.name, p.id) : p for p in res['Parents']}
lookup_from_input = ('Bobby', 123)
print(f'found match: {all_parents.get(lookup_from_input)}')
Result:
{'Parents': [Parent(name='Bobby', id=123), Parent(name='Linda', id=321)]}
found match: Parent(name='Bobby', id=123)
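The second option (drilling into the plain dictionary) is not shown above. A minimal sketch of it, using the nested structure from the question, might look like this; collect_people is a hypothetical helper name, and the code assumes every value under a "Parent" entry is a list of records with "Name" and "ID" keys:

```python
import json

raw = '''
{
  "Parent": [
    {
      "Parent_1": [{"Name": "Bobby", "ID": "123"}],
      "Parent_2": [{"Name": "Linda", "ID": "321"}]
    }
  ]
}
'''

def collect_people(data):
    """Return a set of (name, id) pairs found under data["Parent"]."""
    people = set()
    for group in data["Parent"]:        # each element is a dict of Parent_N keys
        for records in group.values():  # each value is a list of records
            for record in records:
                people.add((record["Name"], record["ID"]))
    return people

people = collect_people(json.loads(raw))
print(("Bobby", "123") in people)
```

The IDs stay strings here, so they can be compared directly with what input() returns.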

Python Regex: How to match the string and then modify that string by adding something at the end

UPDATED CODE: It is working now, but the problem is that the code attaches the same random_value to every Path.
Below is my code with a sample chunk of text. I want to read each Path and its value, then append a slash followed by a unique random combination of letters and digits to the end of every Path value, without changing the existing value. For example, I want the Path to look like
"Path" : "already existing value/1A" and so on.
I am unable to work out the exact regex pattern for the replacement.
Any help would be appreciated.
It could be done by parsing the JSON, but the task requires doing it with regex.
from io import StringIO
import re
import string
import random
reader = StringIO("""{
"Bounds": [
{
"HasClip": true,
"Lang": "no",
"Page": 0,
"Path": "//Document/Sect[2]/Aside/P",
"Text": "Potsdam, den 9. Juni 2021 ",
"TextSize": 12.0
}
],
},
{
"Bounds": [
{
"HasClip": true,
"Lang": "de",
"Page": 0,
"Path": "//Document/Sect[3]/P[4]",
"Text": "this is some text ",
"TextSize": 9.0,
}
],
}""")
def id_generator(size=3, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.choice(chars) for _ in range(size))
text = reader.read()
random_value = id_generator()
pattern = r'"Path": "(.*?)"'
replacement = '"Path": "\\1/'+random_value+'"'
text = re.sub(pattern, replacement, text)
#This is working but it is only attaching one same random_value on every Path
print(text)
Use group 1 in the replacement:
replacement = '"Path": "\\1/1A"'
See live demo.
The backreference \1 in the replacement string puts back what group 1, (.*?), captured in the match.
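For the updated problem (the same random_value on every Path), one option is to pass a function instead of a string as the replacement: re.sub calls it once per match, so a new value can be generated each time. A minimal sketch, where add_suffix is a hypothetical helper and the two-line sample stands in for the full text:

```python
import random
import re
import string

def id_generator(size=3, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.choice(chars) for _ in range(size))

def add_suffix(match):
    # Called once per match, so each Path gets its own freshly generated value.
    return f'"Path": "{match.group(1)}/{id_generator()}"'

sample = '"Path": "//Document/Sect[2]/Aside/P",\n"Path": "//Document/Sect[3]/P[4]",'
result = re.sub(r'"Path": "(.*?)"', add_suffix, sample)
print(result)
```

Random values can still collide by chance; if strict uniqueness matters, the generated values would need to be tracked (for example in a set) and regenerated on a clash.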
Since you already have a json structure, maybe it would help to use the json module to parse it.
import json
myDict = json.loads("your json string / variable here")
# now myDict is a dictionary that you can use to loop/read/edit/modify and you can then export myDict as json.
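If the regex requirement can be relaxed, the loop could look roughly like the sketch below. It assumes a cleaned-up, valid-JSON version of the sample (wrapped in a list, trailing commas removed, fields shortened), since the text in the question would not parse as it stands:

```python
import json
import random
import string

def id_generator(size=3, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.choice(chars) for _ in range(size))

# Assumption: a valid-JSON rewrite of the sample from the question.
raw = '''
[
  {"Bounds": [{"Lang": "no", "Path": "//Document/Sect[2]/Aside/P"}]},
  {"Bounds": [{"Lang": "de", "Path": "//Document/Sect[3]/P[4]"}]}
]
'''

items = json.loads(raw)
for item in items:
    for bound in item["Bounds"]:
        # Append a separately generated suffix to each Path.
        bound["Path"] = f'{bound["Path"]}/{id_generator()}'

print(json.dumps(items, indent=2))
```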

Emit Python embedded object as native JSON in YAML document

I'm importing webservice tests from Excel and serialising them as YAML.
But taking advantage of YAML being a superset of JSON, I'd like the request part of the test to be valid JSON, i.e. to have delimiters, quotes and commas.
This will allow us to cut and paste requests between the automated test suite and manual test tools (e.g. Postman.)
So here's how I'd like a test to look (simplified):
- properties:
METHOD: GET
TYPE: ADDRESS
Request URL: /addresses
testCaseId: TC2
request:
{
"unitTypeCode": "",
"unitNumber": "15",
"levelTypeCode": "L",
"roadNumber1": "810",
"roadName": "HAY",
"roadTypeCode": "ST",
"localityName": "PERTH",
"postcode": "6000",
"stateTerritoryCode": "WA"
}
In Python, my request object has a dict attribute called fields which is the part of the object to be serialised as JSON. This is what I tried:
import json
import yaml
def request_presenter(dumper, request):
    json_string = json.dumps(request.fields, indent=8)
    return dumper.represent_str(json_string)
yaml.add_representer(Request, request_presenter)
test = Test(...)  # including the embedded request object
serialised_test = yaml.dump(test)
I'm getting:
- properties:
METHOD: GET
TYPE: ADDRESS
Request URL: /addresses
testCaseId: TC2
request: "{
\"unitTypeCode\": \"\",\n
\"unitNumber\": \"15\",\n
\"levelTypeCode\": \"L\",\n
\"roadNumber1\": \"810\",\n
\"roadName\": \"HAY\",\n
\"roadTypeCode\": \"ST\",\n
\"localityName\": \"PERTH\",\n
\"postcode\": \"6000\",\n
\"stateTerritoryCode\": \"WA\"\n
}"
...only worse because it's all on one line and has white space all over the place.
I tried using the | style for literal multi-line strings which helps with the line breaks and escaped quotes (it's more involved but this answer was helpful.) However, escaped or multiline, the result is still a string that will need to be parsed separately.
How can I stop PyYaml analysing the JSON block as a string and make it just accept a block of text as part of the emitted YAML? I'm guessing it's something to do with overriding the emitter but I could use some help. If possible I'd like to avoid post-processing the serialised test to achieve this.
Ok, so this was the solution I came up with. Generate the YAML with a placemarker ahead of time. The placemarker marks the place where the JSON should be inserted, and also defines the root-level indentation of the JSON block.
import itertools
import json
def insert_json_in_yaml(pre_insert_yaml, key, obj_to_serialise):
    # The marker is the placeholder emitted ahead of time, e.g. "request: null".
    marker = '%s: null' % key
    marker_line = line_of_first_occurrence(pre_insert_yaml, marker)
    marker_indent = string_indent(marker_line)
    serialised = json.dumps(obj_to_serialise, indent=marker_indent + 4)
    key_with_json = '%s: %s' % (key, serialised)
    serialised_with_json = pre_insert_yaml.replace(marker, key_with_json)
    return serialised_with_json
def line_of_first_occurrence(basestring, substring):
    """
    return the line containing the first occurrence of substring
    """
    lineno = lineno_of_first_occurrence(basestring, substring)
    return basestring.split('\n')[lineno]
def string_indent(s):
    """
    return indentation of a string (no of spaces before a nonspace)
    """
    spaces = ''.join(itertools.takewhile(lambda c: c == ' ', s))
    return len(spaces)
def lineno_of_first_occurrence(basestring, substring):
    """
    return line number of first occurrence of substring
    """
    # Count '\n' rather than os.linesep: Python string literals use '\n' on every platform.
    return basestring[:basestring.index(substring)].count('\n')
embedded_object = {
"unitTypeCode": "",
"unitNumber": "15",
"levelTypeCode": "L",
"roadNumber1": "810",
"roadName": "HAY",
"roadTypeCode": "ST",
"localityName": "PERTH",
"postcode": "6000",
"stateTerritoryCode": "WA"
}
yaml_string = """
---
- properties:
METHOD: GET
TYPE: ADDRESS
Request URL: /addresses
testCaseId: TC2
request: null
after_request: another value
"""
>>> print(insert_json_in_yaml(yaml_string, 'request', embedded_object))
- properties:
METHOD: GET
TYPE: ADDRESS
Request URL: /addresses
testCaseId: TC2
request: {
"unitTypeCode": "",
"unitNumber": "15",
"levelTypeCode": "L",
"roadNumber1": "810",
"roadName": "HAY",
"roadTypeCode": "ST",
"localityName": "PERTH",
"postcode": "6000",
"stateTerritoryCode": "WA"
}
after_request: another value

Incorrect properties displayed from Json

I am having an issue when trying to collect data from a json using SOAP UI and groovy scripting. Below is an example json:
{
"regions": [{
"hotels": [{
"roomInformation": [{
"hotelRoomId": xxx,
}],
"regionId": x,
"hotelId": xxx,
"providerInformation": {
"ratePlanCode": "xxx",
},
"providerHotelId": 0000001
},
{
"roomInformation": [{
"hotelRoomId": xx,
}],
"regionId": x,
"hotelId": xxx,
"providerInformation": {
"ratePlanCode": "ggg",
},
"providerHotelId": 0000002
}
],
"errors": null
}],
"errors": null
}
What I want to do is select the first instance of providerHotelId and ratePlanCode. To do this I have the Groovy script below:
def alert = com.eviware.soapui.support.UISupport
import groovy.json.JsonSlurper
def response = testRunner.testCase.getTestStepByName("Search Test").getProperty("Response").getValue();
def jsonRes = new JsonSlurper().parseText(response);
def providerhotelid = jsonRes.regions.hotels.providerHotelId[0].toString()
def rateplancode = jsonRes.regions.hotels.providerInformation[0].ratePlanCode.toString()
log.info providerhotelid
testRunner.testCase.setPropertyValue('providerhotelid', providerhotelid)
testRunner.testCase.setPropertyValue('rateplancode', rateplancode)
This outputs below in my custom properties:
providerhotelid - [0000001,0000002]
rateplancode - [xxx]
The above is incorrect because:
providerhotelid - it displays all provider hotel ids when I only want the first one which should be 0000001.
rateplancode - is correct but it displays a [] around it and I want this removed. Same goes for providerhotelid.
So for this example my custom properties should display:
providerhotelid - 0000001
rateplancode - xxx
How can this be achieved within my groovy script?
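Here is what you need. Property access on a list in Groovy is applied to every element, so jsonRes.regions.hotels.providerHotelId is a list of lists; flattening it and taking the first element gives a single value:
//Get all the values, flatten them and get the first one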
def providerhotelid = jsonRes.regions.hotels.providerHotelId.flatten()[0]
def rateplancode = jsonRes.regions.hotels.providerInformation.ratePlanCode.flatten()[0]
log.info providerhotelid
log.info rateplancode
You can quickly try it online Demo