SQL query for searching within columns containing JSON documents - MySQL

I am using MySQL 5.7 and one of the columns in my table contains multiple JSON documents. Something like:
'[
  {
    "animal": "dog",
    "data": {
      "body": "This sentence does not contain any thing about grooming"
    }
  },
  {
    "animal": "cat",
    "data": {
      "body": "No grooming needed"
    }
  },
  {
    "animal": "horse",
    "data": {
      "body": "He is grooming his horse after the ride."
    }
  }
]'
I want to return all rows where $.data.body contains "grooming" more than once, but only if $.animal == "horse". So in the example given above, the row should not be returned, since "grooming" appears only once in the $.data.body section where $.animal == "horse".
Is there a good way to query this in MySQL/SQL? I can do it in Python, but I am interested in knowing whether there's a way to do it in SQL/MySQL. Thanks!

Searching JSON requires complex queries, and it is hard to optimize:
SELECT ...
FROM mytable
CROSS JOIN JSON_TABLE(myjsoncolumn, '$[*]' COLUMNS(
    animal VARCHAR(20) PATH '$.animal',
    body TEXT PATH '$.data.body'
)) AS j
WHERE j.animal = 'horse' AND j.body LIKE '%grooming%';
The JSON_TABLE() function is available as of MySQL 8.0.4, but not in earlier versions, so it won't work on the MySQL 5.7 you are using.
The bottom line is that if you are trying to search the content of JSON documents, your use of SQL is going to be a lot more difficult and less efficient.
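Note that the LIKE '%grooming%' predicate above only checks that the word appears at least once. For the "more than once" requirement, one option is the string-length trick: count occurrences by measuring how much shorter the text gets when the word is removed. A sketch, reusing the placeholder table and column names from the query above:

SELECT ...
FROM mytable
CROSS JOIN JSON_TABLE(myjsoncolumn, '$[*]' COLUMNS(
    animal VARCHAR(20) PATH '$.animal',
    body TEXT PATH '$.data.body'
)) AS j
WHERE j.animal = 'horse'
  -- occurrences = (full length - length with 'grooming' removed) / word length
  AND (CHAR_LENGTH(j.body) - CHAR_LENGTH(REPLACE(j.body, 'grooming', ''))) / CHAR_LENGTH('grooming') > 1;

One caveat: REPLACE() matches case-sensitively, unlike LIKE under a case-insensitive collation, so lowercase the body first if case should not matter.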
This would be far easier if you did not store the data in JSON, but instead stored it in normal rows and columns. From the example you show, there's no reason it needs to be JSON.
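For illustration, a sketch of a normalized layout (the table and column names here are invented):

CREATE TABLE animal_notes (
    id        INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    report_id INT UNSIGNED NOT NULL,  -- points back to the original row
    animal    VARCHAR(20) NOT NULL,
    body      TEXT NOT NULL,
    INDEX (animal)
);

SELECT report_id
FROM animal_notes
WHERE animal = 'horse' AND body LIKE '%grooming%';

The animal = 'horse' filter can now use the index, and the occurrence-counting trick above applies unchanged to the body column.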

Related

How to merge a dynamically named record with a static one in Dhall?

I'm creating an AWS Step Function definition in Dhall. However, I don't know how to create a common structure Step Functions uses for Choice states, such as the example below:
{
  "Not": {
    "Variable": "$.type",
    "StringEquals": "Private"
  },
  "Next": "Public"
}
The Not is pretty straightforward using mapKey and mapValue. If I define a basic Comparison:
let Comparison =
      { Type =
          { Variable : Text
          , StringEquals : Optional Text
          }
      , default =
          { Variable = "foo"
          , StringEquals = None Text
          }
      }
And the types:
let ComparisonType = < And | Or | Not >
And adding a helper function to render the type as Text for the mapKey:
let renderComparisonType =
      \(comparisonType : ComparisonType) ->
        merge
          { And = "And"
          , Or = "Or"
          , Not = "Not"
          }
          comparisonType
Then I can use them in a function to generate the record halfway:
let renderRuleComparisons =
      \(comparisonType : ComparisonType) ->
      \(comparisons : List Comparison.Type) ->
        let keyName = renderComparisonType comparisonType
        let compare = [ { mapKey = keyName, mapValue = comparisons } ]
        in  compare
If I run that using:
let rando = Comparison::{ Variable = "$.name", StringEquals = Some "Cow" }
let comparisons = renderRuleComparisons ComparisonType.Not [ rando ]
in  comparisons
Running that through dhall-to-json outputs the first part:
{
  "Not": {
    "Variable": "$.name",
    "StringEquals": "Cow"
  }
}
... but I've been struggling to merge that with "Next": "Public". I've tried all the record merge operators like /\ and //, and they keep giving me various type errors I don't truly understand yet.
First, I'll include an approach that does not type-check as a starting point to motivate the solution:
let rando = Comparison::{ Variable = "$.name", StringEquals = Some "Cow" }
let comparisons = renderRuleComparisons ComparisonType.Not [ rando ]
in  comparisons # toMap { Next = "Public" }
toMap is a keyword that converts records to key-value lists, and # is the list concatenation operator. The Dhall CheatSheet has a few examples of how to use both of them.
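For reference, toMap { Next = "Public" } evaluates to this key-value list:

[ { mapKey = "Next", mapValue = "Public" } ]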
The above solution doesn't work because # cannot merge lists with different element types. The left-hand side of the # operator has this type:
comparisons : List { mapKey : Text, mapValue : Comparison.Type }
... whereas the right-hand side of the # operator has this type:
toMap { Next = "Public" } : List { mapKey : Text, mapValue : Text }
... so the two Lists cannot be merged as-is due to the different types for the mapValue field.
There are two ways to resolve this:
Approach 1: Use a union whenever there is a type conflict
Approach 2: Use a weakly-typed JSON representation that can hold arbitrary values
Approach 1 is the simpler solution for this particular example and Approach 2 is the more general solution that can handle really weird JSON schemas.
For Approach 1, dhall-to-json will automatically strip non-empty union constructors (leaving behind the value they were wrapping) when translating to JSON. This means that you can transform both arguments of the # operator to agree on this common type:
List { mapKey : Text, mapValue : < State : Text | Comparison : Comparison.Type > }
... and then you should be able to concatenate the two lists of key-value pairs and dhall-to-json will render them correctly.
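A sketch of that transformation, following the types as stated above and reusing comparisons from the question (the union name Mixed and the helper are invented for illustration):

let Prelude = https://prelude.dhall-lang.org/v12.0.0/package.dhall

let Mixed = < State : Text | Comparison : Comparison.Type >

let wrapComparison =
      \(kv : { mapKey : Text, mapValue : Comparison.Type }) ->
        { mapKey = kv.mapKey, mapValue = Mixed.Comparison kv.mapValue }

in    Prelude.List.map
        { mapKey : Text, mapValue : Comparison.Type }
        { mapKey : Text, mapValue : Mixed }
        wrapComparison
        comparisons
    # [ { mapKey = "Next", mapValue = Mixed.State "Public" } ]

dhall-to-json then strips the Mixed.State and Mixed.Comparison wrappers, leaving the plain JSON values behind.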
There is a second solution for dealing with weakly-typed JSON schemas that you can learn more about here:
Dhall Manual - How to convert an existing YAML configuration file to Dhall
The basic idea is that all of the JSON/YAML integrations recognize and support a weakly-typed JSON representation that can hold arbitrary JSON data, including dictionaries with keys of different shapes (like in your example). You don't even need to convert the entire expression to this weakly-typed representation; you only need to use it for the subset of your configuration where you run into schema issues.
What this means for your example is that you would change both arguments of the # operator to have this type:
let Prelude = https://prelude.dhall-lang.org/v12.0.0/package.dhall
in List { mapKey : Text, mapValue : Prelude.JSON.Type }
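Concretely, the "Next": "Public" pair would then be wrapped like this (a sketch, assuming the Prelude's JSON constructors such as JSON.string):

let Prelude = https://prelude.dhall-lang.org/v12.0.0/package.dhall

let JSON = Prelude.JSON

in  [ { mapKey = "Next", mapValue = JSON.string "Public" } ]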
The documentation for Prelude.JSON.Type also has more details on how to use this type.

Phrase & wildcard queries on Elasticsearch

I am facing some difficulties while trying to create a query that can match only whole phrases, but allows wildcards as well.
Basically I have a field that contains a string (it is actually a list of strings, but for simplicity I am skipping that), which can contain whitespace or be null; let's call it "color".
For example:
{
  ...
  "color": "Dull carmine pink"
  ...
}
My queries need to be able to do the following:
search for null values (inclusive and exclusive)
search for non null values (inclusive and exclusive)
search for and match only a whole phrase (inclusive and exclusive). For example:
dull carmine pink --> match
carmine pink --> not a match
same as the last, but with wildcards (inclusive and exclusive). For example:
?ull carmine p* --> match to "Dull carmine pink"
dull carmine* --> match to "Dull carmine pink"
etc.
I have been bumping my head against the wall for a few days with this and I have tried almost every type of query I could think of.
I have only managed to make it work partially with a span_near query with the help of this topic.
So basically I can now:
search for a whole phrase with/without wildcards like this:
{
  "span_near": {
    "clauses": [
      { "span_term": { "color": "dull" } },
      { "span_term": { "color": "carmine" } },
      { "span_multi": { "match": { "wildcard": { "color": "p*" } } } }
    ],
    "slop": 0,
    "in_order": true
  }
}
search for null values (inclusive and exclusive) by simple must/must_not queries like this:
{
  "must" / "must_not": { "exists": { "field": "color" } }
}
The problem:
I cannot find a way to make an exclusive span query. The only way I can find is this, but it requires both the include and exclude fields, and I only want to exclude some documents; all others must be returned. Is there some analog of the "match_all": {} query that can work inside a span_not's include field? Or perhaps an entirely new, more elegant solution?
I found the solution a month ago, but I forgot to post it here.
I do not have an example at hand, but I will try to explain it.
The problem was that the fields I was trying to query were analyzed by Elasticsearch before querying; the analyzer in question was splitting them on spaces, among other things. The solution is one of the following two:
1. If you do not use a custom mapping for the index
(Meaning you let Elasticsearch dynamically create the appropriate mapping for the field when it was first added.)
In this case Elasticsearch automatically creates a subfield of the text field called "keyword". This subfield is of the keyword type, which stores the value verbatim instead of analyzing it.
Which means that queries like:
{
  "query": {
    "bool": {
      "must": [ // or "must_not"
        {
          "match": {
            "user.keyword": "Kim Chy"
          }
        }
      ]
    }
  }
}
and
{
  "query": {
    "bool": {
      "must": [ // or "must_not"
        {
          "wildcard": {
            "user.keyword": "Kim*y"
          }
        }
      ]
    }
  }
}
should work as expected.
However, with the default mapping the keyword subfield will most likely be case-sensitive. For it to be case-insensitive as well, you will need to create a custom mapping that applies a lowercase (or uppercase) normalizer to the keyword field; the normalizer is applied both at index time and to the query term.
2. If you use a custom mapping
Basically the same as above, except that you will have to create the keyword subfield (or a separate field) manually, giving it the keyword type (and possibly a normalizer to make it case-insensitive).
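A sketch of such a mapping for the color field from the question, valid on recent Elasticsearch versions (the index name and normalizer name are invented):

PUT /my_index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "color": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "normalizer": "lowercase_normalizer"
          }
        }
      }
    }
  }
}

Values indexed into color.keyword are then lowercased, so queries can match case-insensitively by lowercasing the search term.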
P.S. As far as I am aware, changing the mapping of an existing field is not possible in Elasticsearch. This means that you will have to create a new index with the appropriate mapping and then reindex your data into the new index.

How do I search for a specific string in a JSON Postgres data type column?

I have a column named params in a table named reports which contains JSON.
I need to find which rows contain the text 'authVar' anywhere in the JSON array. I don't know the path or level in which the text could appear.
I want to just search through the JSON with a standard LIKE operator.
Something like:
SELECT * FROM reports
WHERE params LIKE '%authVar%'
I have searched and googled and read the Postgres docs. I don't understand the JSON data type very well, and figure I am missing something easy.
The JSON looks something like this.
[
  {
    "tileId": 18811,
    "Params": {
      "data": [
        {
          "name": "Week Ending",
          "color": "#27B5E1",
          "report": "report1",
          "locations": {
            "c1": 0,
            "c2": 0,
            "r1": "authVar",
            "r2": 66
          }
        }
      ]
    }
  }
]
In Postgres 11 or earlier it is possible to recursively walk through an unknown json structure, but it would be rather complex and costly. I would propose the brute force method which should work well:
select *
from reports
where params::text like '%authVar%';
-- or
-- where params::text like '%"authVar"%';
-- if you are looking for the exact value
The query is very fast but may return unexpected extra rows when the searched string is part of one of the keys (for example, a key named "authVarSetting").
In Postgres 12+, recursive searching in JSONB is quite convenient with the new jsonpath feature.
Find a string value containing authVar:
select *
from reports
where jsonb_path_exists(params, '$.** ? (@.type() == "string" && @ like_regex "authVar")')
The jsonpath, piece by piece:
$.** - find any value at any level (recursive processing)
? - where
@.type() == "string" - the value is a string
&& - and
@ like_regex "authVar" - the value contains 'authVar'
Or find the exact value:
select *
from reports
where jsonb_path_exists(params, '$.** ? (@ == "authVar")')
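The same check can also be written with the @? operator, which is shorthand for jsonb_path_exists (if the column is of type json rather than jsonb, cast it with params::jsonb first):

select *
from reports
where params @? '$.** ? (@ == "authVar")';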
Read in the documentation:
The SQL/JSON Path Language
jsonpath Type

Is it possible to query JSON data in DynamoDB?

Let's say my JSON looks like this (example provided here) -
{
  "year": 2013,
  "title": "Turn It Down, Or Else!",
  "info": {
    "directors": [
      "Alice Smith",
      "Bob Jones"
    ],
    "release_date": "2013-01-18T00:00:00Z",
    "rating": 6.2,
    "genres": [
      "Comedy",
      "Drama"
    ],
    "image_url": "http://ia.media-imdb.com/images/N/O9ERWAU7FS797AJ7LU8HN09AMUP908RLlo5JF90EWR7LJKQ7##._V1_SX400_.jpg",
    "plot": "A rock band plays their music at high volumes, annoying the neighbors.",
    "rank": 11,
    "running_time_secs": 5215,
    "actors": [
      "David Matthewman",
      "Ann Thomas",
      "Jonathan G. Neff"
    ]
  }
}
I would like to query all movies where genres contains Drama.
I went through all of the examples, but it seems that I can query only on the hash key and sort key. I can't have a JSON document as the key itself, as that is not supported.
You cannot. DynamoDB requires that all attributes you are filtering for have an index.
As you want to query independently of your main index, you are limited to Global Secondary Indexes.
The documentation lists which kinds of attributes indexes are supported on:
The index key attributes can consist of any top-level String, Number, or Binary attributes from the base table; other scalar types, document types, and set types are not allowed.
Your type would be an array of Strings. So this query operation isn't supported by DynamoDB at this time.
You might want to consider other, more flexible NoSQL document databases such as MongoDB Atlas if you need this kind of querying functionality.
import com.amazonaws.services.dynamodbv2.document.ItemCollection;
import com.amazonaws.services.dynamodbv2.document.ScanOutcome;
import com.amazonaws.services.dynamodbv2.document.spec.ScanSpec;
import java.util.HashMap;
import java.util.Map;

// A Scan (not a Query) can filter on nested attributes; note that it reads
// the whole table. contains() is needed because "genres" is a list.
String filterExpression = "contains(info.genres, :genre)";
Map<String, Object> valueMap = new HashMap<>();
valueMap.put(":genre", "Drama");
ItemCollection<ScanOutcome> scanResult = table
        .scan(new ScanSpec()
                .withFilterExpression(filterExpression)
                .withValueMap(valueMap));
One example that I took from AWS Developer Forums is as follows.
We got some hints for you from our team: filter/condition expressions for maps have to have the key names at each level of the map specified separately in the ExpressionAttributeNames map.
Your expression should look like this:
{
  "TableName": "genericPodcast",
  "FilterExpression": "#keyone.#keytwo.#keythree = :keyone",
  "ExpressionAttributeNames": {
    "#keyone": "attributes",
    "#keytwo": "playbackInfo",
    "#keythree": "episodeGuid"
  },
  "ExpressionAttributeValues": {
    ":keyone": {
      "S": "podlove-2018-05-02t19:06:11+00:00-964957ce3b62a02"
    }
  }
}

How do I query a complex JSONB field in Django 1.9

I have a table item with a field called data of type JSONB. I would like to query all items that have text that equals 'Super'. I am trying to do this currently by doing this:
Item.objects.filter(Q(data__areas__texts__text='Super'))
Django debug toolbar is reporting the query used for this is:
WHERE "item"."data" #> ARRAY['areas', 'texts', 'text'] = '"Super"'
But I'm not getting back any matching results. How can I query this using Django? If it's not possible in Django, then how can I query this in Postgresql?
Here's an example of the contents of the data field:
{
  "areas": [
    {
      "texts": [
        {
          "text": "Super"
        }
      ]
    },
    {
      "texts": [
        {
          "text": "Duper"
        }
      ]
    }
  ]
}
Try Item.objects.filter(data__areas__0__texts__0__text='Super')
It is not an exact answer (it only looks at the first element of each array), but it may clarify some JSONB filter features; also read the Django docs.
I am not sure what you want to achieve with this structure, but I was able to get the desired result only with a rather awkward raw query, which can look like this:
Item.objects.raw("""
    SELECT id, data
    FROM (
        SELECT id, data,
               jsonb_array_elements("table_name"."data" #> '{areas}') AS areas_data
        FROM "table_name"
    ) foo
    WHERE areas_data #> '{texts}' @> '[{"text": "Super"}]'
""")
Don't forget to change table_name in the query (in your case it should be yourappname_item).
I'm not sure you'd use this query in a real program, but it can probably help you find a way to a better solution.
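For this particular structure, a plain containment lookup may also be enough: the Postgres jsonb operator @> matches nested objects and arrays regardless of their position, and Django exposes it as the contains lookup on JSONField:

# jsonb containment (@>): matches rows where some item in "areas"
# has a "texts" list containing an object with "text" equal to 'Super'
Item.objects.filter(data__contains={'areas': [{'texts': [{'text': 'Super'}]}]})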
Also, there is a very good intro to jsonb query syntax.
Hope it will help you.