icontains does not match any items in the database - json

This is my data:
>>> print(MyModel.objects.get(id=1).Fruits)  # Fruits is a JSONField
{"Title": ["Fruits"], "Name": ["Banana", "Cherry", "Apple", "Peach"], "Other": ["Banana"]}
>>> favorites = ['Apple', 'Banana']
I define a query as follows:
>>> query = reduce(operator.or_, (Q(Fruits__Name__icontains=x) for x in favorites))
#or: query = reduce(operator.or_, (Q(Fruits__icontains={'Name':x}) for x in favorites))
>>> print(query)
(OR: ('Fruits__Name__icontains', 'Apple'), ('Fruits__Name__icontains', 'Banana'))
I want it to return the item when "Banana" or "Apple" appears in Name.
When I run this query (PostgreSQL):
MyModel.objects.filter(query)
it doesn't match any item in the database.

Since MyModel.Fruits is a JSONField, depending on the database used it might be stored/accessed as a string. You could try applying icontains directly to the JSON string value:
Fruits__icontains=x
So the full query would be:
query = reduce(operator.or_, (Q(Fruits__icontains=x) for x in favorites))
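Seen outside Django, the reduce/operator.or_ pattern is just a fold over OR. Here is a minimal, framework-free sketch of the same matching logic (the rows list and the matches helper are hypothetical stand-ins for the table and the icontains lookup; no database required):

```python
import operator
from functools import reduce

# stand-in for the table rows; the first row is the Fruits value from the question
rows = [
    {"Title": ["Fruits"], "Name": ["Banana", "Cherry", "Apple", "Peach"], "Other": ["Banana"]},
    {"Title": ["Veg"], "Name": ["Carrot"], "Other": []},
]
favorites = ["Apple", "Banana"]

def matches(row, term):
    # case-insensitive "icontains" against the serialized Name list
    return term.lower() in str(row["Name"]).lower()

# reduce(operator.or_, ...) folds the per-term results into a single OR,
# exactly as reduce over Q objects folds them into one OR'ed filter
hits = [row for row in rows
        if reduce(operator.or_, (matches(row, f) for f in favorites))]
print(len(hits))  # 1: only the first row contains Apple or Banana
```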

Postgres json to view

I have a table like this (with a jsonb column):
https://dbfiddle.uk/lGuHdHEJ
If I load this json with python into a dataframe:
import pandas as pd
import json

data = {
    "id": [1, 2],
    "myyear": [2016, 2017],
    "value": [5, 9]
}
data = json.dumps(data)
df = pd.read_json(data)
print(df)
I get this result:
id myyear value
0 1 2016 5
1 2 2017 9
How can a get this result directly from the json column via sql in a postgres view?
Note: This assumes that your id, my_year, and value arrays are consistent and have the same length.
This answer uses PostgreSQL's jsonb_array_elements_text function to explode array elements into rows.
select jsonb_array_elements_text(payload -> 'id') as "id",
jsonb_array_elements_text(payload -> 'bv_year') as "myyear",
jsonb_array_elements_text(payload -> 'value') as "value"
from main
This gives the output below:
id myyear value
1 2016 5
2 2017 9
That said, storing parallel arrays in a jsonb object is not the best design and could lead to data inconsistencies later. If it's in your control, I would suggest storing the data so that each property's mapping is explicit. Some suggestions:
You can instead have separate columns for each property.
If you want to store it as jsonb only, then consider [{"id": "", "year": "", "value": ""}]
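For comparison, the row-per-index pairing that the view performs can be sketched in plain Python with zip (using the sample payload from the question; like the SQL above, this assumes all three arrays have the same length):

```python
import json

payload = json.loads('{"id": [1, 2], "myyear": [2016, 2017], "value": [5, 9]}')

# zip pairs the i-th element of each array, i.e. one output row per index,
# which is what the set-returning jsonb_array_elements_text calls do positionally
rows = [
    {"id": i, "myyear": y, "value": v}
    for i, y, v in zip(payload["id"], payload["myyear"], payload["value"])
]
print(rows)  # [{'id': 1, 'myyear': 2016, 'value': 5}, {'id': 2, 'myyear': 2017, 'value': 9}]
```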

Using Pandas to Flatten a JSON with a nested array

I have the following JSON. I want to pull out task, flatten it, and put it into its own dataframe, including the id from the parent.
[
    {
        "id": 123456,
        "assignee": {"id": 5757, "firstName": "Jim", "lastName": "Johnson"},
        "resolvedBy": {"id": 5757, "firstName": "Jim", "lastName": "Johnson"},
        "task": [
            {
                "assignee": {"id": 5757, "firstName": "Jim", "lastName": "Johnson"},
                "resolvedBy": {"id": 5757, "firstName": "Jim", "lastName": "Johnson"},
                "taskId": 898989,
                "status": "Closed"
            },
            {
                "assignee": {"id": 5857, "firstName": "Nacy", "lastName": "Johnson"},
                "resolvedBy": {"id": 5857, "firstName": "George", "lastName": "Johnson"},
                "taskId": 999999
            }
        ],
        "state": "Complete"
    },
    {
        "id": 123477,
        "assignee": {"id": 8576, "firstName": "Jack", "lastName": "Johnson"},
        "resolvedBy": {"id": null, "firstName": null, "lastName": null},
        "task": [],
        "state": "Inprogress"
    }
]
I would like to get a dataframe from tasks like so
id, assignee.id, assignee.firstName, assignee.lastName, resolvedBy.firstName, resolvedBy.lastName, taskId, status
I have flattened the entire dataframe using
df=pd.json_normalize(json.loads(df.to_json(orient='records')))
It left tasks in [{}] which I think is okay because I want to pull tasks out into its own dataframe and include the id from the parent.
I have id and tasks in a dataframe like so
tasksdf=storiesdf[['tasks','id']]
Then I want to normalize it like:
tasksdf=pd.json_normalize(json.loads(tasksdf.to_json(orient='records')))
but I know since it is in an array I need to do something different. However I have not been able to figure it out. I have been looking at other examples and reading what others have done. Any help would be appreciated.
The main problem is that your task record is empty in some cases so it won't appear in your dataframe if you create it with json_normalize.
Secondly, some columns are redundant between assignee, resolvedBy, and the nested task. I would therefore create the assignee.id, resolvedBy.id, etc. columns first and merge them with the normalized task:
import json
import pandas as pd

json_data = json.loads(json_str)  # json_str holds the JSON shown in the question
df = pd.DataFrame.from_dict(json_data)
df = df.explode('task')

# expand the assignee dict into its own columns
df_assign = pd.DataFrame()
df_assign[["assignee.id", "assignee.firstName", "assignee.lastName"]] = \
    pd.DataFrame(df['assignee'].values.tolist(), index=df.index)
df = df.join(df_assign).drop('assignee', axis=1)

# same for resolvedBy
df_resolv = pd.DataFrame()
df_resolv[["resolvedBy.id", "resolvedBy.firstName", "resolvedBy.lastName"]] = \
    pd.DataFrame(df['resolvedBy'].values.tolist(), index=df.index)
df = df.join(df_resolv).drop('resolvedBy', axis=1)

# normalize the nested task records and merge them back in
df_task = pd.json_normalize(json_data, record_path='task', meta=['id', 'state'])
df = df.merge(df_task,
              on=['id', 'state', "assignee.id", "assignee.firstName", "assignee.lastName",
                  "resolvedBy.id", "resolvedBy.firstName", "resolvedBy.lastName"],
              how="outer").drop('task', axis=1)
print(df.drop_duplicates().reset_index(drop=True))
Output:
id state assignee.id assignee.firstName ... resolvedBy.firstName resolvedBy.lastName taskId status
0 123456.0 Complete 5757 Jim ... Jim Johnson 898989.0 Closed
1 123477.0 Inprogress 8576 Jack ... None None NaN NaN
2 123456 Complete 5857 Nacy ... George Johnson 999999.0 NaN
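If only the task rows plus the parent id are needed, pd.json_normalize with record_path can do the extraction in one call; note that parents with an empty task list (like id 123477) simply contribute no rows. A minimal sketch with the question's data trimmed to one nested field:

```python
import pandas as pd

data = [
    {"id": 123456, "state": "Complete",
     "task": [{"taskId": 898989, "status": "Closed",
               "assignee": {"id": 5757, "firstName": "Jim", "lastName": "Johnson"}},
              {"taskId": 999999,
               "assignee": {"id": 5857, "firstName": "Nacy", "lastName": "Johnson"}}]},
    {"id": 123477, "state": "Inprogress", "task": []},  # empty: contributes no rows
]

# record_path walks into each parent's task list; meta carries the parent id along
tasks = pd.json_normalize(data, record_path="task", meta=["id"])
print(tasks[["id", "taskId", "status", "assignee.firstName"]])
```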

Livy Server: return a dataframe as JSON?

I am executing a statement in Livy Server using an HTTP POST call to localhost:8998/sessions/0/statements, with the following body:
{
"code": "spark.sql(\"select * from test_table limit 10\")"
}
I would like an answer in the following format
(...)
"data": {
"application/json": "[
{"id": "123", "init_date": 1481649345, ...},
{"id": "133", "init_date": 1481649333, ...},
{"id": "155", "init_date": 1481642153, ...},
]"
}
(...)
but what I'm getting is
(...)
"data": {
"text/plain": "res0: org.apache.spark.sql.DataFrame = [id: string, init_date: timestamp ... 64 more fields]"
}
(...)
Which is the toString() version of the dataframe.
Is there some way to return a dataframe as JSON using the Livy Server?
EDIT
Found a JIRA issue that addresses the problem: https://issues.cloudera.org/browse/LIVY-72
Judging by the comments, it seems Livy does not and will not support this feature.
I recommend using the built-in (albeit hard to find documentation for) magics %json and %table:
%json

import json
import textwrap
import requests

session_url = host + "/sessions/1"  # host and headers are defined elsewhere in the client
statements_url = session_url + '/statements'
data = {
    'code': textwrap.dedent("""\
        val d = spark.sql("SELECT COUNT(DISTINCT food_item) FROM food_item_tbl")
        val e = d.collect
        %json e
        """)}
r = requests.post(statements_url, data=json.dumps(data), headers=headers)
print(r.json())

%table

session_url = host + "/sessions/21"
statements_url = session_url + '/statements'
data = {
    'code': textwrap.dedent("""\
        val x = List((1, "a", 0.12), (3, "b", 0.63))
        %table x
        """)}
r = requests.post(statements_url, data=json.dumps(data), headers=headers)
print(r.json())
Related: Apache Livy: query Spark SQL via REST: possible?
I don't have a lot of experience with Livy, but as far as I know this endpoint is used as an interactive shell, and the output will be a string with the same result a shell would show. With that in mind, I can think of a way to emulate the result you want, though it may not be the best way to do it:
{
"code": "println(spark.sql(\"select * from test_table limit 10\").toJSON.collect.mkString(\"[\", \",\", \"]\"))"
}
Then, you will have a JSON wrapped in a string, so your client could parse it.
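On the client side, the text/plain value then holds a JSON array serialized as a string, which json.loads can decode in one step. A sketch with a hand-written response payload standing in for the real Livy reply:

```python
import json

# shape of the statement output once the println trick above is used
# (this payload is written by hand for illustration, not a real Livy response)
statement_output = {
    "data": {
        "text/plain": '[{"id":"123","init_date":1481649345},{"id":"133","init_date":1481649333}]'
    }
}

# the shell printed a JSON array as text, so one json.loads recovers the rows
rows = json.loads(statement_output["data"]["text/plain"])
print(rows[0]["id"])  # "123"
```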
I think in general your best bet is to write your output to a database of some kind. If you write to a randomly named table, you could have your code read it after the script is done.

iterating over named tuples in a list

I am trying to retrieve values from namedtuples in a list. The function takes one argument, a namedtuple or a list of namedtuples, and returns the value in the price field. Here is my code:
def price(rlist):
    for item in rlist:
        return item.price

RC = [
    Restaurant("Thai Dishes", "Thai", "334-4433", "Mee Krob", 12.50),
    Restaurant("Nobu", "Japanese", "335-4433", "Natto Temaki", 5.50)]
print(price(RC))
It should print 12.50 and 5.50, but it only prints 12.50. How can I improve or correct the iteration?
I also tried with a single namedtuple:
Book = namedtuple('Book', 'title author year price')
favorite = Book('Adventures of Sherlock Holmes', 'Arthur Conan Doyle', 1892, 21.50)
and then when I do:
price(favorite)
It gives me an error:
for item in rlist:
TypeError: 'type' object is not iterable
Perhaps use
def price(rlist):
    return [item.price for item in rlist]
to return what you seem to want. (Your version returns inside the loop, so it exits after the first item.)
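Run against the Restaurant data from the question, that version returns every price:

```python
from collections import namedtuple

Restaurant = namedtuple('Restaurant', 'name cuisine phone dish price')

def price(rlist):
    # one price per namedtuple, in order
    return [item.price for item in rlist]

RC = [
    Restaurant("Thai Dishes", "Thai", "334-4433", "Mee Krob", 12.50),
    Restaurant("Nobu", "Japanese", "335-4433", "Natto Temaki", 5.50),
]
print(price(RC))  # [12.5, 5.5]
```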
You can use a list comprehension with the str.join method.
>>> from collections import namedtuple
>>> Restaurant = namedtuple('Restaurant', 'name cuisine phone dish price')
>>> RC = [Restaurant("Thai Dishes", "Thai", "334-4433", "Mee Krob", 12.50), Restaurant("Nobu", "Japanese", "335-4433", "Natto Temaki", 5.50)]
>>> def price(a_tuple):
... return ', '.join([str(rc.price) for rc in a_tuple])
...
>>> price(RC)
'12.5, 5.5'
The join method returns a string built from the elements of an iterable.
To handle the case of a single namedtuple (rather than a list of them), you can check whether the argument has a _fields attribute (namedtuples have this attribute):
>>> def price(a_tuple):
... if hasattr(a_tuple, '_fields'):
... return a_tuple.price
... return ', '.join([str(rc.price) for rc in a_tuple])
...
>>> price(RC)
'12.5, 5.5'
>>> price(RC[0])
12.5
This solution works with any iterable type (list, tuple, iterator, etc.).
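A quick check of that claim with the question's data, passing an iterator and a plain tuple (neither has _fields, so both take the join branch, while a single namedtuple takes the attribute branch):

```python
from collections import namedtuple

Restaurant = namedtuple('Restaurant', 'name cuisine phone dish price')
RC = [
    Restaurant("Thai Dishes", "Thai", "334-4433", "Mee Krob", 12.50),
    Restaurant("Nobu", "Japanese", "335-4433", "Natto Temaki", 5.50),
]

def price(a_tuple):
    if hasattr(a_tuple, '_fields'):   # namedtuple instance
        return a_tuple.price
    return ', '.join(str(rc.price) for rc in a_tuple)

print(price(iter(RC)))   # '12.5, 5.5' -- a lazy iterator works
print(price(tuple(RC)))  # '12.5, 5.5' -- so does a plain tuple
print(price(RC[0]))      # 12.5       -- single namedtuple
```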

Postgres JSON data type Rails query

I am using Postgres' json data type but want to do a query/ordering with data that is nested within the json.
I want to order or query with .where on the json data type. For example, I want to query for users that have a follower count > 500 or I want to order by follower or following count.
Thanks!
Example:
model User
data: {
  "photos" => [
    {"type" => "facebook", "type_id" => "facebook", "type_name" => "Facebook", "url" => "facebook.com"}
  ],
  "social_profiles" => [
    {"type" => "vimeo", "type_id" => "vimeo", "type_name" => "Vimeo", "url" => "http://vimeo.com/", "username" => "v", "id" => "1"},
    {"bio" => "I am not a person, but a series of plants", "followers" => 1500, "following" => 240, "type" => "twitter", "type_id" => "twitter", "type_name" => "Twitter", "url" => "http://www.twitter.com/", "username" => "123", "id" => "123"}
  ]
}
For anyone who stumbles upon this: I have come up with a list of queries using ActiveRecord and Postgres' JSON data type. Feel free to edit this to make it more clear.
Documentation to the JSON operators used below: https://www.postgresql.org/docs/current/functions-json.html.
# Sort based on the JSON data:
Post.order("data->'hello' DESC")
=> #<ActiveRecord::Relation [
#<Post id: 4, data: {"hi"=>"23", "hello"=>"22"}>,
#<Post id: 3, data: {"hi"=>"13", "hello"=>"21"}>,
#<Post id: 2, data: {"hi"=>"3", "hello"=>"2"}>,
#<Post id: 1, data: {"hi"=>"2", "hello"=>"1"}>]>
# Where inside a JSON object:
Record.where("data ->> 'likelihood' = '0.89'")
# Example json object:
r.column_data
=> {"data1"=>[1, 2, 3],
"data2"=>"data2-3",
"array"=>[{"hello"=>1}, {"hi"=>2}],
"nest"=>{"nest1"=>"yes"}}
# Nested search:
Record.where("column_data -> 'nest' ->> 'nest1' = 'yes' ")
# Search within array:
Record.where("column_data #>> '{data1,1}' = '2' ")
# Search within a value that's an array:
Record.where("column_data #> '{array,0}' ->> 'hello' = '1' ")
# this only checks one element of the array.
# All elements:
Record.where("column_data ->> 'array' LIKE '%hello%' ") # bad
Record.where("column_data ->> 'array' LIKE ?", "%hello%") # good
According to this http://edgeguides.rubyonrails.org/active_record_postgresql.html#json
there's a difference in using -> and ->>:
# db/migrate/20131220144913_create_events.rb
create_table :events do |t|
t.json 'payload'
end
# app/models/event.rb
class Event < ActiveRecord::Base
end
# Usage
Event.create(payload: { kind: "user_renamed", change: ["jack", "john"]})
event = Event.first
event.payload # => {"kind"=>"user_renamed", "change"=>["jack", "john"]}
## Query based on JSON document
# The -> operator returns the original JSON type (which might be an object), whereas ->> returns text
Event.where("payload->>'kind' = ?", "user_renamed")
So you should try Record.where("data ->> 'status' = '200'") (note that ->> returns text, so compare against a quoted value) or the operator that suits your query (http://www.postgresql.org/docs/current/static/functions-json.html).
Your question doesn't seem to correspond to the data you've shown, but if your table is named users and data is a field in that table with JSON like {"count": 123}, then the query
SELECT * FROM users WHERE (data ->> 'count')::int > 500;
will work. Take a look at your database schema to make sure you understand the layout and check that the query works before complicating it with Rails conventions.
JSON filtering in Rails
Event.create(payload: [{ "name": 'Jack', "age": 12 },
                       { "name": 'John', "age": 13 },
                       { "name": 'Dohn', "age": 24 }])
# @> is the jsonb containment operator (#> is path extraction)
Event.where('payload @> ?', '[{"age": 12}]')
# You can also filter by the name key
Event.where('payload @> ?', '[{"name": "John"}]')
# You can also filter by name and age together
Event.where('payload @> ?', [{"name": "Jack", "age": 12}].to_json)