Postgres JSON data type Rails query - json

I am using Postgres' json data type but want to do a query/ordering with data that is nested within the json.
I want to order or query with .where on the json data type. For example, I want to query for users that have a follower count > 500 or I want to order by follower or following count.
Thanks!
Example:
model User
data: {
"photos"=>[
{"type"=>"facebook", "type_id"=>"facebook", "type_name"=>"Facebook", "url"=>"facebook.com"}
],
"social_profiles"=>[
{"type"=>"vimeo", "type_id"=>"vimeo", "type_name"=>"Vimeo", "url"=>"http://vimeo.com/", "username"=>"v", "id"=>"1"},
{"bio"=>"I am not a person, but a series of plants", "followers"=>1500, "following"=>240, "type"=>"twitter", "type_id"=>"twitter", "type_name"=>"Twitter", "url"=>"http://www.twitter.com/", "username"=>"123", "id"=>"123"}
]
}

For anyone who stumbles upon this: I have come up with a list of queries using ActiveRecord and Postgres' JSON data type. Feel free to edit this to make it clearer.
Documentation to the JSON operators used below: https://www.postgresql.org/docs/current/functions-json.html.
# Sort based on the JSON data (->> returns text, which can be ordered;
# the json value returned by -> has no ordering operator):
Post.order("data->>'hello' DESC")
=> #<ActiveRecord::Relation [
#<Post id: 4, data: {"hi"=>"23", "hello"=>"22"}>,
#<Post id: 3, data: {"hi"=>"13", "hello"=>"21"}>,
#<Post id: 2, data: {"hi"=>"3", "hello"=>"2"}>,
#<Post id: 1, data: {"hi"=>"2", "hello"=>"1"}>]>
# Where inside a JSON object:
Record.where("data ->> 'likelihood' = '0.89'")
# Example json object:
r.column_data
=> {"data1"=>[1, 2, 3],
"data2"=>"data2-3",
"array"=>[{"hello"=>1}, {"hi"=>2}],
"nest"=>{"nest1"=>"yes"}}
# Nested search:
Record.where("column_data -> 'nest' ->> 'nest1' = 'yes' ")
# Search within array:
Record.where("column_data #>> '{data1,1}' = '2' ")
# Search within a value that's an array:
Record.where("column_data #> '{array,0}' ->> 'hello' = '1' ")
# This only matches one element of the array.
# All elements:
Record.where("column_data ->> 'array' LIKE '%hello%' ") # bad
Record.where("column_data ->> 'array' LIKE ?", "%hello%") # good

According to this http://edgeguides.rubyonrails.org/active_record_postgresql.html#json
there's a difference in using -> and ->>:
# db/migrate/20131220144913_create_events.rb
create_table :events do |t|
t.json 'payload'
end
# app/models/event.rb
class Event < ActiveRecord::Base
end
# Usage
Event.create(payload: { kind: "user_renamed", change: ["jack", "john"]})
event = Event.first
event.payload # => {"kind"=>"user_renamed", "change"=>["jack", "john"]}
## Query based on JSON document
# The -> operator returns the original JSON type (which might be an object), whereas ->> returns text
Event.where("payload->>'kind' = ?", "user_renamed")
So you should try Record.where("data ->> 'status' = '200'") (->> returns text, so compare against a string or cast the value), or use the operator that suits your query (http://www.postgresql.org/docs/current/static/functions-json.html).

Your question doesn't seem to correspond to the data you've shown, but if your table is named users and data is a field in that table with JSON like {"count": 123}, then the query
SELECT * FROM users WHERE (data->>'count')::int > 500
will work. Take a look at your database schema to make sure you understand the layout and check that the query works before complicating it with Rails conventions.
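For the layout actually shown in the question, where followers sits inside the social_profiles array, one possible approach is to unnest the array with json_array_elements and test each element. The sketch below only builds the SQL fragment; the helper name followers_over_sql is hypothetical, and it assumes a json column called data (for jsonb, jsonb_array_elements would take its place).

```ruby
# Hypothetical helper (not part of Rails): returns a SQL fragment meant to be
# passed to User.where. Assumes a json column shaped like the question's
# example. `column` is interpolated, so it must be a trusted identifier.
def followers_over_sql(column = "data")
  <<~SQL
    EXISTS (
      SELECT 1
      FROM json_array_elements(#{column} -> 'social_profiles') AS t(profile)
      WHERE (profile ->> 'followers')::int > ?
    )
  SQL
end

# Intended usage inside a Rails app (needs a database, so it is not run here):
#   User.where(followers_over_sql, 500)
puts followers_over_sql
```

Profiles without a followers key yield NULL in the comparison and are simply not matched.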

JSON filtering in Rails
Event.create(payload: [{ "name": 'Jack', "age": 12 },
{ "name": 'John', "age": 13 },
{ "name": 'Dohn', "age": 24 }])
# @> is the containment operator; it requires a jsonb column
Event.where('payload @> ?', '[{"age": 12}]')
# You can also filter by the name key
Event.where('payload @> ?', '[{"name": "John"}]')
# You can also filter by {"name":"Jack", "age":12}
Event.where('payload @> ?', [{"name":"Jack", "age":12}].to_json)

Related

icontains does not match any items in the database

This is my data:
>>> print(MyModel.objects.get(id=1).Fruits) #Fruits is JSONField
>>> print(favorites)
{"Title": ["Fruits"], "Name": ["Banana", "Cherry", "Apple", "Peach"], "Other":["Banana"]}
I define a query as follows:
>>> query = reduce(operator.or_, (Q(Fruits__Name__icontains=x) for x in favorites))
#or: query = reduce(operator.or_, (Q(Fruits__icontains={'Name':x}) for x in favorites))
>>> print(query)
(OR: ('Fruits__Name__icontains', 'Apple'), ('Fruits__Name__icontains', 'Banana'))
I want it to return the items where Banana or Apple is in Name.
When I run this query (postgresql):
MyModel.objects.filter(query)
it doesn't match any item in the database.
With MyModel.Fruits as your JSONField, depending on the database used, it might be stored/accessed as string. You could try just accessing icontains directly on the JSON string value:
Fruits__icontains=x
So full query would be:
query = reduce(operator.or_, (Q(Fruits__icontains=x) for x in favorites))

Querying to parent and children to a JSON format from MySQL 5.6?

I have a hierarchy of tables in a MySQL 5.6 database that I need to query into a JSON format for use by a JavaScript tree structure.
Just as a test, in my Flask app I did the following for just the top level:
def get_all_customers():
    response_object = {'status': 'success'}
    cnx = mysql.connector.connect(user="", password="", database="", host="localhost", port=3306)
    cursor = cnx.cursor()
    cursor.execute('SELECT idx, name FROM listcustomers ORDER BY name')
    data = []
    for idx, name in cursor:
        data.append({'id': idx, 'label': name, 'otherProp': "Customer"})
    response_object['customers'] = data
    return jsonify(response_object)
which returns
[
{ id: 1,
label: "customer 1",
otherProp: "Customer"
},
...
]
But each customer has locations, and each location has areas, and each area has assets, and each asset has projects, and I need to also query them into children of this json object. So, for example, just going one level deeper to locations, I would need something like this -
[
{ id: 1,
label: "customer 1",
otherProp: "Customer",
children: [
{
id: 5,
label: "location 5",
otherProp: "Location"
},
...
]
},
...
]
where, in my database, listlocations links to listcustomers via its parentCustomerId column. How can I manage this? Eventually this tree will have about 13,000 objects, so I know that querying the data and then parsing it with Python would be far less efficient than querying it properly to begin with.

Replace and access values in nested hash/json by path in Ruby

I'm asking for advice: what would be, in your opinion, the best and simplest way to replace and access values in a nested hash or JSON by a path held in a variable, using Ruby?
For example imagine I have json or hash with this kind of structure:
{
"name":"John",
"address":{
"street":"street 1",
"country":"country1"
},
"phone_numbers":[
{
"type":"mobile",
"number":"234234"
},
{
"type":"fixed",
"number":"2342323423"
}
]
}
And I would like to access or change the fixed phone number by a path that could be specified in a variable like this: "phone_numbers/1/number" (the separator does not matter in this case).
This solution is needed to retrieve values from JSON/hash and sometimes replace values by specifying the path to them. I found some solutions that can find a value by key, but they wouldn't work here because some hashes/JSON have the same key name in multiple places.
I saw this one: https://github.com/chengguangnan/vine , but it does not work when the payload looks like this, as the root is an array rather than a hash:
[
{
"value":"test1"
},
{
"value":"test2"
}
]
Hope you have some great ideas how to solve this problem.
Thank you!
EDIT:
So I tried code below with this data:
x = JSON.parse('[
{
"value":"test1"
},
{
"value":"test2"
}
]')
y = JSON.parse('{
"name":"John",
"address":{
"street":"street 1",
"country":"country1"
},
"phone_numbers":[
{
"type":"mobile",
"number":"234234"
},
{
"type":"fixed",
"number":"2342323423"
}
]
}')
p x
p y.to_h
p x.get_at_path("0/value")
p y.get_at_path("name")
And got this:
[{"value"=>"test1"}, {"value"=>"test2"}]
{"name"=>"John", "address"=>{"street"=>"street 1", "country"=>"country1"}, "phone_numbers"=>[{"type"=>"mobile", "number"=>"234234"}, {"type"=>"fixed", "number"=>"2342323423"}]}
hash_new.rb:91:in `<main>': undefined method `get_at_path' for [{"value"=>"test1"}, {"value"=>"test2"}]:Array (NoMethodError)
For y.get_at_path("name") got nil
You can make use of Hash#dig to get the sub-values: it keeps calling dig on the result of each step until it reaches the end, and Array has dig as well, so when you reach an array things keep working:
# you said the separator wasn't important, so it can be changed up here
SEPERATOR = '/'.freeze
class Hash
  def get_at_path(path)
    dig(*steps_from(path))
  end
  def replace_at_path(path, new_value)
    *steps, leaf = steps_from path
    # steps is empty in the "name" example, in that case, we are operating on
    # the root (self) hash, not a subhash
    hash = steps.empty? ? self : dig(*steps)
    # note that `hash` here doesn't _have_ to be a Hash, but it needs to
    # respond to `[]=`
    hash[leaf] = new_value
  end
  private
  # the example hash uses symbols as the keys, so we'll convert each step in
  # the path to symbols. If a step doesn't contain a non-digit character,
  # we'll convert it to an integer to be treated as the index into an array
  def steps_from path
    path.split(SEPERATOR).map do |step|
      if step.match?(/\D/)
        step.to_sym
      else
        step.to_i
      end
    end
  end
end
and then it can be used as such (hash contains your sample input):
p hash.get_at_path("phone_numbers/1/number") # => "2342323423"
p hash.get_at_path("phone_numbers/0/type") # => "mobile"
p hash.get_at_path("name") # => "John"
p hash.get_at_path("address/street") # => "street 1"
hash.replace_at_path("phone_numbers/1/number", "123-123-1234")
hash.replace_at_path("phone_numbers/0/type", "cell phone")
hash.replace_at_path("name", "John Doe")
hash.replace_at_path("address/street", "123 Street 1")
p hash.get_at_path("phone_numbers/1/number") # => "123-123-1234"
p hash.get_at_path("phone_numbers/0/type") # => "cell phone"
p hash.get_at_path("name") # => "John Doe"
p hash.get_at_path("address/street") # => "123 Street 1"
p hash
# => {:name=>"John Doe",
# :address=>{:street=>"123 Street 1", :country=>"country1"},
# :phone_numbers=>[{:type=>"cell phone", :number=>"234234"},
# {:type=>"fixed", :number=>"123-123-1234"}]}
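The errors in the question's EDIT have two likely causes: the root of x is an Array, which the Hash monkey patch does not cover, and JSON.parse returns String keys while the code above looks up symbols. A minimal sketch that avoids both, assuming string keys and using a hypothetical PathAccess module instead of reopening Hash:

```ruby
require "json"

# Hypothetical helpers, not from any library. Steps made only of digits are
# treated as array indexes; everything else stays a String key, matching what
# JSON.parse returns by default.
module PathAccess
  module_function

  def steps_from(path, separator = "/")
    path.split(separator).map { |step| step.match?(/\A\d+\z/) ? step.to_i : step }
  end

  # Works for any root that responds to dig (Hash or Array).
  def get_at_path(data, path)
    data.dig(*steps_from(path))
  end

  def replace_at_path(data, path, new_value)
    *steps, leaf = steps_from(path)
    container = steps.empty? ? data : data.dig(*steps)
    container[leaf] = new_value
  end
end

x = JSON.parse('[{"value":"test1"},{"value":"test2"}]')
y = JSON.parse('{"name":"John","phone_numbers":[{"type":"mobile","number":"234234"},{"type":"fixed","number":"2342323423"}]}')

p PathAccess.get_at_path(x, "0/value")
p PathAccess.get_at_path(y, "name")
PathAccess.replace_at_path(y, "phone_numbers/1/number", "123-123-1234")
p PathAccess.get_at_path(y, "phone_numbers/1/number")
```

Keeping the helpers in a module avoids patching core classes; a hash whose keys are themselves digit-only strings would need a different convention for telling keys from indexes.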

Activerecord query with group on multiple columns returning a hash with array as a key

I wrote an ActiveRecord query to fetch count of some data after grouping by two columns col_a and col_b
result = Sample.where(through: ['col_a', 'col_b'], status: [1, 5]).where("created_at > ?", 1.month.ago).group(:status, :through).count
This returns:
{[1, "col_a"]=>7, [1, "col_b"]=>7, [5, "col_a"]=>4, [5, "col_b"]=>1}
Now my question is, how do I access the values in this hash?
Doing something like results[1, "col_a"] throws an error (wrong no. of arguments).
I know I can do this by writing a loop and extracting the values one by one.
However I want to know if there is a more idiomatic way to access the values, something similar to results[1], maybe?
results[[1, "col_a"]]
# => 7
Four possible ways (I'm sure there are others):
# fetch one value at a time
results[[1, "col_a"]]
# => 7
# fetch all the values
results.values
# => [7, 7, 4, 1]
# loop through keys and values
results.each do |key, value|
puts key
puts value
end
# => [1, "col_a"], 7....
# convert results into a more usable hash
results = results.transform_keys { |k| k.join("_") }
results['1_col_a']
# => 7
Another heavier option, especially if this is a query you will do often, is to wrap the results into a new Ruby object. Then you can parse and use the results in a more idiomatic way and define an accessor simpler than [1,'col_a'].
class SampleGroupResult
  attr_reader :key, :value
  def initialize(key, value)
    @key = key
    @value = value
  end
end
results.map { |k, v| SampleGroupResult.new(k, v) }
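If the goal is something closer to results[1], another option is to regroup the composite keys into a hash of hashes. The sketch below uses the sample output from the question as literal data; the variable name nested is only illustrative.

```ruby
# Sample data copied from the question's output.
results = { [1, "col_a"] => 7, [1, "col_b"] => 7, [5, "col_a"] => 4, [5, "col_b"] => 1 }

# Regroup the [status, through] keys into a hash keyed by status first.
nested = results.each_with_object({}) do |((status, through), count), memo|
  (memo[status] ||= {})[through] = count
end

p nested[1]               # all counts for status 1
p nested[5]["col_b"]      # => 1
p nested.dig(1, "col_a")  # => 7
```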

Logstash indexing JSON arrays

Logstash is awesome. I can send it JSON like this (multi-lined for readability):
{
"a": "one",
"b": {
"alpha":"awesome"
}
}
And then query for that line in kibana using the search term b.alpha:awesome. Nice.
However I now have a JSON log line like this:
{
"different":[
{
"this": "one",
"that": "uno"
},
{
"this": "two"
}
]
}
And I'd like to be able to find this line with a search like different.this:two (or different.this:one, or different.that:uno)
If I was using Lucene directly I'd iterate through the different array, and generate a new search index for each hash within it, but Logstash currently seems to ingest that line like this:
different: {this: one, that: uno}, {this: two}
Which isn't going to help me searching for log lines using different.this or different.that.
Has anyone got any thoughts as to a codec, filter or code change I can make to enable this?
You can write your own filter (copy and paste, rename the class and the config_name, and rewrite the filter(event) method) or modify the current JSON filter (source on GitHub).
You can find the JSON filter (Ruby class) source code in logstash-1.x.x\lib\logstash\filters, in the file json.rb. The JSON filter parses the content as JSON as follows:
begin
  # TODO(sissel): Note, this will not successfully handle json lists
  # like your text is '[ 1,2,3 ]' JSON.parse gives you an array (correctly)
  # which won't merge into a hash. If someone needs this, we can fix it
  # later.
  dest.merge!(JSON.parse(source))
  # If no target, we target the root of the event object. This can allow
  # you to overwrite @timestamp. If so, let's parse it as a timestamp!
  if !@target && event[TIMESTAMP].is_a?(String)
    # This is a hack to help folks who are mucking with @timestamp during
    # their json filter. You aren't supposed to do anything with
    # "@timestamp" outside of the date filter, but nobody listens... ;)
    event[TIMESTAMP] = Time.parse(event[TIMESTAMP]).utc
  end
  filter_matched(event)
rescue => e
  event.tag("_jsonparsefailure")
  @logger.warn("Trouble parsing json", :source => @source,
               :raw => event[@source], :exception => e)
  return
end
You can modify the parsing procedure to modify the original JSON
json = JSON.parse(source)
if json.is_a?(Hash)
json.each do |key, value|
if value.is_a?(Array)
value.each_with_index do |object, index|
#modify as you need
object["index"]=index
end
end
end
end
#save modified json
......
dest.merge!(json)
Then you can modify your config file to use your new or modified JSON filter and place it in \logstash-1.x.x\lib\logstash\config
This is my elastic_with_json.conf with a modified json.rb filter:
input{
stdin{
}
}filter{
json{
source => "message"
}
}output{
elasticsearch{
host=>localhost
}stdout{
}
}
if you want to use your new filter you can configure it with the config_name
class LogStash::Filters::Json_index < LogStash::Filters::Base
config_name "json_index"
milestone 2
....
end
and configure it
input{
stdin{
}
}filter{
json_index{
source => "message"
}
}output{
elasticsearch{
host=>localhost
}stdout{
}
}
Hope this helps.
For a quick and dirty hack, I used the Ruby filter with the code below; there is no need to use the out-of-the-box 'json' filter anymore.
input {
stdin{}
}
filter {
grok {
match => ["message","(?<json_raw>.*)"]
}
ruby {
init => "
def parse_json obj, pname=nil, event
obj = JSON.parse(obj) unless obj.is_a? Hash
obj = obj.to_hash unless obj.is_a? Hash
obj.each {|k,v|
p = pname.nil?? k : pname
if v.is_a? Array
v.each_with_index {|oo,ii|
parse_json_array(oo,ii,p,event)
}
elsif v.is_a? Hash
parse_json(v,p,event)
else
p = pname.nil?? k : [pname,k].join('.')
event[p] = v
end
}
end
def parse_json_array obj, i,pname, event
obj = JSON.parse(obj) unless obj.is_a? Hash
pname_ = pname
if obj.is_a? Hash
obj.each {|k,v|
p=[pname_,i,k].join('.')
if v.is_a? Array
v.each_with_index {|oo,ii|
parse_json_array(oo,ii,p,event)
}
elsif v.is_a? Hash
parse_json(v,p, event)
else
event[p] = v
end
}
else
n = [pname_, i].join('.')
event[n] = obj
end
end
"
code => "parse_json(event['json_raw'].to_s,nil,event) if event['json_raw'].to_s.include? ':'"
}
}
output {
stdout{codec => rubydebug}
}
Test json structure
{"id":123, "members":[{"i":1, "arr":[{"ii":11},{"ii":22}]},{"i":2}], "im_json":{"id":234, "members":[{"i":3},{"i":4}]}}
and this is what's output:
{
"message" => "{\"id\":123, \"members\":[{\"i\":1, \"arr\":[{\"ii\":11},{\"ii\":22}]},{\"i\":2}], \"im_json\":{\"id\":234, \"members\":[{\"i\":3},{\"i\":4}]}}",
"@version" => "1",
"@timestamp" => "2014-07-25T00:06:00.814Z",
"host" => "Leis-MacBook-Pro.local",
"json_raw" => "{\"id\":123, \"members\":[{\"i\":1, \"arr\":[{\"ii\":11},{\"ii\":22}]},{\"i\":2}], \"im_json\":{\"id\":234, \"members\":[{\"i\":3},{\"i\":4}]}}",
"id" => 123,
"members.0.i" => 1,
"members.0.arr.0.ii" => 11,
"members.0.arr.1.ii" => 22,
"members.1.i" => 2,
"im_json" => 234,
"im_json.0.i" => 3,
"im_json.1.i" => 4
}
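The flattening idea can also be tried outside Logstash. The following is a standalone sketch in plain Ruby, not the filter above: flatten_json is a hypothetical helper, and unlike the output shown above it keeps the full path for nested objects (for example im_json.members.0.i).

```ruby
require "json"

# Hypothetical standalone helper: walks parsed JSON and returns a flat hash
# whose keys are dotted paths, with array positions as path segments.
def flatten_json(value, prefix = nil, out = {})
  case value
  when Hash
    value.each { |k, v| flatten_json(v, [prefix, k].compact.join("."), out) }
  when Array
    value.each_with_index { |v, i| flatten_json(v, [prefix, i].compact.join("."), out) }
  else
    # Leaf value: store it under the accumulated path.
    out[prefix] = value
  end
  out
end

doc = JSON.parse('{"id":123, "members":[{"i":1, "arr":[{"ii":11},{"ii":22}]},{"i":2}], "im_json":{"id":234, "members":[{"i":3},{"i":4}]}}')
flat = flatten_json(doc)
flat.each { |key, val| puts "#{key} => #{val}" }
```

Inside a ruby filter, each pair would then be assigned to the event in the same way the code above does with event[p] = v.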
The solution I liked is the ruby filter, because it doesn't require writing another filter. However, that solution creates fields at the "root" of the JSON, which makes it hard to keep track of how the original document looked.
I came up with something similar that's easier to follow; it is recursive, so it's cleaner.
ruby {
init => "
def arrays_to_hash(h)
h.each do |k,v|
# If v is nil, an array is being iterated and the value is k.
# If v is not nil, a hash is being iterated and the value is v.
value = v || k
if value.is_a?(Array)
# value is replaced with value_hash later.
value_hash = {}
value.each_with_index do |v, i|
value_hash[i.to_s] = v
end
h[k] = value_hash
end
if value.is_a?(Hash) || value.is_a?(Array)
arrays_to_hash(value)
end
end
end
"
code => "arrays_to_hash(event.to_hash)"
}
It converts arrays to hashes, with each key being the index number. More details: http://blog.abhijeetr.com/2016/11/logstashelasticsearch-best-way-to.html
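To see the effect of the conversion without a Logstash pipeline, here is a minimal plain-Ruby sketch of the same idea. It is a variant under assumptions rather than the filter itself: arrays_to_hashes is a hypothetical name, and it returns a new structure instead of modifying the event in place.

```ruby
# Hypothetical pure-Ruby variant: replaces every array with a hash keyed by
# the element's index (as a string) and leaves other values untouched.
def arrays_to_hashes(value)
  case value
  when Array
    value.each_with_index.to_h { |v, i| [i.to_s, arrays_to_hashes(v)] }
  when Hash
    value.transform_values { |v| arrays_to_hashes(v) }
  else
    value
  end
end

# Sample document taken from the question.
doc = { "different" => [{ "this" => "one", "that" => "uno" }, { "this" => "two" }] }
converted = arrays_to_hashes(doc)
p converted.dig("different", "1", "this")  # => "two"
```

With this shape, a search would address an element by position, for example different.1.this:two.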