How to index JSON data in PostgreSQL 9.2?

Does anyone know how to create index on JSON data in PostgreSQL 9.2?
Example data:
[
{"key" : "k1", "value" : "v1"},
{"key" : "k2", "value" : "v2"}
]
Say I want to index on all the keys; how do I do that?
Thanks.

You are much better off using hstore for indexed fields, at least for now.
CREATE INDEX table_name_gin_data ON table_name USING GIN(data);
You can also create GiST indexes if you are interested in full-text search. More info here: http://www.postgresql.org/docs/9.0/static/textsearch-indexes.html
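For illustration, a minimal sketch of the hstore approach (the table and column names here are hypothetical, and the hstore extension must be installed):
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE TABLE things (id serial PRIMARY KEY, data hstore);
CREATE INDEX things_data_gin ON things USING GIN (data);
-- The containment operator @> can use the GIN index:
SELECT * FROM things WHERE data @> 'k1 => v1';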

Currently there are no built-in functions to index JSON directly. But you can do it with a function-based index where the function is written in JavaScript.
See this blog post for details: http://people.planetpostgresql.org/andrew/index.php?/archives/249-Using-PLV8-to-index-JSON.html
There is another blog post which talks about JSON and how it can be used with JavaScript: http://www.postgresonline.com/journal/archives/272-Using-PLV8-to-build-JSON-selectors.html

This question is a little old but I think the selected answer is not really the ideal one. To index JSON (the property values inside JSON text), we can use expression indexes with PLV8 (suggested by @a_horse_with_no_name).
Craig Kerstiens does a great job of explaining/demonstrating:
http://www.craigkerstiens.com/2013/05/29/postgres-indexes-expression-or-functional-indexes/
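Following the pattern in the posts above, a hedged sketch (it assumes the plv8 extension and a hypothetical table things with a json column data; older PLV8 versions pass json arguments as strings, hence the JSON.parse):
-- Extract a string property from a json value (IMMUTABLE so it can be indexed):
CREATE OR REPLACE FUNCTION json_string(data json, key text) RETURNS text
LANGUAGE plv8 IMMUTABLE STRICT AS $$
  var obj = JSON.parse(data);
  return obj[key];
$$;
-- Expression index on the extracted property:
CREATE INDEX things_json_key_idx ON things (json_string(data, 'key'));
-- A query matching the indexed expression can use the index:
SELECT * FROM things WHERE json_string(data, 'key') = 'k1';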

Related

Searching through JSON data stored in MySQL

I have a big set of JSON data inside a database. The data wasn't supposed to be queried, so it was stored in a very messy way... This is the structure:
{
"0": {"key": "Developer(s)", "values": ["Capcom"]},
"1": {"key": "Publisher(s)", "values": ["Capcom"]},
"2": {"key": "Producer(s)", "values": ["Tokuro Fujiwara"]},
"3": {"key": "Composer(s)", "values": ["Setsuo Yamamoto"]},
"4": {"key": "Series", "values": ["X-Men"]},
"6": {"key": "Release", "values": ["EU:", " 1995"]},
"7": {"key": "Mode(s)", "values": ["Single-player"]}
}
I need to query the DB to verify which records have which property (i.e. all records with a "Release" key inside, all that contain the value "Capcom" inside the Developer(s) key, etc.).
Can someone point me to the right way? I have found only examples with simple structures (i.e. { "key": "value" }); here the key is the index number, and the value is an array with two different keys...
Should I find a way to rewrite all the data, or is there something easier?
P.S. I'm building a Laravel application over this data, so I can also use an Eloquent approach.
Thanks in advance
As @Guy L mentioned in his answer, you can use LIKE or REGEXP, but it will be expensive.
example:
SELECT * FROM table WHERE json_column LIKE '%"Release"%';
Answering:
"Should I find a way to rewrite all the data, or is there something easier?"
Consider how often you have to access this data.
A NoSQL database like MongoDB is a really good fit for data like this; I have been using Mongo and I am happy with it.
You can easily migrate your data to MongoDB and use an ORM similar to Eloquent, such as https://github.com/jenssegers/laravel-mongodb, to communicate with Mongo from your Laravel project.
Hope it helps you arrive at a solution.
Please refer to
https://dev.mysql.com/doc/refman/8.0/en/json-search-functions.html
which allows you to search for a specific value at a specific place in the JSON structure.
However, if the data is NOW supposed to be searchable, I would suggest redesigning/converting the data.
Querying JSON as a string can be done using the LIKE or REGEXP operators.
But in general, since the strings are long and complex, it's really not recommended.
The best way is reloading the info into proper tables, storing it the SQL way, as sketched below.
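A hedged sketch of that normalization (every name here is hypothetical):
-- One row per (game, property) pair instead of one JSON blob per game:
CREATE TABLE game_properties (
  game_id    INT          NOT NULL,
  prop_key   VARCHAR(64)  NOT NULL,
  prop_value VARCHAR(255) NOT NULL,
  INDEX idx_key (prop_key),
  INDEX idx_value (prop_value)
);
-- "All games with a Release property" becomes a plain indexed lookup:
SELECT game_id FROM game_properties WHERE prop_key = 'Release';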
If your MySQL version is 5.7 or above, you can try some options with its JSON support:
https://dev.mysql.com/doc/refman/8.0/en/json.html#json-paths
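For example, against the structure above (a hedged sketch; the table name games and column name data are assumptions):
-- All records that have a "Release" key anywhere in the object:
SELECT * FROM games
WHERE JSON_SEARCH(data, 'one', 'Release', NULL, '$.*."key"') IS NOT NULL;
-- All records that contain the value "Capcom" in any "values" array:
SELECT * FROM games
WHERE JSON_SEARCH(data, 'one', 'Capcom', NULL, '$.*."values"[*]') IS NOT NULL;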
Thanks for the suggestions about using MySQL's JSON-specific functions. The column is already of JSON type, so I can definitely use them, but unfortunately even after reading the MySQL documentation I can't figure out how to solve my problem. Honestly, I'm not a data specialist and I get kind of confused when dealing with databases, so any examples will be appreciated.
So far I've tried to mess around with queries, but I wasn't able to find a correct "selector" for the keys of my dataset; using '$[0]' returns only the first element. I'd need some hints on creating the right syntax using JSON_EXTRACT, JSON_CONTAINS, etc.
Thanks everyone; in the end I've decided to fetch all the data and store it again properly.

How to validate against runtime JSON object reference?

For sample JSON data which looks like this:
{
"children":{
"Alice":{...},
"Jamie":{...},
"Bob":{...}
// Any new child with a given unique name will be added to this object
},
"childrenOrder": ["Alice", "Bob", "Jamie"]
}
In the corresponding JSON Schema, I am trying to limit the valid values in the "childrenOrder" array to the keys present in the "children" object at runtime.
I didn't see any means of referring to runtime dynamic values in the official JSON Schema documentation (http://json-schema.org/documentation.html).
Is this even possible at the moment?
For the sake of brevity I omitted JSON Schema code. I can add it if folks think it is needed to address the question.
Thanks in advance.
No, it is not possible using the current JSON Schema specification. However, there is a proposal for the next version of JSON Schema that could change that:
https://github.com/json-schema/json-schema/wiki/%24data-(v5-proposal)

Schemaless Support for Elastic Search Queries

Our REST API allows users to add custom schemaless JSON to some of our REST resources, and we need it to be searchable in Elasticsearch. This custom data and its structure can be completely different across resources of the same type.
Consider this example document:
{
"givenName": "Joe",
"username": "joe",
"email": "joe#mailinator.com",
"customData": {
"favoriteColor": "red",
"someObject": {
"someKey": "someValue"
}
}
}
All fields except customData adhere to a schema. customData is always a JSON Object, but all the fields and values within that Object can vary dramatically from resource to resource. There is no guarantee that any given field name or value (or even value type) within customData is the same across any two resources as users can edit these fields however they wish.
What is the best way to support search for this?
We thought a solution would be to just not create any mapping for customData when the index is created, but then it becomes unqueryable (which is contrary to what the ES docs say). This would be the ideal solution if queries on non-mapped properties worked and there were no performance problems with this approach. However, after running multiple tests we haven't been able to get it to work.
Is this something that needs any special configuration? Or are the docs incorrect? Some clarification as to why it is not working would be greatly appreciated.
Since this is not currently working for us, we’ve thought of a couple alternative solutions:
Reindexing: this would be costly as we would need to reindex every index that contains that document and do so every time a user updates a property with a different value type. Really bad for performance, so this is likely not a real option.
Use multi-match query: we would do this by appending a random string to the customData field name every time there is a change in the customData object. For example, this is what the document being indexed would look like:
{
"givenName": "Joe",
"username": "joe",
"email": "joe#mailinator.com",
"customData_03ae8b95-2496-4c8d-9330-6d2058b1bbb9": {
"favoriteColor": "red",
"someObject": {
"someKey": "someValue"
}
}
}
This means ES would create a new mapping for each 'random' field, and we would use a phrase multi-match query with a "starts with" wildcard for the field names when performing the queries. For example:
curl -XPOST 'eshost:9200/test/_search?pretty' -d '
{
"query": {
"multi_match": {
"query" : "red",
"type" : "phrase",
"fields" : ["customData_*.favoriteColor"]
}
}
}'
This could be a viable solution, but we are concerned that having too many mappings like this could affect performance. Are there any performance repercussions for having too many mappings on an index? Maybe periodic reindexing could alleviate having too many mappings?
This also just feels like a hack and something that should be handled by ES natively. Am I missing something?
Any suggestions about any of this would be much appreciated.
Thanks!
You're correct that Elasticsearch is not truly schemaless. If no mapping is specified, Elasticsearch infers field type primitives based upon the first value it sees for that field. Therefore your non-deterministic customData object can get you in trouble if you first see "favoriteColor": 10 followed by "favoriteColor": "red".
For your requirements, you should take a look at the SIREn Solutions Elasticsearch plugin, which provides a schemaless solution coupled with an advanced query language (using Twig) and a custom Lucene index format to speed up indexing and search operations for non-deterministic data.
Fields with the same mapping will be stored as the same Lucene field in the Lucene index (Elasticsearch shard). Each distinct Lucene field has a separate inverted index (term dictionary and index entries) and separate doc values. Lucene is highly optimized to store documents with the same fields in a compressed way, so using a mapping with different fields for different documents prevents Lucene from doing this optimization.
You should use Elasticsearch nested documents to search efficiently. The underlying technology is Lucene BlockJoin, which indexes parent/child documents as a document block.

Postgres json type inner Query [duplicate]

I am looking for some docs and/or examples for the new JSON functions in PostgreSQL 9.2.
Specifically, given a series of JSON records:
[
{name: "Toby", occupation: "Software Engineer"},
{name: "Zaphod", occupation: "Galactic President"}
]
How would I write the SQL to find a record by name?
In vanilla SQL:
SELECT * from json_data WHERE "name" = "Toby"
The official dev manual is quite sparse:
http://www.postgresql.org/docs/devel/static/datatype-json.html
http://www.postgresql.org/docs/devel/static/functions-json.html
Update I
I've put together a gist detailing what is currently possible with PostgreSQL 9.2.
Using some custom functions, it is possible to do things like:
SELECT id, json_string(data,'name') FROM things
WHERE json_string(data,'name') LIKE 'G%';
Update II
I've now moved my JSON functions into their own project:
PostSQL - a set of functions for transforming PostgreSQL and PL/v8 into a totally awesome JSON document store
Postgres 9.2
I quote Andrew Dunstan on the pgsql-hackers list:
At some stage there will possibly be some json-processing (as opposed
to json-producing) functions, but not in 9.2.
That doesn't prevent him from providing an example implementation in PLV8 that should solve your problem. (The link is dead now; see modern PLV8 instead.)
Postgres 9.3
Offers an arsenal of new functions and operators to add "json-processing".
The manual on new JSON functionality.
The Postgres Wiki on new features in pg 9.3.
The answer to the original question in Postgres 9.3:
SELECT *
FROM json_array_elements(
'[{"name": "Toby", "occupation": "Software Engineer"},
{"name": "Zaphod", "occupation": "Galactic President"} ]'
) AS elem
WHERE elem->>'name' = 'Toby';
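The same applied to a table (a sketch assuming a table json_data with a json column data, per the question; a set-returning function in the FROM list gets an implicit LATERAL join in 9.3+):
SELECT d.*
FROM json_data d, json_array_elements(d.data) elem
WHERE elem->>'name' = 'Toby';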
Advanced example:
Query combinations with nested array of records in JSON datatype
For bigger tables you may want to add an expression index to increase performance:
Index for finding an element in a JSON array
Postgres 9.4
Adds jsonb (b for "binary", values are stored as native Postgres types) and yet more functionality for both types. In addition to the expression indexes mentioned above, jsonb also supports GIN, B-tree and hash indexes, GIN being the most potent of these.
The manual on json and jsonb data types and functions.
The Postgres Wiki on JSONB in pg 9.4
The manual goes as far as suggesting:
In general, most applications should prefer to store JSON data as
jsonb, unless there are quite specialized needs, such as legacy
assumptions about ordering of object keys.
Bold emphasis mine.
Performance benefits from general improvements to GIN indexes.
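A minimal sketch of a jsonb GIN index (assuming a hypothetical table things with a jsonb column data):
CREATE INDEX things_data_gin ON things USING GIN (data);
-- The containment operator @> can use the GIN index:
SELECT * FROM things WHERE data @> '{"name": "Toby"}';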
Postgres 9.5
Completes the set of jsonb functions and operators, and adds more functions to manipulate jsonb in place and for display.
Major good news in the release notes of Postgres 9.5.
With Postgres 9.3+, just use the -> operator. For example,
SELECT data->'images'->'thumbnail'->'url' AS thumb FROM instagram;
see http://clarkdave.net/2013/06/what-can-you-do-with-postgresql-and-json/ for some nice examples and a tutorial.
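Note that -> returns json; to compare or match against plain text, use the ->> variant, which returns text. A small sketch against the same table:
SELECT * FROM instagram
WHERE data->'images'->'thumbnail'->>'url' LIKE 'http%';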
With Postgres 9.3, use -> for object access. For example:
seed.rb
se = SmartElement.new
se.data =
{
params:
[
{
type: 1,
code: 1,
value: 2012,
description: 'year of production'
},
{
type: 1,
code: 2,
value: 30,
description: 'length'
}
]
}
se.save
rails c
SELECT data->'params'->0 as data FROM smart_elements;
returns
data
----------------------------------------------------------------------
{"type":1,"code":1,"value":2012,"description":"year of producction"}
(1 row)
You can continue nesting
SELECT data->'params'->0->'type' as data FROM smart_elements;
returns
data
------
1
(1 row)

partial update json field in postgres

In Postgres I have a table like this:
CREATE TABLE storehouse
(
user_id bigint NOT NULL,
capacity integer NOT NULL,
storehouse json NOT NULL,
last_modified timestamp without time zone NOT NULL,
CONSTRAINT storehouse_pkey PRIMARY KEY (user_id)
)
And storehouse.storehouse is storing data like this:
{
"slots":[
{
"slot" : 1,
"id" : 938
},
{
"slot" : 2,
"id" : 127
}
]
}
The thing is, I want to update storehouse.storehouse.slots[2], but I have no idea how to do it.
I know how to replace the entire storehouse.storehouse field, but since Postgres supports the json type, I am wondering whether it supports partial modification; otherwise there would be no difference between the json type and the text type. (I know the json type also has validation, which text does not.)
JSON indexing and partial updates are not currently supported. The JSON support in PostgreSQL 9.2 is rudimentary, limited to validating JSON and to converting rows and arrays to JSON. Internally, json is indeed pretty much just text.
There's ongoing work on enhancements like partial updates, indexing, etc. No matter what, though, PostgreSQL won't be able to avoid rewriting the whole row when part of a JSON value changes, because that's inherent to the MVCC model of concurrency. The only way to make that possible would be to split JSON values out into multiple tuples in a side relation, like TOAST tables: something that's possible, but likely to perform poorly, and very far from being considered at this point.
As Chris Travers points out, you can use PL/V8 functions or functions in other languages with json support like Perl or Python to extract values, then create expression indexes on those functions.
Since PostgreSQL 9.5, there is a function called jsonb_set which takes as input parameters:
a JSON object
an array indicating the path (keys and subkeys)
the new value to be stored (also a JSON object)
Example:
# SELECT jsonb_set('{"name": "James", "contact": {"phone": "01234 567890", "fax": "01987 543210"}}'::jsonb,
'{contact,phone}',
'"07900 112233"'::jsonb);
jsonb_set
--------------------------------------------------------------------------------
{"name": "James", "contact": {"fax": "01987 543210", "phone": "07900 112233"}}
(1 row)
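Applied to the table from the question, a sketch (the column is json rather than jsonb, so it needs casts; the user_id and the new value here are made up):
-- Set the "id" of the second slot (json array indexes are zero-based):
UPDATE storehouse
SET storehouse = jsonb_set(storehouse::jsonb, '{slots,1,id}', '200'::jsonb)::json
WHERE user_id = 42;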