In Postgres I have a table like this:
CREATE TABLE storehouse
(
  user_id bigint NOT NULL,
  capacity integer NOT NULL,
  storehouse json NOT NULL,
  last_modified timestamp without time zone NOT NULL,
  CONSTRAINT storehouse_pkey PRIMARY KEY (user_id)
)
And storehouse.storehouse is storing data like this:
{
  "slots": [
    {
      "slot": 1,
      "id": 938
    },
    {
      "slot": 2,
      "id": 127
    }
  ]
}
The thing is, I want to update storehouse.storehouse.slots[2], but I have no idea how to do it.
I know how to overwrite the entire storehouse.storehouse field, but since Postgres supports the json type, I would expect it to support partial modification as well; otherwise there would be no difference between the json and text types. (I know the json type is also validated on input, which text is not.)
JSON indexing and partial updates are not currently supported. The JSON support in PostgreSQL 9.2 is rudimentary, limited to validating JSON and to converting rows and arrays to JSON. Internally, json is indeed pretty much just text.
There's ongoing work on enhancements like partial updates, indexing, etc. No matter what, though, PostgreSQL won't be able to avoid rewriting the whole row when part of a JSON value changes, because that's inherent to its MVCC model of concurrency. The only way to make that possible would be to split JSON values out into multiple tuples in a side relation, like TOAST tables - something that's possible, but likely to perform poorly, and that's very far from being considered at this point.
As Chris Travers points out, you can use PL/V8 functions or functions in other languages with json support like Perl or Python to extract values, then create expression indexes on those functions.
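If you are on 9.3 or later, the extraction function does not even need PL/V8: the -> / ->> operators and json_array_elements make a plain SQL function possible. A minimal sketch against the storehouse table from the question (the function name slot_item_id is made up for illustration):

-- Hypothetical helper: return the item id stored in a given slot.
-- Requires PostgreSQL 9.3+ for json_array_elements and -> / ->>.
CREATE FUNCTION slot_item_id(doc json, slot_no int) RETURNS int
LANGUAGE sql IMMUTABLE AS $$
  SELECT (s ->> 'id')::int
  FROM   json_array_elements(doc -> 'slots') AS s
  WHERE  (s ->> 'slot')::int = slot_no
$$;

-- Expression index so lookups on slot 1's item id can use an index scan.
CREATE INDEX storehouse_slot1_idx ON storehouse (slot_item_id(storehouse, 1));

A query like SELECT * FROM storehouse WHERE slot_item_id(storehouse, 1) = 938 can then use the index.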
Since PostgreSQL 9.5, there is a function called jsonb_set which takes as input parameters:
the target jsonb value
a text array indicating the path (keys and array indexes)
the new value to be stored (also a jsonb value)
Example:
# SELECT jsonb_set('{"name": "James", "contact": {"phone": "01234 567890", "fax": "01987 543210"}}'::jsonb,
'{contact,phone}',
'"07900 112233"'::jsonb);
jsonb_set
--------------------------------------------------------------------------------
{"name": "James", "contact": {"fax": "01987 543210", "phone": "07900 112233"}}
(1 row)
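Applied to the storehouse table from the question, the partial update then becomes a single statement. A sketch, assuming user_id 1 and a new item id of 200 (note that jsonb paths index arrays from zero, so the slot 2 element is at index 1, and the json column has to be cast to jsonb and back):

UPDATE storehouse
SET    storehouse    = jsonb_set(storehouse::jsonb, '{slots,1,id}', '200'::jsonb)::json,
       last_modified = now()
WHERE  user_id = 1;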
I want to use an ID as the primary key in a JSON object. This way all users in the list are unique.
Like so:
{
  "user": [{
    "id": 1,
    "name": "bob"
  }]
}
In an application, I then have to search for the id in all elements of the list 'user'.
Alternatively, I can use the ID as a key to get easier access to a specific user.
Like so:
{
  "user": {
    "1": {
      "name": "bob"
    }
  }
}
In an application, I can now simply write user["1"] to get the correct user.
What should I use? Are there any disadvantages to the second option? I'm sure there is a best practice.
It depends on what you want your objects to look like, how much processing you want to do on them, and how much data you have.
When dealing with web data you will often see the first format. If there is a lot of data, you will need to iterate through all records to find a matching id, because your data is an array. Often that lookup is pushed down to the underlying data set anyway, so it might already be indexed (e.g. if it is a database), in which case this is not an issue. This format is clean and binds easily.
Your second option works best when you need efficient lookups: a dictionary of key/value pairs allows significantly faster access in large datasets. However, a numeric key (even one forced to be a string) is not supported by all libraries. You can prefix the id with an alphabetic character, though, and simply add the prefix when doing a lookup. I have used k in the example below, but you can choose any prefix that makes sense for your data. I use this format when storing objects as the json binary data type in databases.
{
  "user": {
    "k1": {
      "name": "bob"
    }
  }
}
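In PostgreSQL, for instance, a lookup by the prefixed key is a single path expression. A sketch, assuming a hypothetical profiles table with a jsonb column doc holding the structure above:

-- #>> extracts the value at the given path as text
SELECT doc #>> '{user,k1,name}' AS name
FROM   profiles;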
I am trying to parse the JSON below using YAJL. YAJLGEN generated the data structure below, but the issue I am facing is that the set of arrays (e.g. KEY, CUSTOMER) is not fixed. These arrays are returned for each field in the response, and I am trying to avoid defining an array for every possible field.
Could you please advise if there is a better way to read the JSON below and parse dynamic arrays? I tried using "yajl_array_loop" and "yajl_array_elem", but I couldn't make them work in my program for some reason. Thanks in advance.
{
  "errstatus": 400,
  "errors": {
    "Key": [
      "The Key field is required."
    ],
    "Customer": [
      "The Customer field is required."
    ]
  }
}
dcl-ds jsonDoc qualified;
  errstatus packed(3) inz(0);
  dcl-ds ERRORS;
    num_KEY int(10) inz(0);
    KEY varchar(37) inz('') dim(1);
    num_CUSTOMER int(10) inz(0);
    CUSTOMER varchar(43) inz('') dim(2);
  end-ds;
end-ds;
If YAJL is not working for you, then it is probably not a good choice for your case. If your JSON is not hundreds of megabytes big, you may try a DOM-like approach such as noxDB (https://github.com/sitemule/noxDB). It reads the whole JSON into memory, and you can then evaluate the in-memory JSON any way you want. That seems like a much better approach for your situation.
I want to include an if-else condition in JSON based on which I need to set an attribute in the JSON file.
For example like this:
"identifier": "navTag",
"items": [{
"label": "abc",
"url": "yxz.com",
},
{
"label": "abc1",
"url": "yxz1.com",
},
{
"label": "abc2",
"url": "yxz2.com",/*I need to change this value on certain
condition like if condition is true then
"url": xyz2.com if false "url":xyz3.com*/
}
]
Is this possible?
JSON is a format for storing and exchanging data; it is not a programming language, so you cannot put conditions in it. If you want to retrieve data according to some if-else condition, there are two possible ways:
1. Create different JSON files for the different conditions.
2. Create two fields in your JSON structure, called if and else. If the condition is satisfied, fetch the if field's value; otherwise retrieve the else field's value.
For example:
{
  "if": "if-value",
  "else": "else-value"
}
JSON is only a data representation, unrelated to any programming language (even if early JavaScript implementations remotely inspired it). There is no notion of "execution" or "conditionals" (or of "behavior" or "semantics") in it.
Read the (short) JSON definition carefully. It simply defines which sequences of characters (e.g. the content of a file) are valid JSON. It does not define the "meaning" of JSON data.
JSON data is parsed by some program and emitted by some program (often different ones, but they could be the same).
The program handling JSON can of course use conditions and give some "meaning" (whatever the definition of that word is) to it. But JSON itself is only "data syntax".
You could write your own JSON transformer (using some existing JSON library, and there are many of them); that is really simple. Some programs (notably jq) claim to be more or less generic JSON processors.
Since JSON is a textual format, you could even use an editor (such as emacs, vim or many others) to manually change parts of it. You had better validate the result with an existing JSON parser, though, to be sure you did not introduce any mistakes.
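With jq, for instance, the conditional update from the question becomes a one-liner. A sketch, assuming the document above is saved as nav.json and the condition arrives as the external variable cond:

# --arg binds $cond as a string, so it is compared against "true"
jq --arg cond true \
   '.items[2].url = (if $cond == "true" then "yxz2.com" else "yxz3.com" end)' \
   nav.json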
While developing a client application using one of our existing REST services, I have the choice for using JSON or XML responses. The XML responses are described by XSD files with schema information.
With these XML Schemas I can determine what datatype a certain result must be, and the client can use that information when presenting the data to the user, or when the client asks the user to change a property. (How is quite another question, btw, as I cannot find any multi-platform Delphi implementation of XML that supports XSD schemas... but like I said: that's another question.)
The alternative is to use a JSON response type, but then the client cannot determine the specific datatype of a property, because everything is sent as a string.
How would a client know that one of those properties is an index into an enumerated type, an integer, an amount, or maybe a reference to another object by its ID? (These are just examples.)
I would think that the client should not contain "hardcoded" info on the structure of the response, or am I wrong in assuming that?
JSON doesn't have a rich type system like XML does, nor a schema system for describing things like enumerations and references. But JSON has only a few data types, and the general formatting of the JSON is self-describing in terms of which data type any given value uses (see the official JSON spec for more details):
a string is always wrapped in quotation marks:
"fieldname": "fieldvalue"
a numeric value is digit characters without quotations:
"fieldname": 12345
an object is always wrapped in curly braces:
"fieldname": { ... object data ... }
an array is always wrapped in square brackets:
"fieldname": [ ... array data ... ]
a boolean is always a fixed true or false without quotations:
"name": true
"name": false
a null is always a fixed null without quotations:
"name": null
Anything beyond that will require the client to have external knowledge of the data that is being sent (like a schema in XML, since XML itself does not describe data types at all).
Our REST API allows users to add custom schemaless JSON to some of our REST resources, and we need it to be searchable in Elasticsearch. This custom data and its structure can be completely different across resources of the same type.
Consider this example document:
{
  "givenName": "Joe",
  "username": "joe",
  "email": "joe@mailinator.com",
  "customData": {
    "favoriteColor": "red",
    "someObject": {
      "someKey": "someValue"
    }
  }
}
All fields except customData adhere to a schema. customData is always a JSON Object, but all the fields and values within that Object can vary dramatically from resource to resource. There is no guarantee that any given field name or value (or even value type) within customData is the same across any two resources as users can edit these fields however they wish.
What is the best way to support search for this?
We thought a solution would be to just not create any mapping for customData when the index is created, but then it becomes unqueryable (which is contrary to what the ES docs say). This would be the ideal solution if queries on non-mapped properties worked and there were no performance problems with this approach. However, after running multiple tests, we haven't been able to get that to work.
Is this something that needs any special configuration? Or are the docs incorrect? Some clarification as to why it is not working would be greatly appreciated.
Since this is not currently working for us, we’ve thought of a couple alternative solutions:
Reindexing: this would be costly as we would need to reindex every index that contains that document and do so every time a user updates a property with a different value type. Really bad for performance, so this is likely not a real option.
Use multi-match query: we would do this by appending a random string to the customData field name every time there is a change in the customData object. For example, this is what the document being indexed would look like:
{
  "givenName": "Joe",
  "username": "joe",
  "email": "joe@mailinator.com",
  "customData_03ae8b95-2496-4c8d-9330-6d2058b1bbb9": {
    "favoriteColor": "red",
    "someObject": {
      "someKey": "someValue"
    }
  }
}
This means ES would create a new mapping for each 'random' field, and we would use a phrase multi-match query with a "starts with" wildcard on the field names when performing the queries. For example:
curl -XPOST 'eshost:9200/test/_search?pretty' -d '
{
  "query": {
    "multi_match": {
      "query": "red",
      "type": "phrase",
      "fields": ["customData_*.favoriteColor"]
    }
  }
}'
This could be a viable solution, but we are concerned that having too many mappings like this could affect performance. Are there any performance repercussions for having too many mappings on an index? Maybe periodic reindexing could alleviate having too many mappings?
This also just feels like a hack and something that should be handled by ES natively. Am I missing something?
Any suggestions about any of this would be much appreciated.
Thanks!
You're correct that Elasticsearch is not truly schemaless. If no mapping is specified, Elasticsearch infers field type primitives based upon the first value it sees for that field. Therefore your non-deterministic customData object can get you in trouble if you first see "favoriteColor": 10 followed by "favoriteColor": "red".
For your requirements, you should take a look at the SIREn Solutions Elasticsearch plugin, which provides a schemaless solution coupled with an advanced query language (using Twig) and a custom Lucene index format that speeds up indexing and search operations for non-deterministic data.
Fields with the same mapping are stored as the same Lucene field in the Lucene index (the Elasticsearch shard). Each distinct Lucene field has a separate inverted index (term dictionary and index entries) and separate doc values. Lucene is highly optimized for storing many documents that share the same fields in a compressed way; a mapping that puts a different field on each document prevents Lucene from applying those optimizations.
You should use Elasticsearch nested documents to search efficiently. The underlying technology is Lucene BlockJoin, which indexes parent/child documents together as a document block.
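A minimal sketch of that approach (the index name test is reused from the earlier example; exact mapping syntax varies between Elasticsearch versions): instead of indexing customData as free-form keys, reindex it as an array of key/value pairs, e.g. "customData": [{"key": "favoriteColor", "value": "red"}], map it as nested, and query key and value together inside one nested clause so both must match on the same pair:

curl -XPUT 'eshost:9200/test' -d '
{
  "mappings": {
    "properties": {
      "customData": {
        "type": "nested",
        "properties": {
          "key":   { "type": "keyword" },
          "value": { "type": "keyword" }
        }
      }
    }
  }
}'

curl -XPOST 'eshost:9200/test/_search?pretty' -d '
{
  "query": {
    "nested": {
      "path": "customData",
      "query": {
        "bool": {
          "must": [
            { "term": { "customData.key":   "favoriteColor" } },
            { "term": { "customData.value": "red" } }
          ]
        }
      }
    }
  }
}'

With a fixed key/value mapping, arbitrary user fields no longer create new Lucene fields, so the mapping explosion problem goes away.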