postgres store reference to field in json - json

It is possible to store json in postgres using the json data type. Check this tutorial for an introduction: http://www.postgresqltutorial.com/postgresql-json/
Consider I am storing the following json in such a field:
{
"address": {
"street1": "123 seasame st"
}
}
I want a to store separately a reference to the street field. For example, I might have another object which is using data from this json structure and wants to store a reference to where it got the data. Maybe something like this:
class Product():
__tablename__ = 'Address'
street_1 = Column(String)
data_source = ?
Now I could make data_source a string and just store namespaces like address.street, but if I did this postgres has no idea what that means. Working with that in queries would mean parsing the string and other inefficient stuff. Does postgres support referring to fields stored inside json data structures?
This question is related to JSON foreign keys in PostgreSQL , but in this case I don't necessarily want a fk relationship. I just want to create a reference, which is not necessarily enforced in the way a fk is.
update:
To be more clear, I want to reference the location of something in the json structure on another attribute and store that reference in a column. In the below code, Address.data_source is a reference to the location of the street data (for example address.street1 in this case)
class Address():
__tablename__ = 'Address'
street_1 = Column(String)
sample_id = Column(Integer, ForeignKey('DataSample.uid'))
data_source = ?
class DataSample():
__tablename__ = 'DataSample'
uid = Column(Integer, primary_key=True)
data = Column(JSONB)
body = {
"address": {
"street1": "123 seasame st"
}
}
datasample = DataSample(data=body)
address = Address(street_1=datasample.data['address']['street_1'],
sample_id=datasample.uid,
data_source=?)

As clarified, the question is seeking a way to flexibly specify a path within a JSON object of a particular record. Keys are being handled in normal columns. Constraints on JSONB fields are not available, and there is no specific support for specifying paths within JSON objects.
I worked with the following in SQL Fiddle using PostgreSQL 9.6:
CREATE TABLE datasample (
id integer PRIMARY KEY,
data jsonb
);
CREATE TABLE address (
id integer PRIMARY KEY,
street_1 text,
sample_id integer REFERENCES datasample (id),
data_source text
);
INSERT INTO datasample(id, data)
VALUES (1, '{"address":{"street_1": "123 seasame st"}}');
INSERT INTO address(id,street_1, sample_id, data_source)
VALUES (1,'123 seasame st',1,'datasample.data->''address''->>''street''');
A typical lookup of the street address (needed to retrieve street_1) would resemble:
SELECT datasample.data->'address'->>'street_1'
FROM datasample
WHERE id=1;
There is no special postgres type for identifying columns. Strings are the closest available and you will need to retrieve the string (or array of strings, or object containing strings, if one of those simplifies parsing) and use it to build the query. In tbe first code block, I stored it as the (escaped) fragment of query - 'datasample.data->''address''->>''street'''. Though longer, it would require only retrieval and unescaping to use in a new custom query. I did not find a way to use the string as a fragment within the same SQL statement, though it might be possible to combine it with other bits of text to form a full statement that could be run through EXECUTE.

Related

Using json_extract in sqlite to pull data from parent and child objects

I'm starting to explore the JSON1 library for sqlite and have been so far successful in the basic queries I've created. I'm now looking to create a more complicated query that pulls data from multiple levels.
Here's the example JSON object I'm starting with (and most of the data is very similar).
{
"height": 140.0,
"id": "cp",
"label": {
"bind": "cp_label"
},
"type": "color_picker",
"user_data": {
"my_property": 2
},
"uuid": "948cb959-74df-4af8-9e9c-c3cb53ac9915",
"value": {
"bind": "cp_color"
},
"width": 200.0
}
This json object is buried about seven levels deep in a json structure and I pulled it from the larger json construct using an sql statement like this:
SELECT value FROM forms, json_tree(forms.formJSON, '$.root')
WHERE type = 'object'
AND json_extract(value, '$.id') = #sControlID
// In this example, #sControlID is a variable that represents the `id` value we're looking for, which is 'cp'
But what I really need to pull from this object are the following:
the value from key type ("color_picker" in this example)
the values from keys bind ("cp_color" and "cp_label" in this example)
the keys value and label (which have values of {"bind":"<string>"} in this example)
For that last item, the key name (value and label in this case) can be any number of keywords, but no matter the keyword, the value will be an object of the form {"bind":"<some_string>"}. Also, there could be multiple keys that have a bind object associated with them, and I'd need to return all of them.
For the first two items, the keywords will always be type and bind.
With the json example above, I'd ideally like to retrieve two rows:
type key value
color_picker value cp_color
color_picker label cp_label
When I use json_extract methods, I end up retrieving the object {"bind":"cp_color"} from the json_tree table, but I also need to retrieve the data from the parent object. I feel like I need to do some kind of union, but my attempts have so far been unsuccessful. Any ideas here?
Note: if the {"bind":"<string>"} object doesn't exist as a child of the parent object, I don't want any rows returned.
Well, I was on the right track and eventually figured out it. I created a separate query for each of the items I was looking for, then INNER JOINed all the json_tree tables from each of the queries to have all the required fields available. Then I json_extracted the required data from each of the json fields I needed data from. In the end, it gave me exactly what I was looking for, though I'm sure it could be written more efficiently.
For anyone interested, this is what hte final query ended up looking like:
SELECT IFNULL(json_extract(parent.value, '$.type'), '_window_'), child.key, json_extract(child.value, '$.bind') FROM (SELECT json_tree.* FROM nui_forms, json_tree(nui_forms.formJSON, '$') WHERE type = 'object' AND json_extract(nui_forms.formJSON, '$.id') = #sWindowID) parent INNER JOIN (SELECT json_tree.* FROM nui_forms, json_tree(nui_forms.formJSON, '$') WHERE type = 'object' AND json_extract(value, '$.bind') != 'NULL' AND json_extract(nui_forms.formJSON, '$.id') = #sWindowID) child ON child.parent = parent.id;
If you have any tips on reducing its complexity, feel free to comment!

AWS Glue Crawler for JSONB column in PostgreSQL RDS

I've created a crawler that looks at a PostgreSQL 9.6 RDS table with a JSONB column but the crawler identifies the column type as "string". When I then try to create a job that loads data from a JSON file on S3 into the RDS table I get an error.
How can I map a JSON file source to a JSONB target column?
It's not quite a direct copy, but an approach that has worked for me is to define the column on the target table as TEXT. After the Glue job populates the field, I then convert it to JSONB. For example:
alter table postgres_table
alter column column_with_json set data type jsonb using column_with_json::jsonb;
Note the use of the cast for the existing text data. Without that, the alter column would fail.
Crawler will identify JSONB column type as "string" but you can try to use Unbox Class in Glue to convert this column to json
let's check the following table in PostgreSQL
create table persons (id integer, person_data jsonb, creation_date timestamp )
There is an example of one record from person table
ID = 1
PERSON_DATA = {
"firstName": "Sergii",
"age": 99,
"email":"Test#test.com"
}
CREATION_DATE = 2021-04-15 00:18:06
The following code need to be added in Glue
# 1. create dynamic frame from catalog
df_persons = glueContext.create_dynamic_frame.from_catalog(database = "testdb", table_name = "persons", transformation_ctx = "df_persons ")
# 2.in path you need to add your jsonb column name that need to be converted to json
df_persons_json = Unbox.apply(frame = df_persons , path = "person_data", format="json")
# 3. converting from dynamic frame to data frame
datf_persons_json = df_persons_json.toDF()
# 4. after that you can process this column as a json datatype or create dataframe with all necessary columns , each json data element can be added as a separate column in dataframe :
final_df_person = datf_persons_json.select("id","person_data.age","person_data.firstName","creation_date")
You can also check the following link:
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-transforms-Unbox.html

Deedle - how to use 'ParseExact' within the Frame.ReadCsv schema

I have a CSV file of data in the form
21.06.2016 23:00:00.349, 153.461, 153.427
21.06.2016 23:00:00.400, 153.460, 153.423
etc
The initial step of creating a frame involves the optional inclusion of a 'schema' to specify or rename column heads and specify types:
let df = Frame.ReadCsv(__SOURCE_DIRECTORY__ + "/data/GBPJPY.csv", hasHeaders=true, inferTypes=false, schema="TS (DateTimeOffset), Bid (float(3)), Ask (float(3))")
I would like to specify the first column of string values to be ParseExact'ed to DateTimeOffset of the format
"dd.mm.yyyy HH:mm:ss.fff"
(I'm assuming the use of the setting System.Globalization.CultureInfo.InvariantCulture).
How do I express the schema such that it will parse the datetime string in that first Frame.ReadCsv("file.csv", schema = ........ )? Or is this not possible to accomplish within the schema statement?

PostgreSQL Auto-increment inside a JSON

Is it possible to auto-increment inside PostgreSQL's new JSON type using just SQL (like serial) and not server code?
I can't really imagine why you'd want to, but sure.
CREATE SEQUENCE whywouldyou_jsoncol_seq;
CREATE TABLE whywouldyou (
jsoncol json not null default json_object(ARRAY['id'], ARRAY[nextval('whywouldyou_jsoncol_seq')::text]),
dummydata text;
);
ALTER SEQUENCE whywouldyou_jsoncol_seq OWNED BY whywouldyou.jsoncol;
insert into whywouldyou(dummydata) values('');
select * from whywouldyou;
jsoncol | dummydata
--------------+-----------
{"id" : "1"} |
(1 row)
Note that with this particular formulation it's the string "1" not the number 1 in the json. You might want to form the json object another way if you want to avoid that. This is just an example.

Encoding a binary tree to json

I'm using the sqlalchemy to store a binary tree data in the db:
class Distributor(Base):
__tablename__ = "distributors"
id = Column(Integer, primary_key=True)
upline_id = Column(Integer, ForeignKey('distributors.id'))
left_id = Column(Integer, ForeignKey('distributors.id'))
right_id = Column(Integer, ForeignKey('distributors.id'))
how can I generate json "tree" format data like the above listed:
{'id':1,children:[{'id':2, children:[{'id':3, 'id':4}]}]}
I'm guessing you're asking to store the data in a JSON format? Or are you trying to construct JSON from the standard relational data?
If the former, why don't you just create entries like:
{id: XX, parentId: XX, left: XX, right: XX, value: "foo"}
For each of the nodes, and then reconstruct the tree manually from the entries? Just start form the head (parentId == null) and then assemble the branches.
You could also add an additional identifier for the tree itself, in case you have multiple trees in the database. Then you would just query where the treeId was XXX, and then construct the tree from the entries.
I hesitate to provide this answer, because I'm not sure I really understand your the problem you're trying to solve (A binary tree, JSON, sqlalchemy, none of these are problems).
What you can do with this kind of structure is to iterate over each row, adding edges as you go along. You'll start with what is basically a cache of objects; which will eventually become the tree you need.
import collections
idmap = collections.defaultdict(dict)
for distributor in session.query(Distributor):
dist_dict = idmap[distributor.id]
dist_dict['id'] = distributor.id
dist_dict.setdefault('children', [])
if distributor.left_id:
dist_dict.['children'].append(idmap[distributor.left_id])
if distributor.right_id:
dist_dict.['children'].append(idmap[distributor.right_id])
So we've got a big collection of linked up dicts that can represent the tree. We don't know which one is the root, though;
root_dist = session.query(Distributor).filter(Distributor.upline_id == None).one()
json_data = json.dumps(idmap[root_dist.id])