PostgreSQL JSON quick search (search for a value in any key)

I am trying to find a solution for quick-search functionality within a PostgreSQL JSONB column. The requirement is that we can search for a value under any JSON key.
Table structure:
CREATE TABLE entity (
    id bigint NOT NULL,
    jtype character varying(64) NOT NULL,
    jdata jsonb,
    CONSTRAINT entity_pk PRIMARY KEY (id)
);
The idea is that we store JSON documents of different types in one table; jtype defines the JSON entity type and jdata holds the JSON data, for example:
jtype = 'person', jdata = '{"personName":"John", "personSurname":"Smith", "company":"ABS Software", "position":"Programmer"}'
jtype = 'company', jdata = '{"name":"ABS Software", "address":"Somewhere in Alaska"}'
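For reference, as runnable INSERT statements (the ids are made up):
INSERT INTO entity (id, jtype, jdata) VALUES
  (1, 'person',  '{"personName": "John", "personSurname": "Smith", "company": "ABS Software", "position": "Programmer"}'),
  (2, 'company', '{"name": "ABS Software", "address": "Somewhere in Alaska"}');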
The goal is a quick search where a user can type 'ABS' and find both records: the company and the person who works at that company.
The analogue in Oracle is the CONTAINS function:
SELECT jtype, jdata FROM entity WHERE CONTAINS (jdata, 'ABS') > 0;
A GIN index only allows searching for keys or key/value pairs:
GIN indexes can be used to efficiently search for keys or key/value
pairs occurring within a large number of jsonb documents (datums). Two
GIN "operator classes" are provided, offering different performance
and flexibility trade-offs.
https://www.postgresql.org/docs/current/static/datatype-json.html#JSON-INDEXING
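For illustration, the stock jsonb GIN operator classes support exact containment with the @> operator, which is not the substring search I need; a minimal sketch (index name is made up):
CREATE INDEX entity_jdata_idx ON entity USING gin (jdata);

-- matches the person row via an exact key/value pair; typing just 'ABS' finds nothing:
SELECT id, jtype, jdata
FROM entity
WHERE jdata @> '{"company": "ABS Software"}';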

https://github.com/postgrespro/jsquery might be useful for what you are looking for, although I haven't used it myself.

As of PostgreSQL 10, you can create full-text indexes on JSON/JSONB columns and then do full-text searching within the values of that column, like so:
libdata=# SELECT bookdata -> 'title'
FROM bookdata
WHERE to_tsvector('english', bookdata) @@ to_tsquery('duke');
                 ?column?
------------------------------------------
 "The Tattooed Duke"
 "She Tempts the Duke"
 "The Duke Is Mine"
 "What I Did For a Duke"
More documentation can be found in the PostgreSQL full-text search documentation.

How do I extract values from a JSON array in MariaDB or MySQL?

Situation
I have a table in a MariaDB database. This table has a LONGTEXT column which is used to store a JSON array (read more about this topic in MariaDB JSON Data Type).
Question
I would like to extract values from the JSON array, based on a certain key. How do I achieve this with MariaDB (or MySQL)?
Example
Here's the simplified table thing (just for demo purposes):
id | thing_name | examples
---|------------|---------
0  | fruit      | [{"color": "green","title": "Apple"},{"color": "orange","title": "Orange"},{"color": "yellow","title": "Banana"}]
1  | car        | [{"color": "silver","title": "VW"},{"color": "black","title": "Bentley"},{"color": "blue","title": "Tesla"}]
My goal is to extract all title values from the JSON array.
You can use JSON_EXTRACT for this task (works for both MariaDB and MySQL). This function also supports wildcards, as described in the docs:
Paths can contain * or ** wildcards
Depending on whether you have multiple levels of data (e.g. single document vs array), either a single or double asterisk wildcard should be used:
JSON_EXTRACT(json,'$**.key')
json is a valid JSON document (e.g. a column), key is the lookup key used.
For your example
In order to find all title values in your JSON array, use the following query:
SELECT id, thing_name, JSON_EXTRACT(examples, '$**.title') AS examples_titles FROM thing;
id | thing_name | examples_titles
---|------------|----------------
0  | fruit      | ["Apple", "Orange", "Banana"]
1  | car        | ["VW", "Bentley", "Tesla"]

Create column with RECORD type in BigQuery

I want to create a column with type RECORD. I have a STRUCT or ARRAY(STRUCT):
json
--------
"fruit":[{"apples":"5","oranges":"10"},{"apples":"5","oranges":"4"}]
"fruit":{"apples":"1","oranges":"15"}
"fruit":{"apples":"5","oranges":"1"}
I want to create fruit as a RECORD type:
fruit RECORD NULLABLE
fruit.apples STRING NULLABLE
fruit.oranges STRING NULLABLE
Using BigQuery SQL you can use the following DDL statement, as described in the documentation: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#create_table_statement
CREATE TABLE mydataset.newtable
(
  fruit STRUCT<
    apples STRING,
    oranges STRING
  >
);
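Once the table exists, rows can be added with ordinary DML; a minimal sketch:
INSERT INTO mydataset.newtable (fruit)
VALUES (STRUCT('5' AS apples, '10' AS oranges));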
You can also use BigQuery's schema auto-detection feature to create the table from a JSON file: https://cloud.google.com/bigquery/docs/schema-detect#loading_data_using_schema_auto-detection
I believe the most straightforward way to achieve what you want is to use an edited version of the JSON file you provided (complying with the rules shown in the public docs) and load your data with auto-detection from the Cloud Console.
If you would like to get the following schema:
fruit RECORD NULLABLE
fruit.apples INTEGER NULLABLE
fruit.oranges INTEGER NULLABLE
You should use the following json file:
{"fruit":{"apples":"5","oranges":"10"}}
{"fruit":{"apples":"5","oranges":"4"}}
{"fruit":{"apples":"1","oranges":"15"}}
{"fruit":{"apples":"5","oranges":"1"}}
On the other hand, if you prefer to get a repeated attribute (since there are two fruit objects in the same row of the example you provided), you would need to use the following file:
{"fruit":[{"apples":"5","oranges":"10"},{"apples":"5","oranges":"4"}]}
{"fruit":{"apples":"1","oranges":"15"}}
{"fruit":{"apples":"5","oranges":"1"}}
This will result in the following schema:
fruit RECORD REPEATED
fruit.apples INTEGER NULLABLE
fruit.oranges INTEGER NULLABLE
Finally, I have noticed that you specified in the question that you would like the attributes fruit.apples and fruit.oranges to be STRING (which is not straightforward for auto-detection, since the values are numbers such as 5 and 10). In this case you could explicitly create the table with a DDL statement, but I strongly suggest turning these fields into integers if that still suits your use case.
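A minimal sketch of such a DDL, keeping the repeated fruit record but with STRING fields:
CREATE TABLE mydataset.newtable
(
  fruit ARRAY<STRUCT<
    apples STRING,
    oranges STRING
  >>
);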

Database design - using JSON to store info

I have a database listing campsites.
There is a main table tblSites that contains the unique data (e.g. name, coordinates, address) and also includes a column for each facility (e.g. Toilet, Water, Shower, Electric), where the values are just 1 = yes, NULL = no.
This would be searched by something like
SELECT id FROM tblSites WHERE Water = 1 AND Toilets = 1
There is another related table tblLocations which contains location types (i.e. near the sea, rural, mountains, by a river, etc.).
This means the table has a lot of columns and doesn't allow for easy updating if I want to add a new category.
This would be included in a search like this
SELECT M.id, L.*
FROM tblSites AS M
LEFT JOIN tblLocations AS L ON M.id = L.id
WHERE M.Water = 1 AND L.river = 1
What I am considering is adding a column, e.g. facilities, that would contain a JSON string of facility keys, e.g. [1,3,4,12], where each number represents an available facility, and another column for locations in the same format, e.g. [1,3,5].
This does allow me to reduce the table size and add additional facilities or locations without adding extra columns, but is it a good idea performance-wise?
i.e. a search would now be something like
SELECT id FROM tblSites WHERE (facilities LIKE '%1,%' AND facilities LIKE '%4,%' AND locations LIKE '%1,%')
Is there a better query that could be used to see if the field contains a key number in the array string?
Your WHERE clause will not work reliably with LIKE '%1,%'.
If facilities is a string (TEXT, VARCHAR, ...) and you search for a value in a stringified JSON array such as [2,3,12,21,300], then facilities LIKE '%1,%' is true because it matches the '21,' substring, while facilities LIKE '%300,%' never matches that array at all (300 is the last element, so it is not followed by a comma).
So searching a JSON array stored as a string is a dead end.
If your MySQL version is 5.7.8 or newer, it supports JSON natively as a column type.
When you store your data in a JSON column in MySQL (e.g. via JSON_INSERT()), you can search it with WHERE JSON_CONTAINS(facilities, '1').
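Applied to your original search (assuming facilities and locations are JSON columns holding arrays like [1,3,4,12]), a sketch:
SELECT id
FROM tblSites
WHERE JSON_CONTAINS(facilities, '1')
  AND JSON_CONTAINS(facilities, '4')
  AND JSON_CONTAINS(locations, '1');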
But the best solution is to redesign your table structures and relations, as @Robby commented below your question.
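For illustration, a normalized design along those lines might look like this (all table and column names are hypothetical):
CREATE TABLE tblFacilities (
  id INT PRIMARY KEY,
  name VARCHAR(64) NOT NULL          -- e.g. 'Water', 'Toilets'
);

CREATE TABLE tblSiteFacilities (
  site_id INT NOT NULL,              -- references tblSites.id
  facility_id INT NOT NULL,          -- references tblFacilities.id
  PRIMARY KEY (site_id, facility_id)
);

-- sites offering both facility 1 and facility 4:
SELECT site_id
FROM tblSiteFacilities
WHERE facility_id IN (1, 4)
GROUP BY site_id
HAVING COUNT(DISTINCT facility_id) = 2;
Adding a new facility then means inserting a row into tblFacilities instead of altering the schema.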

Query spark on JSON object stored on Cassandra DB

I built a structure in Cassandra to store time-series OS data such as services, processes and other information. To understand how Cassandra stores JSON data and how to retrieve it with CQL queries with conditions, I prefer to simplify the model, because in the full DB model the TYPE will be more complex than report_object, e.g. a hashMap of an array of hashMaps, for example:
Type NETSTAT--> Object[n] --> {host:192.168.0.23, protocol: TCP ,LocalAddress : 0.0.0.0}
so the NETSTAT type will have a list of hashMaps containing key -> value fields.
To simplify, I have chosen to show the following schema:
CREATE TYPE report_object (
  RTIME varchar, RMINORVER int, RUSER varchar, RLANG varchar,
  RSCRIPT varchar, RMAJORVER int, RHOST varchar, RPATH varchar
);
CREATE TABLE test (
REPORTUUID uuid PRIMARY KEY,
report frozen<report_object>);
I inserted the JSON data into the table with the following query, from a Java class:
INSERT INTO test JSON '{"REPORTUUID": "9fb21fb9-333e-4017-ab77-0fa6ee1e20e3" ,"REPORT":{"RTIME":"6/MAR/2016 6:0:0 PM","RMINORVER":0,"RUSER":"Administrator","RLANG":"vbs","RSCRIPT":"Main","RMAJORVER":5,"RHOST":"WIN-SAPV9MUEMNS","RPATH":"C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\IXP000.TMP"}}';
I inserted other data with similar queries.
The questions, to clarify my concepts, are:
- I would like to run queries with conditions that check fields inside the defined TYPE; is that possible with CQL, or is it necessary to use Spark SQL?
- Is the DB model design right for this purpose (given that I have moved from an RDBMS to a NoSQL DB)?
To be able to query a user-defined type with Cassandra you'll have to create an index first:
CREATE INDEX on test.test(report);
but it only allows a predicate based on the full document:
SELECT * FROM test
WHERE report=fromJson('{"RTIME":"6/MAR/2016 6:0:0 PM","RMINORVER":0,"RUSER":"Administrator","RLANG":"vbs","RSCRIPT":"Main","RMAJORVER":5,"RHOST":"WIN-SAPV9MUEMNS","RPATH":"C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\IXP000.TMP"}');
You'll find more details and an explanation in how to filter cassandra query by a field in user defined type.
When exposed using Spark these values can be filtered using filter on CassandraTableScanRDD:
// fetches the table as an RDD, then filters on the UDT field client-side
val rdd = sc.cassandraTable("test", "test")
rdd.filter(row =>
  row.getUDTValue("report").getString("rscript") == "Main")
or where / filter on a DataFrame:
df.where($"report.rscript" === "Main")
Note that with a query like this, Spark has to fetch the whole table before the data can be filtered. It is not clear what exactly you are trying to achieve, but it is rather unlikely this will be a useful structure in general.

How to query on a JSON-type field with SQLAlchemy?

This is just a simple example.
field = [[1,12,6], [2,12,8]]
I am thinking about storing this in a JSON-typed field with SQLAlchemy in an SQLite3 database.
But how can I query on this with SQLAlchemy (not raw SQL)?
e.g.
rows where len(field) is 3
rows where field[1][2] is not 12
rows where len(field[*]) is not 3
group by field[2]
sum field[1] where field[0] is 1
Is it even possible?
Maybe it would be easier to implement this with a classical many-to-many table (incl. a link table) instead of a JSON field?
According to the official documentation, these are the operations SQLAlchemy provides for JSON fields.
JSON provides several operations:
Index operations:
data_table.c.data['some key']
Index operations returning text (required for text comparison):
data_table.c.data['some key'].astext == 'some value'
Index operations with a built-in CAST call:
data_table.c.data['some key'].cast(Integer) == 5
Path index operations:
data_table.c.data[('key_1', 'key_2', ..., 'key_n')]
Path index operations returning text (required for text comparison):
data_table.c.data[('key_1', 'key_2', ..., 'key_n')].astext == 'some value'
http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html#sqlalchemy.dialects.postgresql.JSON
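For orientation, on the PostgreSQL dialect these operations render to the native JSON operators; roughly, the generated SQL looks like this (a sketch, not exact output):
-- data_table.c.data['some key']                    ->  data -> 'some key'
-- data_table.c.data['some key'].astext             ->  data ->> 'some key'
-- data_table.c.data['some key'].cast(Integer)      ->  CAST(data ->> 'some key' AS INTEGER)
-- data_table.c.data[('key_1', 'key_2')].astext     ->  data #>> '{key_1, key_2}'
SELECT data ->> 'some key' AS value
FROM data_table
WHERE CAST(data ->> 'some key' AS INTEGER) = 5;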
From what I know about the JSON type in PostgreSQL, it's best used only if you want to fetch the whole JSON object. If you want to run SQL-like operations on fields of the JSON object, it's better to use classic SQL relations. Here's one source saying the same thing, but there are many more: http://blog.2ndquadrant.com/postgresql-anti-patterns-unnecessary-jsonhstore-dynamic-columns/