I have a few tables with json columns and would like to build an index on these columns. However, I get an error about a missing default operator class (http://www.postgresql.org/docs/9.2/static/sql-createopclass.html). Has someone already done this, or can you suggest an alternative?
To reproduce the problem, try:
>> create table foo (json json, id int);
CREATE TABLE
>> create index on foo (id);
CREATE INDEX
>> create index on foo (json);
ERROR: data type json has no default operator class for access method "btree"
HINT: You must specify an operator class for the index or define a default operator class for the data type.
The same applies to gist or gin indexes.
Interestingly, I don't get the error when I try the following:
>> create type "nested" as (json json, extra text);
CREATE TYPE
>> create table bar (id int, json nested);
CREATE TABLE
>> create index on bar (json);
CREATE INDEX
Is that because no index is created for the components?
Okay, the main issue is the missing default operator class. Any help or shared experience with that is appreciated.
Thanks.
There is a workaround for this if you feel like installing the PLV8 JavaScript language:
http://people.planetpostgresql.org/andrew/index.php?/archives/249-Using-PLV8-to-index-JSON.html
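The idea from that post, roughly, is to wrap the JSON access in an IMMUTABLE PL/V8 function and build an expression index on its result. A minimal sketch of that idea (the function name and the "name" key are made up here, not taken from the linked article):

CREATE EXTENSION IF NOT EXISTS plv8;

CREATE OR REPLACE FUNCTION json_get_name(data text) RETURNS text AS $$
  return JSON.parse(data).name;   // key "name" is just an example
$$ LANGUAGE plv8 IMMUTABLE STRICT;

CREATE INDEX ON foo (json_get_name(json::text));

-- queries have to use the same expression to hit the index:
SELECT * FROM foo WHERE json_get_name(json::text) = 'bar';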
The solution that works best for my case is the following:
I simply treat json as text and create an index on the text.
>> create index on foo ((json::text));
A query would then have to be transformed so that it uses the index.
Explain reveals whether the index is used or not.
>> explain select * from foo where json::text = 'foo';
There is no built-in index type for the JSON or XML types. These fields can hold values, but they cannot be indexed directly - you need auxiliary columns, for example hstore columns or similar.
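A minimal sketch of that auxiliary-column idea (the column and key names here are invented, and on 9.2 populating the hstore from the json column still has to happen in a trigger or in application code):

CREATE EXTENSION IF NOT EXISTS hstore;
ALTER TABLE foo ADD COLUMN json_hs hstore;
CREATE INDEX ON foo USING gin (json_hs);
-- hstore containment queries can then use the GIN index:
SELECT * FROM foo WHERE json_hs @> 'name => bar';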
Related
Hi, I am trying to convert a column in my table from varchar to json, and the table already has some string data. I tried doing that with the command below.
Database=# alter table table_name alter column message type json using
message::json;
But the command failed with the below error.
ERROR: invalid input syntax for type json
DETAIL: Token "This" is invalid.
CONTEXT: JSON data, line 1: This...
Note: the message column contains a set of words with spaces, like below.
"This is a message"
I am not sure what went wrong. Thanks in advance.
You can use to_jsonb() rather than casting:
alter table table_name
alter column message type jsonb using to_jsonb(message);
If you really want to use json (although jsonb is recommended), then cast the result back to a json type:
alter table table_name
alter column message type json using to_jsonb(message)::json;
But this seems rather strange for a column that doesn't contain "real" json values, only plain strings.
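A quick way to see why the plain cast fails while to_jsonb() succeeds for such values (just a check, not part of the migration):

select 'This is a message'::json;            -- fails: invalid input syntax for type json
select to_jsonb('This is a message'::text);  -- returns the JSON string "This is a message"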
In my case, I wanted to split a varchar containing delimiter-separated values into a JSON array. For example:
varchar "ab,cd,ef,gh" -> json ["ab","cd","ef","gh"]
First, convert the varchar into an array, using a comma (",") as the delimiter:
ALTER table my_table ALTER column my_column TYPE text[] USING string_to_array(my_column,',')
Then, convert the text array into json (useful, for example, if you use it with a GraphQL database):
ALTER table my_table ALTER column my_column TYPE json USING array_to_json(my_column)
That gives me a json array like ["ab","cd","ef","gh"].
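Both steps can also be combined into a single statement (an untested sketch reusing the same table and column names):

ALTER table my_table ALTER column my_column TYPE json USING array_to_json(string_to_array(my_column, ','))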
You can simply run the command below; it will convert the field type as well as the existing records. You might also need to update the model definition in your framework so that it matches.
In my case, I was converting the saved_states table's prev_months_access_counts column from text to jsonb.
ALTER TABLE saved_states ALTER COLUMN prev_months_access_counts TYPE jsonb
using to_jsonb(prev_months_access_counts);
I built a structure in Cassandra to store time-series OS data such as services, processes, and other information. To understand how Cassandra stores JSON data and how to retrieve it with conditional CQL queries, I preferred to simplify the model, because in the full model the TYPEs will be more complex than report_object, e.g. a hashMap of arrays of hashMaps. For example:
Type NETSTAT--> Object[n] --> {host:192.168.0.23, protocol: TCP ,LocalAddress : 0.0.0.0}
so the NETSTAT type will have a list of hashMaps containing the key -> value fields.
To simplify, I have chosen to show the following schema:
CREATE TYPE report_object (RTIME varchar, RMINORVER int, RUSER varchar, RLANG varchar, RSCRIPT varchar, RMAJORVER int, RHOST varchar, RPATH varchar);
CREATE TABLE test (
REPORTUUID uuid PRIMARY KEY,
report frozen<report_object>);
I inserted the JSON data into the table with the following query from a Java class:
INSERT INTO test JSON '{"REPORTUUID": "9fb21fb9-333e-4017-ab77-0fa6ee1e20e3" ,"REPORT":{"RTIME":"6/MAR/2016 6:0:0 PM","RMINORVER":0,"RUSER":"Administrator","RLANG":"vbs","RSCRIPT":"Main","RMAJORVER":5,"RHOST":"WIN-SAPV9MUEMNS","RPATH":"C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\IXP000.TMP"}}';
I inserted other data with queries like the one above.
The questions I need to clarify are:
- I would like to run queries with conditions that check fields inside the defined TYPE; is this possible with CQL, or is it necessary to use Spark SQL?
- Is the DB model design right for this purpose (given that I have moved from an RDBMS to a NoSQL DB)?
To be able to query a user-defined type in Cassandra, you'll have to create an index first:
CREATE INDEX on test.test(report);
but it allows only a predicate based on a full document:
SELECT * FROM test
WHERE report=fromJson('{"RTIME":"6/MAR/2016 6:0:0 PM","RMINORVER":0,"RUSER":"Administrator","RLANG":"vbs","RSCRIPT":"Main","RMAJORVER":5,"RHOST":"WIN-SAPV9MUEMNS","RPATH":"C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\IXP000.TMP"}');
You'll find more details and explanation in how to filter cassandra query by a field in user defined type
When exposed using Spark these values can be filtered using filter on CassandraTableScanRDD:
val rdd = sc.cassandraTable("test", "test")
rdd.filter(row =>
row.getUDTValue("report").getString("rscript") == "Main")
or where / filter on a DataFrame:
df.where($"report.rscript" === "Main")
Note that when you query like this using Spark, the whole table has to be fetched before the data can be filtered. It is not clear what exactly you are trying to achieve, but it is rather unlikely that this will be a useful structure in general.
I've read the docs and it appears that there's no discernible way to perform an ALTER TABLE ... ALTER COLUMN ... USING statement to directly convert a json type column to an hstore type. There's no function available (that I'm aware of) to perform the cast.
The next best alternative I have is to create a new column of type hstore, copy my JSON data to that new column using some external tool, drop the old json column and rename the new hstore column to the old column's name.
Is there a better way?
What I have so far is:
$ CREATE TABLE blah (unstructured_data JSON);
$ ALTER TABLE blah ALTER COLUMN unstructured_data
TYPE hstore USING CAST(unstructured_data AS hstore);
ERROR: cannot cast type json to hstore
Unfortunately, PostgreSQL doesn't allow all kinds of expressions within the USING clause of ALTER TABLE ... SET DATA TYPE ... (e.g. sub-queries are disallowed).
But you can write a function to overcome this; you just need to decide what to do with nested types among the object's values, like arrays and objects. Here is an example which simply converts them to strings:
CREATE OR REPLACE FUNCTION my_json_to_hstore(json)
RETURNS hstore
IMMUTABLE
STRICT
LANGUAGE sql
AS $func$
SELECT hstore(array_agg(key), array_agg(value))
FROM json_each_text($1)
$func$;
After that, you can use this in your ALTER TABLE, like:
ALTER TABLE blah
ALTER COLUMN unstructured_data
SET DATA TYPE hstore USING my_json_to_hstore(unstructured_data);
There is "trap" for repeated keys - allowed by both json and hstore input, but unfortunately resolved differently (!). Consider this example value:
json '{"double_key":"key1","foo":null,"double_key":"key2"}'
In json, 'double_key' is effectively 'key2'. The manual:
Because the json type stores an exact copy of the input text, it will
preserve semantically-insignificant white space between tokens, as
well as the order of keys within JSON objects. Also, if a JSON object
within the value contains the same key more than once, all the
key/value pairs are kept. (The processing functions consider the last value as the operative one.)
Bold emphasis mine.
In hstore, however, for the same order of key/value pairs, 'double_key' might effectively be 'key1'. The manual:
Each key in an hstore is unique. If you declare an hstore with
duplicate keys, only one will be stored in the hstore and there is no guarantee as to which will be kept:
Typically that's the first instance of a key, but that's an implementation detail that might change.
A simple and fast option to always preserve the effective, operative value: cast to jsonb before the conversion. The manual again:
[...] jsonb does not preserve white space, does not preserve
the order of object keys, and does not keep duplicate object keys.
If duplicate keys are specified in the input, only the last value is kept.
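A quick way to see the difference, using the example value from above:

SELECT '{"double_key":"key1","foo":null,"double_key":"key2"}'::jsonb;
-- {"foo": null, "double_key": "key2"}     -- only the last value for double_key survives

SELECT * FROM json_each_text('{"double_key":"key1","foo":null,"double_key":"key2"}');
-- returns both double_key rows, so hstore() has to drop one of them, with no guarantee which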
Modifying #pozs's conversion function:
CREATE OR REPLACE FUNCTION json2hstore(json)
RETURNS hstore AS
$func$
SELECT hstore(array_agg(key), array_agg(value))
FROM jsonb_each_text($1::jsonb) -- !
$func$ LANGUAGE sql IMMUTABLE STRICT;
This requires Postgres 9.4 or later. Postgres 9.3 has the json type, but not jsonb yet. A no-op in PL/v8 might be an alternative there, like #jpmc mentioned.
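The ALTER TABLE then looks the same as before, just with the new function (a sketch reusing the table and column names from the question):

ALTER TABLE blah
ALTER COLUMN unstructured_data
SET DATA TYPE hstore USING json2hstore(unstructured_data);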
I'd like to know if it's possible to create a table with a multidimensional integer array.
I tried this syntax, but unfortunately it didn't work for me:
create table testarray(testarr INT(20)(10));
So what should I do in this case? Thanks.
You can't. You could do something like converting your array to a comma-separated string and storing that: use a serialize function and store the result in a varchar field.
No. But you could create a test array table and link to it.
CREATE TABLE testarray(ID INT, RowId INT, Element_0 INT, Element_1 INT..Element_9 INT);
Now you reference the array with testarray.ID as a foreign key from wherever you wanted to have the array originally.
That's arguably better than using a comma-separated string.
Or stick a JSON-syntax array in a varchar field.
In PostgreSQL 9.3 Beta 2 (?), how do I create an index on a JSON field? I tried it using the -> operator used for hstore but got the following error:
CREATE TABLE publishers(id INT, info JSON);
CREATE INDEX ON publishers((info->'name'));
ERROR: data type json has no default operator class for access method
"btree" HINT: You must specify an operator class for the index or
define a default operator class for the data type.
Found:
CREATE TABLE publishers(id INT, info JSON);
CREATE INDEX ON publishers((info->>'name'));
As stated in the comments, the subtle difference here is ->> instead of ->. The former returns the value as text, the latter as a JSON object.
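The index should then be used when a query repeats the same ->> expression; a quick check (the publisher name here is made up):

EXPLAIN SELECT * FROM publishers WHERE info->>'name' = 'Some Publisher';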