I am testing Postgresql 9.4 beta2 right now. I am wondering if it is possible to create a unique index on embedded json object?
I create a table name products:
CREATE TABLE products (oid serial primary key, data jsonb)
Now, I try to insert json object into data column.
{
"id": "12345",
"bags": [
{
"sku": "abc123",
"price": 0,
},
{
"sku": "abc123",
"price": 0,
}
]
}
However, I want the sku of bags to be unique. That means this json can't be inserted into the products table, because sku is not unique in this case.
I tried to create a unique index like below, but it failed.
CREATE UNIQUE INDEX product_sku_index ON products( (data->'bags'->'sku') )
Any suggestions?
Your attempt to create a UNIQUE INDEX on the expression was bound to fail for multiple reasons.
CREATE UNIQUE INDEX product_sku_index ON products( (data->'bags'->'sku') )
The first and most trivial being that ...
data->'bags'->'sku'
does not reference anything. You could reference the first element of the array with
data->'bags'->0->>'sku'
or shorter:
data#>>'{bags,0,sku}'
But that expression only returns the first value of the array.
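For illustration, run against the sample row from the question, both variants pick out only the first array element:

SELECT data->'bags'->0->>'sku' AS first_sku
     , data#>>'{bags,0,sku}'   AS first_sku_path
FROM   products;
-- both columns return 'abc123' for the sample object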
Your requirement "I want sku of bags to be unique" is unclear. Do you want the value of sku to be unique? Within one JSON object, or among all json objects in the column data? Or do you want to restrict the array to a single element with an sku?
Either way, neither of these goals can be implemented with a simple UNIQUE index.
Possible solution
If you want sku values to be unique across all json arrays in data->'bags', there is a way. Unnest the array and write all individual sku values to separate rows in a simple auxiliary table with a unique (or PK) constraint:
CREATE TABLE prod_sku(sku text PRIMARY KEY); -- PK enforces uniqueness
This table may be useful for additional purposes.
Here is a complete code example for a very similar problem with plain Postgres arrays:
Can PostgreSQL have a uniqueness constraint on array elements?
Only adapt the unnesting technique. Instead of:
DELETE FROM hostname h
USING unnest(OLD.hostnames) d(x)
WHERE h.hostname = d.x;
...
INSERT INTO hostname(hostname)
SELECT h
FROM unnest(NEW.hostnames) h;
Use:
DELETE FROM prod_sku p
USING jsonb_array_elements(NEW.data->'bags') d(x)
WHERE p.sku = d.x->>'sku';
...
INSERT INTO prod_sku(sku)
SELECT b->>'sku'
FROM jsonb_array_elements(NEW.data->'bags') b
Details for that:
PostgreSQL joining using JSONB
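For the plain INSERT path, the pieces might be wired together like this (just a sketch; the function and trigger names are my own, and UPDATE / DELETE handling as in the linked answer still needs to be added):

CREATE OR REPLACE FUNCTION products_sync_sku()
  RETURNS trigger
  LANGUAGE plpgsql AS
$$
BEGIN
   -- mirror every sku of the new row into prod_sku;
   -- the PK on prod_sku rejects duplicates and makes the whole INSERT fail
   INSERT INTO prod_sku(sku)
   SELECT b->>'sku'
   FROM   jsonb_array_elements(NEW.data->'bags') b;

   RETURN NEW;
END
$$;

CREATE TRIGGER trg_products_sync_sku
BEFORE INSERT ON products
FOR EACH ROW EXECUTE PROCEDURE products_sync_sku();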
I want to write a WHERE clause in MySQL. I have a column in the DB that holds a list of ids separated by commas (2,5,6,8), and I need to test whether each of those ids is in another list of ids. If I had just one id to test (not a list), I would know how to do it with a WHERE ... IN clause. So, how do I test whether a list of ids exists within another list of ids?
There's no such thing as a "list of ids" in SQL. What you have is a string, which happens to contain numeric digits and commas.
Storing a list of id's as a string is okay, if you only need to use it like a string. But if you need to do queries that treat the list as a set of discrete id values, you should normalize the data by storing one id per row in a dependent table.
CREATE TABLE IdSets (
entity_id INT NOT NULL,
item_id INT NOT NULL,
PRIMARY KEY (entity_id, item_id),
FOREIGN KEY (entity_id) REFERENCES entities(entity_id)
);
Then you can solve your problem with a JOIN query.
SELECT i.entity_id
FROM IdSets AS i LEFT OUTER JOIN TestSets AS t USING (item_id)
GROUP BY entity_id
HAVING COUNT(i.item_id) = COUNT(t.item_id)
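For completeness, TestSets in the query above is assumed to be a table holding the set of ids you want to test against; a minimal setup could look like this:

CREATE TABLE TestSets (
  item_id INT NOT NULL PRIMARY KEY
);

INSERT INTO TestSets (item_id) VALUES (2), (5), (6), (8);

The HAVING comparison then keeps only those entity_id values for which every item_id found a match in TestSets.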
Thanks for all the tips, I thought there was a ready-made function for that; I am new to SQL. Since it is not possible to test the lists directly, I will use another table that stores the ids separately to perform the test.
I have the following nested types defined in postgres:
CREATE TYPE address AS (
name text,
street text,
zip text,
city text,
country text
);
CREATE TYPE customer AS (
customer_number text,
created timestamp WITH TIME ZONE,
default_billing_address address,
default_shipping_address address
);
I would now like to populate these types in a stored procedure which gets json as an input parameter. This works for fields on the top level; the output shows the internal format of a Postgres composite type:
# select json_populate_record(null::customer, '{"customer_number":"12345678"}'::json)::customer;
json_populate_record
----------------------
(12345678,,,)
(1 row)
However, postgres does not handle a nested json structure:
# select json_populate_record(null::customer, '{"customer_number":"12345678","default_shipping_address":{"name":"","street":"","zip":"12345","city":"Berlin","country":"DE"}}'::json)::customer;
ERROR: malformed record literal: "{"name":"","street":"","zip":"12345","city":"Berlin","country":"DE"}"
DETAIL: Missing left parenthesis.
What works, again, is if the nested property is already in Postgres' internal record format, like here:
# select json_populate_record(null::customer, '{"customer_number":"12345678","default_shipping_address":"(\"\",\"\",12345,Berlin,DE)"}'::json)::customer;
json_populate_record
--------------------------------------------
(12345678,,,"("""","""",12345,Berlin,DE)")
(1 row)
Is there any way to get postgres to convert from a nested json structure to a corresponding composite type?
Use json_populate_record() only for nested objects:
with a_table(jdata) as (
values
('{
"customer_number":"12345678",
"default_shipping_address":{
"name":"",
"street":"",
"zip":"12345",
"city":"Berlin",
"country":"DE"
}
}'::json)
)
select (
jdata->>'customer_number',
jdata->>'created',
json_populate_record(null::address, jdata->'default_billing_address'),
json_populate_record(null::address, jdata->'default_shipping_address')
)::customer
from a_table;
row
--------------------------------------------
(12345678,,,"("""","""",12345,Berlin,DE)")
(1 row)
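If you need this in more than one place, the same expression can be wrapped in a function (the name json_to_customer is just an example):

create or replace function json_to_customer(jdata json)
returns customer language sql as $$
    select (
        jdata->>'customer_number',
        jdata->>'created',
        json_populate_record(null::address, jdata->'default_billing_address'),
        json_populate_record(null::address, jdata->'default_shipping_address')
        )::customer
$$;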
Nested composite types are not what Postgres (or any other RDBMS) was designed for. They are too complicated and troublesome.
In the database logic nested structures should be maintained as related tables, e.g.
create table addresses (
address_id serial primary key,
name text,
street text,
zip text,
city text,
country text
);
create table customers (
customer_id serial primary key, -- does not have to be serial; integer or bigint works too
customer_number text, -- maybe redundant
created timestamp with time zone,
default_billing_address int references addresses(address_id),
default_shipping_address int references addresses(address_id)
);
Sometimes it is reasonable to have a nested structure in a table, but in those cases it seems more convenient and natural to use jsonb or hstore, e.g.:
create table customers (
customer_id serial primary key,
customer_number text,
created timestamp with time zone,
default_billing_address jsonb,
default_shipping_address jsonb
);
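A quick sketch of how that jsonb variant could be used, with the values from the question:

insert into customers (customer_number, created, default_shipping_address)
values ('12345678', now(),
        '{"name":"","street":"","zip":"12345","city":"Berlin","country":"DE"}');

select customer_number, default_shipping_address->>'city' as city
from customers;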
plpython to the rescue:
create function to_customer (object json)
returns customer
AS $$
import json
# plpython receives the json argument as a string; json.loads turns it into a dict,
# which PL/Python then maps onto the fields of the customer type by key
return json.loads(object)
$$ language plpythonu;
Example:
select to_customer('{
"customer_number":"12345678",
"default_shipping_address":
{
"name":"",
"street":"",
"zip":"12345",
"city":"Berlin",
"country":"DE"
},
"default_billing_address":null,
"created": null
}'::json);
to_customer
--------------------------------------------
(12345678,,,"("""","""",12345,Berlin,DE)")
(1 row)
Warning: when building the returned object from Python, PostgreSQL requires all null values to be present as None (i.e. null values cannot simply be omitted), so we have to spell out all null values in the incoming json. For example, this is not allowed:
select to_customer('{
"customer_number":"12345678",
"default_shipping_address":
{
"name":"",
"street":"",
"zip":"12345",
"city":"Berlin",
"country":"DE"
}
}'::json);
ERROR: key "created" not found in mapping
HINT: To return null in a column, add the value None to the mapping with the key named after the column.
CONTEXT: while creating return value
PL/Python function "to_customer"
This seems to be solved in Postgres 10. Searching the release notes for json_populate_record shows the following change:
Make json_populate_record() and related functions process JSON arrays and objects recursively (Nikita Glukhov)
With this change, array-type fields in the destination SQL type are properly converted from JSON arrays, and composite-type fields are properly converted from JSON objects. Previously, such cases would fail because the text representation of the JSON value would be fed to array_in() or record_in(), and its syntax would not match what those input functions expect.
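So on Postgres 10 and later, the call from the question that originally raised the "malformed record literal" error should work as written:

select json_populate_record(null::customer,
       '{"customer_number":"12345678","default_shipping_address":{"name":"","street":"","zip":"12345","city":"Berlin","country":"DE"}}'::json);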
I get input in JSON format, and that value should be stored in a MySQL database. I want to store more than one image URL in a single row. Is that possible, and how?
Below is the input in JSON format. How can I store the multiple image entries in a single row?
{
"session_id": "192urjh91f",
"description": "description of the post",
"location": "12.00847,-71.297489",
"place_id": "917439",
"images": [
"url1",
"url2",
"url3"
],
"audio": "audio url",
"tags": [
"1234",
"31332",
"12412"
],
"people": [
"user_id",
"user_id1",
"user_id2"
]
}
If you know that there are always going to be three images, then you could create three columns image1..3. If the number of images is only known at runtime, then it is better to have a separate table for images. If you have restrictions on the table design, you can store them in a single column with some delimiter.
The clean way to store this sort of data in a relational database involves making one table for all the scalar (string or numeric) members of your object and an additional table for each array-valued member. You have three arrays (images, tags, and people), which means you'll need to create four tables with the following columns:
posts (post_id PRIMARY KEY, session_id, description, latitude, longitude, place_id, audio_uri)
post_images (post_id, uri, PRIMARY KEY (post_id, uri))
post_tags (post_id, tag_id, PRIMARY KEY (post_id, tag_id))
post_people (post_id, user_id, PRIMARY KEY (post_id, user_id))
(Translating these into actual CREATE TABLE statements is left as an exercise for someone who knows the acceptable values for each member.)
When you add a post, you add one row for the post itself, which should get its post_id through the AUTO_INCREMENT mechanism. Then take this post_id and use it to insert a row for each image, a row for each tag, and a row for each user.
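As a rough sketch with placeholder column types (the types here are guesses to be adjusted to the real data):

CREATE TABLE posts (
  post_id     INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  session_id  VARCHAR(32),
  description TEXT,
  latitude    DECIMAL(9,6),
  longitude   DECIMAL(9,6),
  place_id    VARCHAR(32),
  audio_uri   VARCHAR(255)
);

CREATE TABLE post_images (
  post_id INT NOT NULL,
  uri     VARCHAR(255) NOT NULL,
  PRIMARY KEY (post_id, uri),
  FOREIGN KEY (post_id) REFERENCES posts(post_id)
);
-- post_tags and post_people follow the same pattern

INSERT INTO posts (session_id, description, latitude, longitude, place_id, audio_uri)
VALUES ('192urjh91f', 'description of the post', 12.00847, -71.297489, '917439', 'audio url');

SET @post_id = LAST_INSERT_ID();

INSERT INTO post_images (post_id, uri)
VALUES (@post_id, 'url1'), (@post_id, 'url2'), (@post_id, 'url3');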
We have a special kind of table in our DB that stores the history of its changes in itself. So called "self-archived" table:
CREATE TABLE coverages (
id INT, # primary key, auto-increment
subscriber_id INT,
current CHAR, # - could be "C" or "H".
record_version INT,
# etc.
);
It stores "coverages" of our subscribers. Field "current" indicates if this is a current/original record ("C") or history record ("H").
We can only have one current "C" coverage for a given subscriber, but we can't create a unique index on the 2 fields (subscriber_id and current), because for any given "C" record there could be any number of "H" records - the history of changes.
So the index should only be unique for current == 'C' and any subscriber_id.
That could be done in Oracle DB using something like "materialized views": we could create a materialized view that only includes records with current = 'C' and create a unique index on those 2 fields: subscriber_id, current.
The question is: how can this be done in MySQL?
You can do this using NULL values. If you use NULL instead of "H", MySQL will ignore the row when evaluating the UNIQUE constraint:
A UNIQUE index creates a constraint such that all values in the index must be
distinct. An error occurs if you try to add a new row with a key value that
matches an existing row. This constraint does not apply to NULL values except
for the BDB storage engine. For other engines, a UNIQUE index permits multiple
NULL values for columns that can contain NULL.
Now, this is cheating a bit, and it means that you can't have your data exactly as you want it. So this solution may not fit your needs. But if you can rework your data in this way, it should work.
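A sketch of what that looks like for the table from the question, storing NULL instead of "H" for history rows:

CREATE TABLE coverages (
  id INT AUTO_INCREMENT PRIMARY KEY,
  subscriber_id INT NOT NULL,
  `current` CHAR(1) NULL,  # 'C' for the current record, NULL instead of 'H' for history rows
  record_version INT,
  UNIQUE KEY uq_subscriber_current (subscriber_id, `current`)
);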
Consider a 500 million row MySQL table with the following table structure ...
CREATE TABLE foo_objects (
id int NOT NULL AUTO_INCREMENT,
foo_string varchar(32),
metadata_string varchar(128),
lookup_id int,
PRIMARY KEY (id),
UNIQUE KEY (foo_string),
KEY (lookup_id)
);
... which is being queried using only the following two queries ...
# lookup by unique string key, maximum of one row returned
SELECT * FROM foo_objects WHERE foo_string = ?;
# lookup by numeric lookup key, may return multiple rows
SELECT * FROM foo_objects WHERE lookup_id = ?;
Given those queries, how would you represent the given data-set using Cassandra?
You have two options:
(1) is sort of traditional: have one CF (column family) with your foo objects, one row per foo, one column per field. Then create two index CFs, where the row keys in one are the foo_string values and the row keys in the other are lookup_id values. Columns in the index rows are foo ids. So you do a GET on the index CF, then a MULTIGET on the ids returned.
Note that if you can make id the same as lookup_id then you have one less index to maintain.
High-level clients like Digg's lazyboy (http://github.com/digg/lazyboy) will automate maintaining the index CFs for you. Cassandra itself does not do this automatically (yet).
(2) is like (1), but you duplicate the entire foo objects into subcolumns of the index rows (that is, the index top-level columns are supercolumns). If you're not actually querying by the foo id itself, you don't need to store it in its own CF at all.
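In present-day Cassandra terms (CQL rather than the Thrift-era column families described above), option (1) might be sketched roughly like this:

CREATE TABLE foos (
  id int PRIMARY KEY,
  foo_string text,
  metadata_string text,
  lookup_id int
);

-- application-maintained index tables: look up the id here, then fetch the foo by id
CREATE TABLE foos_by_string (
  foo_string text PRIMARY KEY,
  id int
);

CREATE TABLE foos_by_lookup (
  lookup_id int,
  id int,
  PRIMARY KEY (lookup_id, id)
);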