mysql xml format for deducing key value pairs

I am trying to record a single string of XML in a MySQL database as a table of key/value pairs. I pass the following XML:
"<object><key>x</key><val>y</val><key>key2</key><val>value2</val></object>"
as a VARCHAR to a stored procedure, to be converted into a table of key/value pairs like so:
ID | key  | val
---+------+--------
0  | x    | y
1  | key2 | value2
but I am having trouble because ExtractValue() takes the values I grab from the XML and puts them into a single-column, space-separated result:
SELECT ExtractValue('<object><key>x</key><val>y</val><key>key2</key><val>value2</val></object>','/object/key');
SELECT ExtractValue('<object><key>x</key><val>y</val><key>key2</key><val>value2</val></object>','/object/val');
Thanks in advance for your help!
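
For what it's worth, ExtractValue() concatenates all matching text nodes into one space-separated string, which is why each of the calls above returns both values in a single column. A minimal sketch of one way around that (not the asker's procedure; in a real stored procedure a loop counter would replace the literal positions):

-- Hedged sketch: address each match with a positional XPath predicate.
SET @xml = '<object><key>x</key><val>y</val><key>key2</key><val>value2</val></object>';
SELECT ExtractValue(@xml, 'count(/object/key)');     -- number of pairs: 2
SELECT ExtractValue(@xml, '/object/key[1]') AS `key`,
       ExtractValue(@xml, '/object/val[1]') AS val;  -- x, y
SELECT ExtractValue(@xml, '/object/key[2]') AS `key`,
       ExtractValue(@xml, '/object/val[2]') AS val;  -- key2, value2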

Filter objects inside array in MySQL JSON column [duplicate]

MySQL 5.7.24
Let's say I have 3 rows like this:
ID (PK) | Name (VARCHAR) | Data (JSON)
--------+----------------+-------------------------------------
1 | Admad | [{"label":"Color", "value":"Red"}, {"label":"Age", "value":40}]
2 | Saleem | [{"label":"Color", "value":"Green"}, {"label":"Age", "value":37}, {"label":"Hoby", "value":"Chess"}]
3 | Daniel | [{"label":"Food", "value":"Grape"}, {"label":"Age", "value":47}, {"label":"State", "value":"Sel"}]
Rule #1: The JSON column is dynamic, meaning not everybody will have the same structure.
Rule #2: Assume I can't modify the data structure.
My question: is it possible to query so that I can get the IDs of records where the Age is >= 40? In this case, 1 & 3.
Additional info (after being pointed to a duplicate): if you look at my data, the parent container is an array. If I stored my data like
{"Age":"40", "Color":"Red"}
then I could simply use
Data->>'$.Age' >= 40
My current thinking is to use a stored procedure to loop over the array, but I hope I don't have to take that route. The second option is to use regex (which I also hope to avoid). If you think "JSON search" is the solution, kindly point me to which one (or some sample for a noob like me). The documentation is too general for my specific needs.
Here's a demo:
mysql> create table letsayi (id int primary key, name varchar(255), data json);
mysql> insert into letsayi values
-> (1, 'Admad', '[{"label":"Color", "value":"Red"}, {"label":"Age", "value":"40"}]'),
-> (2, 'Saleem', '[{"label":"Color", "value":"Green"}, {"label":"Age", "value":"37"}, {"label":"Hoby", "value":"Chess"}]');
mysql> select id, name from letsayi
where json_contains(data, '{"label":"Age","value":"40"}');
+----+-------+
| id | name |
+----+-------+
| 1 | Admad |
+----+-------+
I have to say this is the least efficient way you could store your data. There's no way to use an index to search for your data, even if you use indexes on generated columns. You're not even storing the integer "40" as an integer — you're storing the numbers as strings, which makes them take more space.
Using JSON in MySQL when you don't need to is a bad idea.
Is it still possible to query age >= 40?
Not using JSON_CONTAINS(). That function is not like an inequality condition in a WHERE clause. It only matches exact equality of a subdocument.
To do an inequality, you'd have to upgrade to MySQL 8.0 and use JSON_TABLE(). I answered another question recently about that: MySQL nested JSON column search and extract sub JSON
In other words, you have to convert your JSON into a format as if you had stored it in traditional rows and columns. But you have to do this every time you query your data.
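For illustration, a hedged sketch of what such a JSON_TABLE query could look like against the letsayi table above (requires MySQL 8.0; the aliases are mine, and the CAST is needed because the values were stored as strings):

SELECT l.id, l.name
FROM letsayi AS l,
     JSON_TABLE(l.data, '$[*]'
       COLUMNS (
         label VARCHAR(64) PATH '$.label',
         val   VARCHAR(64) PATH '$.value'
       )
     ) AS jt
WHERE jt.label = 'Age'
  AND CAST(jt.val AS UNSIGNED) >= 40;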
If you need to use conditions in the WHERE clause, you're better off not using JSON. It just makes your queries much too complex. Listen to this old advice about programming:
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."
— Brian Kernighan
How do people tackle dynamically added form fields?
You could create a key/value table for the dynamic form fields:
CREATE TABLE keyvalue (
  user_id INT NOT NULL,
  label VARCHAR(64) NOT NULL,
  value VARCHAR(255) NOT NULL,
  PRIMARY KEY (user_id, label),
  INDEX (label)
);
Then you can add key/value pairs for each user's dynamic form entries:
INSERT INTO keyvalue (user_id, label, value)
VALUES (123, 'Color', 'Red'),
       (123, 'Age', '40');
This is still a bit inefficient in storage compared to real columns, because the label names are stored every time you enter a user's data, and you still store integers as strings. But if the users are really allowed to store any labels of their own choosing, you can't make those real columns.
With the key/value table, querying for age >= 40 is simpler:
SELECT user_id FROM keyvalue
WHERE label = 'Age' AND value >= 40;

Querying a JSON field in typeorm

I have a JSON field in a Postgres table (column name: UserDetails):
[{"name":"UserName","status":"UserStatus","type":"UserType","number":"UserNumber"},{"name":"UserName1","status":"UserStatus1","type":"UserType1","number":"UserNumber1"}]
Basically, an array of objects.
I want to query the UserDetails column to fetch all users whose name contains 'UserName', in TypeORM.
So far I've managed to query for an exact match against the name field of the JSON column:
providedUserName='UserName1'
query.andWhere(`user.userdetails ::jsonb #> '[{"name":"${providedUserName}"}]'`)
How can I add a LIKE constraint on the name attribute of the JSON column? By providing 'UserName' I would like to get a list of all users whose name contains 'UserName'; in this case it should return both values.
Thank you
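
No answer is recorded here, but one way to express a LIKE match against each element of the array in plain Postgres is to unnest it. A hedged, untested sketch using the identifiers from the question, which could be embedded in query.andWhere() as raw SQL with a bound parameter:

SELECT *
FROM "user" u
WHERE EXISTS (
  SELECT 1
  FROM jsonb_array_elements(u.userdetails::jsonb) AS elem
  WHERE elem->>'name' LIKE '%UserName%'  -- substring match per array element
);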

Converting json to nested postgres composite type

I have the following nested types defined in postgres:
CREATE TYPE address AS (
  name text,
  street text,
  zip text,
  city text,
  country text
);
CREATE TYPE customer AS (
  customer_number text,
  created timestamp WITH TIME ZONE,
  default_billing_address address,
  default_shipping_address address
);
I would now like to populate these types in a stored procedure, which gets JSON as an input parameter. This works for top-level fields; the output shows me the internal format of a Postgres composite type:
# select json_populate_record(null::customer, '{"customer_number":"12345678"}'::json)::customer;
json_populate_record
----------------------
(12345678,,,)
(1 row)
However, postgres does not handle a nested json structure:
# select json_populate_record(null::customer, '{"customer_number":"12345678","default_shipping_address":{"name":"","street":"","zip":"12345","city":"Berlin","country":"DE"}}'::json)::customer;
ERROR: malformed record literal: "{"name":"","street":"","zip":"12345","city":"Berlin","country":"DE"}"
DETAIL: Missing left parenthesis.
What does work, again, is if the nested property is in Postgres' internal format, like here:
# select json_populate_record(null::customer, '{"customer_number":"12345678","default_shipping_address":"(\"\",\"\",12345,Berlin,DE)"}'::json)::customer;
json_populate_record
--------------------------------------------
(12345678,,,"("""","""",12345,Berlin,DE)")
(1 row)
Is there any way to get postgres to convert from a nested json structure to a corresponding composite type?
Use json_populate_record() only for nested objects:
with a_table(jdata) as (
  values ('{
    "customer_number":"12345678",
    "default_shipping_address":{
      "name":"",
      "street":"",
      "zip":"12345",
      "city":"Berlin",
      "country":"DE"
    }
  }'::json)
)
select (
  jdata->>'customer_number',
  jdata->>'created',
  json_populate_record(null::address, jdata->'default_billing_address'),
  json_populate_record(null::address, jdata->'default_shipping_address')
)::customer
from a_table;
row
--------------------------------------------
(12345678,,,"("""","""",12345,Berlin,DE)")
(1 row)
Nested composite types are not what Postgres (or any RDBMS) was designed for. They are too complicated and troublesome.
In database logic, nested structures should be maintained as related tables, e.g.:
create table addresses (
  address_id serial primary key,
  name text,
  street text,
  zip text,
  city text,
  country text
);
create table customers (
  customer_id serial primary key, -- doesn't have to be serial; may be integer or bigint
  customer_number text, -- maybe redundant
  created timestamp with time zone,
  default_billing_address int references addresses(address_id),
  default_shipping_address int references addresses(address_id)
);
Sometimes it is reasonable to have a nested structure in a table, but in those cases it seems more convenient and natural to use jsonb or hstore, e.g.:
create table customers (
  customer_id serial primary key,
  customer_number text,
  created timestamp with time zone,
  default_billing_address jsonb,
  default_shipping_address jsonb
);
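
A quick illustration of why this is convenient (my own example lookup, assuming the jsonb variant above):

select customer_number
from   customers
where  default_shipping_address->>'city' = 'Berlin';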
plpython to the rescue:
create function to_customer (object json)
returns customer
AS $$
import json
# PL/Python maps the returned dict onto the composite type "customer"
return json.loads(object)
$$ language plpythonu;
Example:
select to_customer('{
"customer_number":"12345678",
"default_shipping_address":
{
"name":"",
"street":"",
"zip":"12345",
"city":"Berlin",
"country":"DE"
},
"default_billing_address":null,
"created": null
}'::json);
to_customer
--------------------------------------------
(12345678,,,"("""","""",12345,Berlin,DE)")
(1 row)
Warning: when PostgreSQL builds the returned object from Python, it requires all null values to be present as None (i.e. it's not allowed to skip null values as absent), so we have to specify all null values in the incoming JSON. For example, this is not allowed:
select to_customer('{
"customer_number":"12345678",
"default_shipping_address":
{
"name":"",
"street":"",
"zip":"12345",
"city":"Berlin",
"country":"DE"
}
}'::json);
ERROR: key "created" not found in mapping
HINT: To return null in a column, add the value None to the mapping with the key named after the column.
CONTEXT: while creating return value
PL/Python function "to_customer"
This seems to be solved in Postgres 10. Searching the release notes for json_populate_record shows the following change:
Make json_populate_record() and related functions process JSON arrays and objects recursively (Nikita Glukhov)
With this change, array-type fields in the destination SQL type are properly converted from JSON arrays, and composite-type fields are properly converted from JSON objects. Previously, such cases would fail because the text representation of the JSON value would be fed to array_in() or record_in(), and its syntax would not match what those input functions expect.
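If that release note applies, the failing call from the question should work directly on Postgres 10+; a hedged, untested sketch:

select json_populate_record(
  null::customer,
  '{"customer_number":"12345678",
    "default_shipping_address":{"name":"","street":"","zip":"12345","city":"Berlin","country":"DE"}}'::json
);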

Extract XML data from Hive Table and Parse the data

I want to extract particular column values from a Hive table. That column has XML data. How do I parse the XML data and extract names and values from that XML column? Also, I want to insert the extracted data into another Hive table.
Option 1: LanguageManual XPathUDF
Example:
select xpath('<a><b id="1"><c/></b><b id="2"><c/></b></a>', '/descendant::c/ancestor::b/@id') from t1 limit 1;
["1","2"]
Option 2: Another way of achieving this is the Hive-XML-SerDe.
With both options you need some knowledge of XPath expressions.
If you want to insert the extracted data into another table, then use create table ... as select xxx from xxxxx (Create Table As Select, CTAS).
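
For example, a hedged CTAS sketch with xpath_string() (the names source_table, xml_col, and parsed_xml are hypothetical):

-- Hypothetical: source_table holds one XML document per row in xml_col.
CREATE TABLE parsed_xml AS
SELECT xpath_string(xml_col, '/object/name')  AS name,
       xpath_string(xml_col, '/object/value') AS val
FROM source_table;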

unique index on embedded json object

I am testing PostgreSQL 9.4 beta2 right now. I am wondering if it is possible to create a unique index on an embedded JSON object?
I created a table named products:
CREATE TABLE products (oid serial primary key, data jsonb)
Now I try to insert a JSON object into the data column:
{
  "id": "12345",
  "bags": [
    {
      "sku": "abc123",
      "price": 0
    },
    {
      "sku": "abc123",
      "price": 0
    }
  ]
}
However, I want the sku of bags to be unique. That means this JSON can't be inserted into the products table, because sku is not unique in this case.
I tried to create a unique index like below, but it failed.
CREATE UNIQUE INDEX product_sku_index ON products( (data->'bags'->'sku') )
Any suggestions?
Your attempt to create a UNIQUE INDEX on the expression was bound to fail for multiple reasons.
CREATE UNIQUE INDEX product_sku_index ON products( (data->'bags'->'sku') )
The first and most trivial being that ...
data->'bags'->'sku'
does not reference anything. You could reference the first element of the array with
data->'bags'->0->>'sku'
or shorter:
data#>>'{bags,0,sku}'
But that expression only returns the first value of the array.
Your requirement, "I want sku of bags to be unique", is unclear. Do you want the value of sku to be unique? Within one JSON object, or among all JSON objects in the column data? Or do you want to restrict the array to a single element with an sku?
Either way, neither of these goals can be implemented with a simple UNIQUE index.
Possible solution
If you want sku values to be unique across all json arrays in data->'bags', there is a way. Unnest the array and write all individual sku values to separate rows in a simple auxiliary table with a unique (or PK) constraint:
CREATE TABLE prod_sku(sku text PRIMARY KEY); -- PK enforces uniqueness
This table may be useful for additional purposes.
Here is a complete code example for a very similar problem with plain Postgres arrays:
Can PostgreSQL have a uniqueness constraint on array elements?
Only adapt the unnesting technique. Instead of:
DELETE FROM hostname h
USING unnest(OLD.hostnames) d(x)
WHERE h.hostname = d.x;
...
INSERT INTO hostname(hostname)
SELECT h
FROM unnest(NEW.hostnames) h;
Use:
DELETE FROM prod_sku p
USING jsonb_array_elements(OLD.data->'bags') d(x)
WHERE p.sku = d.x->>'sku';
...
INSERT INTO prod_sku(sku)
SELECT b->>'sku'
FROM jsonb_array_elements(NEW.data->'bags') b
Details for that:
PostgreSQL joining using JSONB
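
Put together, a minimal trigger sketch for this case (my adaptation of the linked technique, assuming the products and prod_sku tables above; untested):

CREATE FUNCTION trg_products_sku()
  RETURNS trigger AS
$func$
BEGIN
   IF TG_OP = 'UPDATE' THEN
      -- drop the skus of the old row version first
      DELETE FROM prod_sku p
      USING  jsonb_array_elements(OLD.data->'bags') d(x)
      WHERE  p.sku = d.x->>'sku';
   END IF;

   -- a duplicate sku raises a PK violation and aborts the statement
   INSERT INTO prod_sku(sku)
   SELECT b->>'sku'
   FROM   jsonb_array_elements(NEW.data->'bags') b;

   RETURN NEW;
END
$func$ LANGUAGE plpgsql;

CREATE TRIGGER products_sku
BEFORE INSERT OR UPDATE ON products
FOR EACH ROW EXECUTE PROCEDURE trg_products_sku();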