Retrieving nested values from PostgreSQL JSON columns

Our database contains a table "log" like this:
id | purchases (json)
1 | {"apple":{"price":5,"seller":"frank"}, "bred":{"price":3,"seller":"kathy"}}
2 | {"milk":{"price":3,"seller":"anne"}, "banana":{"price":2,"seller":"frank"}}
3 | {"bred":{"price":4,"seller":"kathy"}}
We would like to retrieve all records containing "seller":"kathy". We tried simple queries like this:
SELECT id FROM log WHERE purchases ->> 'seller' LIKE 'kathy'
SELECT id FROM log WHERE purchases = '{"seller":"kathy"}'
We researched here and elsewhere for some hours; it seems to be a bit more complex because the values are nested. We found, e.g., some Java or PL/pgSQL implementations, but we are still hoping there is a "pure SQL" way. What would be a proper solution? Or should we reorganize our content like this:
id | purchases (json)
1 | [{"product":"apple","price":5,"seller":"frank"},{"product":"bred","price":3,"seller":"kathy"}]
2 | [{"product":"milk","price":3,"seller":"anne"},{"product":"banana","price":2,"seller":"frank"}]
3 | [{"product":"bred","price":4,"seller":"kathy"}]
But from what we found, this would be even more complex, because we would have to explode the arrays within the query. Any short hint? Thanks!

Check the json_each() function and the #>> operator among the Postgres JSON functions:
WITH log(id,purchases) AS ( VALUES
(1,'{"apple":{"price":5,"seller":"frank"}, "bred":{"price":3,"seller":"kathy"}}'::JSON),
(2,'{"milk":{"price":3,"seller":"anne"}, "banana":{"price":2,"seller":"frank"}}'::JSON),
(3,'{"bred":{"price":4,"seller":"kathy"}}'::JSON)
)
SELECT log.* FROM log,
json_each(log.purchases) as purchase
WHERE
purchase.value#>>'{seller}' = 'kathy';
Result:
id | purchases
----+-----------------------------------------------------------------------------
1 | {"apple":{"price":5,"seller":"frank"}, "bred":{"price":3,"seller":"kathy"}}
3 | {"bred":{"price":4,"seller":"kathy"}}
(2 rows)
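As a side note on the array layout from the question: it is not much harder to query. Here is a hedged sketch (PostgreSQL assumed, same sample data), using json_array_elements() to "explode" each array so the filter becomes a plain ->> comparison:
WITH log(id,purchases) AS ( VALUES
(1,'[{"product":"apple","price":5,"seller":"frank"},{"product":"bred","price":3,"seller":"kathy"}]'::JSON),
(2,'[{"product":"milk","price":3,"seller":"anne"},{"product":"banana","price":2,"seller":"frank"}]'::JSON),
(3,'[{"product":"bred","price":4,"seller":"kathy"}]'::JSON)
)
SELECT DISTINCT log.id
FROM log,
     json_array_elements(log.purchases) AS purchase  -- one row per purchase object
WHERE purchase->>'seller' = 'kathy';
The DISTINCT just guards against an id showing up twice if several purchases in one row match.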

Related

How to get from nested JSON by int rather than by name in MySQL 8

So I'm currently using MySQL's JSON field to store some data.
So the 'reports' table looks like this:
id | stock_id | type | doc |
1 | 5 | Income_Statement | https://pastebin.com/bj1hdK0S |
The pastebin contains the content of the JSON field.
What I want to do is get a number (ebit) from the first object under yearly (2018-12-31) in the JSON and then use that in a WHERE query so that it only returns rows where ebit > 50000000, for example. The issue is that the dates under yearly are not standard (i.e. one might be 2018-12-31, the other might be 2018-12-15). So essentially I want a way to get the data using integer indexes rather than the actual names of the objects, so something like yearly.[0].ebit.
How would I do this in MySQL? Alternatively, if it's not possible in MySQL, would it be possible in either PostgreSQL or Mongo? If so, could you give me an example? Most of the data fits well into MySQL; only this table has a JSON column, which is why I started with MySQL.
I don't know about MySQL or MongoDB, but here's a simple version for the PostgreSQL JSONB type:
SELECT (doc->'yearly'-> max(years) -> 'ebit')::numeric AS ebit
FROM reports, jsonb_object_keys(doc->'yearly') AS years
GROUP BY reports.doc;
...with simplistic test data:
WITH reports(doc) AS (
SELECT '{"yearly":{"2018-12-31":{"ebit":123},"2017-12-31":{"ebit":1.23}}}'::jsonb
)
SELECT (doc->'yearly'-> max(years) -> 'ebit')::numeric AS ebit
FROM reports, jsonb_object_keys(doc->'yearly') AS years
GROUP BY reports.doc;
...gives:
ebit
------
123
(1 row)
So I've basically selected the latest entry under "yearly" without knowing the actual key values, assuming that the keys' date formatting allows a sort order (in this case it seems to comply with ISO 8601).
Using data type JSON instead of JSONB would preserve object key order but is not as efficient in PostgreSQL further down the road and wouldn't help here either.
If you want to then select only those reports entries having their latest ebit greater than a certain value, just pack it into a sub-select or a CTE. I usually prefer CTEs because they are easier to read, so here we go:
WITH
reports (id, doc) AS (
VALUES
(1, '{"yearly":{"2018-12-31":{"ebit":123},"2017-12-31":{"ebit":1.23}}}'::jsonb),
(2, '{"yearly":{"2018-12-23":{"ebit":50},"2017-12-22":{"ebit":"1200.00"}}}'::jsonb)
),
r_ebit (id, ebit) AS (
SELECT reports.id, (reports.doc->'yearly'-> max(years) -> 'ebit')::numeric AS ebit
FROM reports, jsonb_object_keys(doc->'yearly') AS years
GROUP BY reports.id, reports.doc
)
SELECT id, ebit
FROM r_ebit
WHERE ebit > 100;
However, as you already see, it is not possible to filter the original rows using this strategy. A pre-processing step would make sense here so that the JSON format actually is filter-friendly.
ADDENDUM
To add the possibility of selecting the values for the n-th completed fiscal year, we need to resort to window functions, and we also need to reduce the resulting set to return only a single row per actual group (in the demonstration case: reports.id):
WITH reports(id, doc) AS (VALUES
(1, '{"yearly":{"2018-12-31":{"ebit":123},"2017-12-31":{"ebit":1.23},"2016-12-31":{"ebit":"23.42"}}}'::jsonb),
(2, '{"yearly":{"2018-12-23":{"ebit":50},"2017-12-22":{"ebit":"1200.00"}}}'::jsonb)
)
SELECT DISTINCT ON (1) reports.id, (reports.doc->'yearly'-> (lead(years, 0) over (partition by reports.doc order by years desc nulls last)) ->>'ebit')::numeric AS ebit
FROM reports, jsonb_object_keys(doc->'yearly') AS years
GROUP BY 1, reports.doc, years.years ORDER BY 1;
...will behave exactly like the max aggregate function used previously. Increasing the offset parameter within the lead(years, <offset>) function call will select the n-th year backwards (because of the descending order of the window partition).
The DISTINCT ON (1) clause is the magic that reduces the result to a single row per distinct column value (first column = reports.id). This is why the NULLS LAST is very important inside the window OVER clause.
Here are results for different offsets (I've added a third historic entry for the first id but not for the second to also show how it deals with absent entries):
N = 0:
id | ebit
----+------
1 | 123
2 | 50
N = 1:
id | ebit
----+---------
1 | 1.23
2 | 1200.00
N = 2:
id | ebit
----+-------
1 | 23.42
2 |
...which means absent entries will just result in a NULL value.
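For comparison, here is a hedged alternative sketch (assuming PostgreSQL 9.3+ for LATERAL, same test data) that picks the n-th most recent key with ORDER BY ... OFFSET instead of a window function; the OFFSET value plays the same role as the lead() offset above:
WITH reports(id, doc) AS (VALUES
(1, '{"yearly":{"2018-12-31":{"ebit":123},"2017-12-31":{"ebit":1.23},"2016-12-31":{"ebit":"23.42"}}}'::jsonb),
(2, '{"yearly":{"2018-12-23":{"ebit":50},"2017-12-22":{"ebit":"1200.00"}}}'::jsonb)
)
SELECT r.id,
       (r.doc -> 'yearly' -> y.fiscal_year ->> 'ebit')::numeric AS ebit
FROM reports AS r
LEFT JOIN LATERAL (
    SELECT years AS fiscal_year
    FROM jsonb_object_keys(r.doc -> 'yearly') AS years
    ORDER BY years DESC   -- latest year first
    OFFSET 1 LIMIT 1      -- OFFSET n picks the n-th year backwards (here N = 1)
) AS y ON true
ORDER BY r.id;
Missing years again just come back as NULL, thanks to the LEFT JOIN.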

MySQL: Need help reformatting vertical data to horizontal

Currently the database looks like this: (product IDs with name value pairs)
id, attribute_name, attribute_value
1, Clockspeed, 1.6Ghz
1, Screen, 13.3"
2, Clockspeed, 1.8Ghz
2, Screen, 15.1"
I would like to convert the above data to the following format (separated by product ID, with only one line per id) for migrating to a new platform.
id, Clockspeed, Screen
1, 1.6Ghz, 13.3"
2, 1.8Ghz, 15.1"
What is the easiest way to achieve this result? My gut tells me this is going to be done with the CONCAT or GROUP_CONCAT function, but I need a pointer in the right direction; I'm going bald from pulling my hair out.
This points out one of the problems with the entity-attribute-value database design.
There are two methods to use SQL to pivot the attributes into columns, as though you had stored the data in a conventional table:
SELECT id, MAX(CASE attribute_name WHEN 'Clockspeed' THEN attribute_value END) AS Clockspeed,
MAX(CASE attribute_name WHEN 'Screen' THEN attribute_value END) AS Screen
FROM eav_table
GROUP BY id;
SELECT id, c.attribute_value AS Clockspeed, s.attribute_value AS Screen
FROM eav_table AS c
JOIN eav_table AS s USING(id)
WHERE c.attribute_name = 'Clockspeed' AND s.attribute_name = 'Screen'
Output of both queries after testing on MySQL 5.6:
+------+------------+--------+
| id   | Clockspeed | Screen |
+------+------------+--------+
|    1 | 1.6Ghz     | 13.3"  |
|    2 | 1.8Ghz     | 15.1"  |
+------+------------+--------+
The latter solution requires N-1 joins to output N attributes. It doesn't scale well.
Both of the above solutions require that you write quite a bit of application code to format the SQL query, according to the number of attributes you want to fetch. And that means if the number of attributes varies (which is likely because that's one of the primary advantages of using EAV), then it's possible to fetch too many attributes for the query to have good performance.
Another solution is to forget about pivoting the data using only SQL. Instead, fetch the rows of data back to your application one attribute per row, as they are stored in the database. Then write application code to post-process the results into one object.
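For completeness, here is a hedged sketch of building the pivot query dynamically inside MySQL itself (prepared statements assumed; table and column names as in the answer, and the attribute names are assumed to contain no quotes or backticks), so the attribute list does not have to be hard-coded in application code:
SET @sql = NULL;

-- build one MAX(CASE ...) expression per distinct attribute name
SELECT GROUP_CONCAT(DISTINCT
         CONCAT('MAX(CASE attribute_name WHEN ''', attribute_name,
                ''' THEN attribute_value END) AS `', attribute_name, '`'))
INTO @sql
FROM eav_table;

SET @sql = CONCAT('SELECT id, ', @sql, ' FROM eav_table GROUP BY id');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
Note that GROUP_CONCAT is subject to group_concat_max_len, so raise that setting if there are many attributes.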

Storing IDs as comma separated values

How can I select from a database all of the rows whose IDs are stored in a comma-separated varchar? For example, I have a table with this:
, 7, 9, 11
How can I SELECT the rows with those IDs?
Normalize your database. You should be using a lookup table most likely.
You have 2 options:
Use a function to split the string into a temp table and then join the table you're selecting from to that temp table (see the sketch below).
Use dynamic SQL to query the table where id in (#variable) --- a bad choice if you go this way.
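A hedged sketch of the first option for MySQL 8 (no temp table needed there: JSON_TABLE can split the list inline; your_table and id are hypothetical names):
SELECT t.*
FROM your_table AS t
JOIN JSON_TABLE('[7, 9, 11]',                       -- the comma-separated list as a JSON array
                '$[*]' COLUMNS (id INT PATH '$')) AS ids
  ON ids.id = t.id;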
select * from table_name where id in (7, 9, 11)
If you do typically have that comma at the start, you will need to remove it first.
Use MATCH(column) AGAINST('7,9,11').
This will show all rows whose varchar column contains any of the IDs 7, 9, 11.
But you have to be sure that your column has a fulltext index.
Just yesterday I was fixing a bug in an old application here and saw where they handled it like this:
AND (T.ServiceIDs = '#SegmentID#' OR T.ServiceIDs LIKE '#SegmentID#,%'
OR T.ServiceIDs LIKE '%,#SegmentID#,%' OR T.ServiceIDs LIKE '%,#SegmentID#')
I am assuming you are saying that the value of ServiceIDs from the database might contain something like 7,9,11 and that the variable SegmentID is one or more values. It was inside a CFIF statement checking that SegmentID in fact had a value (which was always the case due to prior logic that would default it).
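If the database is MySQL, a hedged equivalent of those four LIKE patterns is FIND_IN_SET(), assuming any spaces after the commas are stripped first:
SELECT *
FROM T
-- REPLACE strips the spaces so ', 7, 9, 11' becomes ',7,9,11';
-- FIND_IN_SET returns the 1-based position of '7', or 0 if absent
WHERE FIND_IN_SET('7', REPLACE(T.ServiceIDs, ' ', '')) > 0;
A leading comma is harmless here; it just adds an empty first element to the list.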
Personally, though, I would do as others have suggested and create what I always refer to as a bridging table, which allows you to relate zero to many PKs from one table to the PK of another.
I had to tackle this problem years ago where I could not change the table structure and I created a custom table type and a set of functions so I could treat the values via SQL as if they were coming from a table. That custom table type solution though was specific to Oracle and I'd not know how to do that in MySQL without some research on my part.
There is a reason querying lists is so difficult: databases are not designed to work with delimited lists. They are optimized to work best with rows (or sets) of data. Creating the proper table structure will result in much better query performance and simpler sql queries. (So while it is technically possible, you should seriously consider normalizing your database as Todd and others suggested.)
Many-to-many relationships are best represented by three (3) tables. Say you are selling "widgets" in a variety of "sizes". Create two tables representing the main entities:
Widget (unique widgets)
WidgetID | WidgetTitle
1 | Widget 1
2 | Widget 2
....
Size (unique sizes)
SizeID | SizeTitle
7 | X-Small
8 | Small
9 | Medium
10 | Large
11 | X-Large
Then create a junction table, to store the relationships between those two entities, ie Which widgets are available in which sizes
WidgetSize (available sizes for each widget)
WidgetID | SizeID
1 | 7 <== Widget 1 + "X-Small"
1 | 8 <== Widget 1 + "Small"
2 | 7 <== Widget 2 + "X-Small"
2 | 9 ....
2 | 10
2 | 11
....
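In DDL terms, the three tables might look like this (a hedged sketch; MySQL syntax and the column types are assumptions):
CREATE TABLE Widget (
  WidgetID    INT PRIMARY KEY,
  WidgetTitle VARCHAR(100) NOT NULL
);

CREATE TABLE Size (
  SizeID    INT PRIMARY KEY,
  SizeTitle VARCHAR(50) NOT NULL
);

-- junction table: one row per widget/size combination
CREATE TABLE WidgetSize (
  WidgetID INT NOT NULL,
  SizeID   INT NOT NULL,
  PRIMARY KEY (WidgetID, SizeID),
  FOREIGN KEY (WidgetID) REFERENCES Widget (WidgetID),
  FOREIGN KEY (SizeID)   REFERENCES Size (SizeID)
);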
With that structure, you can easily return all widgets having any (or all) of a list of sizes. Not tested, but something similar to the sql below should work.
Find widgets available in any of the sizes: <cfset listOfSizeIds = "7,9,11">
SELECT w.WidgetID, w.WidgetTitle
FROM Widget w
WHERE EXISTS
( SELECT 1
FROM WidgetSize ws
WHERE ws.WidgetID = w.WidgetID
AND ws.SizeID IN (
<cfqueryparam value="#listOfSizeIds#"
cfsqltype="cf_sql_integer" list="true" >
)
)
Find widgets available in all three sizes: <cfset listOfSizeIds = "7,9,11">
SELECT w.WidgetID, w.WidgetTitle, COUNT(*) AS MatchCount
FROM Widget w INNER JOIN WidgetSize ws ON ws.WidgetID = w.WidgetID
WHERE ws.SizeID IN (
<cfqueryparam value="#listOfSizeIds#"
cfsqltype="cf_sql_integer" list="true" >
)
GROUP BY w.WidgetID, w.WidgetTitle
HAVING MatchCount = 3

MySQL Query Search using Multiple Rows

Firstly I'd like to start by apologizing for the potentially misleading title... I am finding it difficult to describe what I am trying to do here.
With the current project I'm working on, we have setup a 'dynamic' database structure with MySQL that looks something like this.
item_details ( Describes the item_data )
fieldID | fieldValue | fieldCaption
1 | addr1 | Address Line 1
2 | country | Country
item_data
itemID | fieldID | fieldValue
12345 | 1 | Some Random Address
12345 | 2 | United Kingdom
So as you can see, if for example I wanted to look up the address for item 12345, I would simply run this statement:
SELECT fieldValue FROM item_data WHERE fieldID=1 and itemID=12345;
But here is where I am stuck... the database is relatively large, with around 80k rows, and I am trying to create a set of search functions within PHP.
I would like to be able to perform a query on the result set of a query as quickly as possible...
For example, search for an address name within a certain country, i.e. search the fieldValue of the rows that have the same itemIDs as the results of the query:
SELECT itemID FROM item_data WHERE fieldID=2 AND fieldValue='United Kingdom'
Sorry If I am unclear, I have been struggling with this for the past couple of days...
Cheers
You can do this in a couple of ways. One is to use multiple joins to the item_data table with the fieldID limited to whatever it is you want to get.
SELECT *
FROM
    Item i
    INNER JOIN item_data country
        ON i.itemID = country.itemID
        AND country.fieldID = 2
    INNER JOIN item_data address
        ON i.itemID = address.itemID
        AND address.fieldID = 1
WHERE
    country.fieldValue = 'United Kingdom'
    AND address.fieldValue = 'Whatever'
As an aside, this structure is often referred to as an Entity-Attribute-Value or EAV database.
Sorry in advance if this sounds patronizing, but (as you suggested) I'm not quite clear what you are asking for.
If you are looking for one query to do the whole thing, you could simply nest them. For your example, pretend there is a table named CACHED with the results of your UK query, and write the query you want against that, but replace CACHED with your UK query.
If the idea is that you have ALREADY done this UK query and want to (re-)use its results, you could save the results to a table in the DB (which may not be practical if there are a large number of queries executed), or save the list of IDs as text and paste that into the subsequent query (...WHERE ID in (...) ... ), which might be OK if your 'cached' query gives you a manageable fraction of the original table.
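A hedged sketch of that nested-query idea against the item_data table from the question (the address search term is hypothetical): filter on the country first, then search addresses only among the matching itemIDs.
SELECT d.itemID, d.fieldValue AS address
FROM item_data AS d
WHERE d.fieldID = 1                        -- addr1 rows
  AND d.fieldValue LIKE '%Random%'         -- hypothetical address search term
  AND d.itemID IN (
        SELECT itemID
        FROM item_data
        WHERE fieldID = 2                  -- country rows
          AND fieldValue = 'United Kingdom'
      );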

How to create a mysql join query with hierarchical data

I need to create a join query for hierarchical data across 2 tables. These tables can have unlimited amounts of data and their structures are as follows:
group_id group_name group_order
1 group 1 2
2 group 2 1
field_id field_name parent_group field_order
1 field 1 1 1
2 field 2 2 2
3 field 3 2 1
I am currently able to get the data in the correct format using two SELECT queries, with the second query inside a loop over the results of the first query on the groups table.
The structure of the data I require from the result is as follows:
- group 2
  - field 3
  - field 2
- group 1
  - field 1
Is it possible to get these results from one MySQL query? I have read through the MySQL documentation on hierarchical data, but I am confused about how to incorporate the join.
Thanks for looking
You shouldn't need to think about it in terms of hierarchical data; you should just be able to select your fields and join on your group information. Try something like:
SELECT *
FROM Fields AS F
INNER JOIN Groups AS G
ON G.group_id = F.parent_group
ORDER BY group_order, field_order
Then you will get each field as a row with the applicable group, also in the correct group order. Your loop should be able to handle the display you need.
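If you would rather have one row per group with its fields already ordered, here is a hedged alternative sketch (MySQL; the table names `groups` and `fields` are assumptions based on the question) using GROUP_CONCAT:
SELECT g.group_name,
       GROUP_CONCAT(f.field_name ORDER BY f.field_order SEPARATOR ', ') AS fields
FROM `groups` AS g
INNER JOIN `fields` AS f
        ON f.parent_group = g.group_id
GROUP BY g.group_id, g.group_name, g.group_order
ORDER BY g.group_order;
For the sample data this returns "group 2 | field 3, field 2" followed by "group 1 | field 1".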