Querying tables in BigQuery - mysql

Background
I have a BigQuery table with a single column, data, which contains JSON strings, as shown below.
data
{"name":"x","mobile":999,"location":"abc"}
{"name":"x1","mobile":9991,"location":"abc1"}
Now I want to group the rows by a JSON field:
SELECT
data
FROM
table
GROUP BY
json_extract(data,'$.location')
This query throws an error
expression JSON_EXTRACT([data], '$.location') in GROUP BY is invalid
So I modified the query to:
SELECT
data, json_extract(data,'$.location') as l
FROM
table
GROUP BY
l
This query throws the error:
Expression 'data' is not present in the GROUP BY list
Question
How can JSON fields be used in a GROUP BY clause?
And what are the limitations, in terms of querying, of having columns populated with JSON?

You are grouping by location, but you are not applying an aggregate function to the data field, so the query engine does not know which data value to return for each group.
To illustrate, I put together this test query, which works using group_concat:
select group_concat(data),location from
(
select * from
(SELECT '{"name":"x","mobile":999,"location":"abc"}' as data,json_extract('{"name":"x","mobile":999,"location":"abc"}','$.location') as location),
(SELECT '{"name":"x","mobile":111,"location":"abc"}' as data,json_extract('{"name":"x","mobile":111,"location":"abc"}','$.location') as location),
(SELECT '{"name":"x1","mobile":9991,"location":"abc1"}' as data,json_extract('{"name":"x1","mobile":9991,"location":"abc1"}','$.location') as location)
) d
group by location
and returns:
+-----+---------------------------------------------------------------------------------------------------+----------+
| Row | f0_                                                                                               | location |
+-----+---------------------------------------------------------------------------------------------------+----------+
| 1   | {"name":"x","mobile":999,"location":"abc"},"{""name"":""x"",""mobile"":111,""location"":""abc""}" | abc      |
| 2   | {"name":"x1","mobile":9991,"location":"abc1"}                                                     | abc1     |
+-----+---------------------------------------------------------------------------------------------------+----------+
See the documentation on BigQuery's aggregate functions for the other aggregates you can use.

Try the query below:
SELECT location,
GROUP_CONCAT_UNQUOTED(REPLACE(data, ',"location":"' + location + '"', '')) AS data
FROM (
SELECT data,
JSON_EXTRACT_SCALAR(data,'$.location') AS location,
FROM YourTable
)
GROUP BY location
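
The idea shared by both answers (extract the location first, then aggregate every other selected column) can be tried locally. The following is only a rough sketch in Python using SQLite rather than BigQuery; the table name t and the helper json_location are hypothetical stand-ins, the latter playing the role of JSON_EXTRACT_SCALAR(data, '$.location').

```python
import json
import sqlite3

rows = [
    '{"name":"x","mobile":999,"location":"abc"}',
    '{"name":"x","mobile":111,"location":"abc"}',
    '{"name":"x1","mobile":9991,"location":"abc1"}',
]

conn = sqlite3.connect(":memory:")
# Hypothetical helper: stands in for JSON_EXTRACT_SCALAR(data, '$.location').
conn.create_function("json_location", 1, lambda s: json.loads(s)["location"])
conn.execute("CREATE TABLE t (data TEXT)")
conn.executemany("INSERT INTO t (data) VALUES (?)", [(r,) for r in rows])

# The inner query exposes the JSON field as a plain column; the outer query
# groups by it and aggregates data, so no non-aggregated column is selected.
grouped = conn.execute(
    """
    SELECT location, COUNT(*) AS n, GROUP_CONCAT(data, ' | ') AS all_data
    FROM (SELECT data, json_location(data) AS location FROM t)
    GROUP BY location
    ORDER BY location
    """
).fetchall()

for location, n, all_data in grouped:
    print(location, n, all_data)
```

The order of the concatenated values within a group is not guaranteed here, so treat all_data as an unordered collection.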

Related

Mysql - How do I avoid group by but still with concat and group concat I would need to combine multiple columns and rows results

I have something like this in a table:
mysql> select uuid , short-uuid FROM sampleUUID WHERE identifier ="test123";
+--------------------------------------+-------------+
| uuid | short-uuid |
+--------------------------------------+-------------+
| 11d52ebd-1404-115d-903e-8033863ee848 | 8033863ee848 |
| 22b6f783-aeaf-1195-97ef-a6d8c47261b1 | 8033863ee848 |
| 33c51085-ccd8-1119-ac37-332510a16e1b | 332510a16e1b |
+--------------------------------------+-------------+
I need a result like this (everything grouped into a single row and a single value, with uuids that share the same short-uuid kept together):
| uuidDetails
+----------------------------------------------------------------------------------------------------------------+-------------+
| 11d52ebd-1404-115d-903e-8033863ee848,22b6f783-aeaf-1195-97ef-a6d8c47261b1|8033863ee848&&33c51085-ccd8-1119-ac37-332510a16e1b| 332510a16e1b |
+----------------------------------------------------------------------------------------------------------------+-------------+
(Basically, grouping uuid and short-uuid from multiple rows and columns into a single row.)
I know part of this can be achieved with select GROUP_CONCAT(uuid)FROM sampleUUID WHERE identifier ="test123" group by short-uuid;
but I don't want to use GROUP BY here because that gives multiple rows, and I need everything in one row.
I have tried the query below, but failed to get the results in a single row:
select ANY_VALUE(CONCAT_WS( '||',CONCAT_WS('|',GROUP_CONCAT(uuid) SEPARATOR ','),short-uuid)) )as uuidDetails from sampleUUID
where identifier ="test123";
This produced the result below, where the short-uuid is not appended properly: only one short-uuid is appended. What I actually need is the first two uuids grouped with their shared short-uuid, and the third uuid with the other short-uuid.
| uuidDetails
+----------------------------------------------------------------------------------------------------------------+-------------+
| 11d52ebd-1404-115d-903e-8033863ee848,22b6f783-aeaf-1195-97ef-a6d8c47261b1,33c51085-ccd8-1119-ac37-332510a16e1b| 332510a16e1b |
+----------------------------------------------------------------------------------------------------------------+-------------+
which is not what I expected.
Any help here will be appreciated. Thank you.
Use nested queries.
SELECT GROUP_CONCAT(result ORDER BY result SEPARATOR '&&') AS uuidDetails
FROM (
SELECT CONCAT(GROUP_CONCAT(uuid ORDER BY uuid SEPARATOR ','), '|', `short-uuid`) AS result
FROM sampleUUID
WHERE identifier = 'test123'
GROUP BY `short-uuid` -- backticks are needed because the column name contains a hyphen
) AS x
NOTE: Even if there is no requirement for the ordering of the UUID values, we can use ORDER BY inside the GROUP_CONCAT aggregates to make the result deterministic, so that the query returns just one of the possible results for the same data, e.g. aa,bb|1&&cc|3 rather than bb,aa|1&&cc|3 or cc|3&&aa,bb|1 or cc|3&&bb,aa|1.
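
To make the two levels of aggregation easier to follow, here is a small Python sketch of the same logic outside the database. It is only an illustration under the assumption that the rows are (uuid, short-uuid) pairs already filtered by identifier; the function name uuid_details is made up for this example.

```python
from collections import defaultdict

# Sample rows from the question: (uuid, short-uuid).
rows = [
    ("11d52ebd-1404-115d-903e-8033863ee848", "8033863ee848"),
    ("22b6f783-aeaf-1195-97ef-a6d8c47261b1", "8033863ee848"),
    ("33c51085-ccd8-1119-ac37-332510a16e1b", "332510a16e1b"),
]


def uuid_details(rows):
    """Mimic the nested query: group per short-uuid, then join the groups."""
    # Inner level: the GROUP BY on the short uuid.
    groups = defaultdict(list)
    for uuid, short in rows:
        groups[short].append(uuid)
    # Inner GROUP_CONCAT(uuid ORDER BY uuid SEPARATOR ','), then '|' + short uuid.
    results = [",".join(sorted(uuids)) + "|" + short
               for short, uuids in groups.items()]
    # Outer GROUP_CONCAT(result ORDER BY result SEPARATOR '&&').
    return "&&".join(sorted(results))


details = uuid_details(rows)
print(details)
```

Sorting at both levels plays the role of the ORDER BY clauses: the same input always yields the same single string.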

postgres json_populate_recordset not working as expected

I have a table called slices with some simple json objects that looks like this:
id | payload | metric_name
---|---------------------------------------|------------
1 | {"a_percent":99.97,"c_percent":99.97} | metric_c
2 | {"a_percent":98.37,"c_percent":97.93} | metric_c
There are many records like this. I am trying to get this:
a_percent | c_percent
----------|----------
99.97 | 99.97
98.37 | 97.93
I am creating the type and using json_populate_recordset along with json_agg in the following fashion:
CREATE TYPE c_history AS(
"a_percent" NUMERIC(5, 2),
"c_percent" NUMERIC(5, 2)
);
SELECT * FROM
json_populate_recordset(
NULL :: c_history,
(
SELECT json_agg(payload::json) FROM slices
WHERE metric_name = 'metric_c'
)
);
The clause select json_agg(...) by itself produces a nice array of json objects, as expected:
[{"a_percent":99.97,"c_percent":99.97}, {"a_percent":98.37,"c_percent":97.93}]
But when I run it inside json_populate_recordset, I get Error : ERROR: must call json_populate_recordset on an array of objects.
What am I doing wrong?
This is a variant of @TimBiegeleisen's solution with the function json_populate_record() used in a from clause:
select id, r.*
from slices,
lateral json_populate_record(null::c_history, payload) r;
See rextester or SqlFiddle.
You don't need to use json_agg, since it appears you want the a_percent and c_percent values for each id in a separate record. Instead, just call json_populate_record as follows:
SELECT id, (json_populate_record(null::c_history, payload)).* FROM slices
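
The difference between the two approaches is that each payload is expanded on its own row instead of first being aggregated into one array. As a rough illustration only, the per-row expansion looks like this in Python; FIELDS and populate_record are hypothetical names mirroring the c_history type and json_populate_record, not a PostgreSQL API.

```python
import json
from decimal import Decimal

# Sample rows from the question: (id, payload, metric_name).
slices = [
    (1, '{"a_percent":99.97,"c_percent":99.97}', "metric_c"),
    (2, '{"a_percent":98.37,"c_percent":97.93}', "metric_c"),
]

# Hypothetical stand-in for the c_history composite type.
FIELDS = ("a_percent", "c_percent")


def populate_record(payload):
    """Map one JSON object onto the fixed field list; missing keys become None."""
    obj = json.loads(payload, parse_float=Decimal)
    return tuple(obj.get(field) for field in FIELDS)


# One output record per input row, as with the LATERAL call in the answers.
records = [
    (row_id,) + populate_record(payload)
    for row_id, payload, metric_name in slices
    if metric_name == "metric_c"
]
for record in records:
    print(record)
```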

Distinct number of specific items in list

I rarely do stuff in MySQL, so for me this is rocket science ...
I want to know how many times distinct values starting with "abc-" are present in a list.
So for example how many times "abc-table" and "abc-sofa" are present.
The table:
| object
-----------
| abc-table
| def-table
| ghi-chair
| abc-sofa
| abc-table
The result should be like:
| name number
-------------------
| abc-table 2
| abc-sofa 1
(Excuse me for the badly formatted tables.)
I tried the following, but that turns out to be incorrect:
SELECT object, COUNT(DISTINCT object) WHERE object LIKE abc-% FROM table GROUP BY object
Any help is appreciated.
The WHERE clause should come after FROM.
Use single quotes ' around the LIKE pattern.
There is no need for DISTINCT in your case.
Try the query below:
SELECT `object` AS `name`, COUNT(`object`) AS `number`
FROM table
WHERE `object` LIKE 'abc-%'
GROUP BY `object`
ORDER BY COUNT(`object`) DESC; -- add order by if you need to sort by count
Result:
name number
----------------
abc-table 2
abc-sofa 1
Use count(*), group by, like 'abc-%' and having:
SELECT object, COUNT(*)
FROM table
WHERE object LIKE 'abc-%'
group by object
having count(*) >=1
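
For anyone who wants to try the corrected query without a MySQL server, here is a minimal sketch that runs essentially the same statement against an in-memory SQLite database from Python. The table name items is an assumption (the question only calls it "table"), and SQLite's LIKE is used in place of MySQL's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (object TEXT)")
conn.executemany(
    "INSERT INTO items (object) VALUES (?)",
    [("abc-table",), ("def-table",), ("ghi-chair",), ("abc-sofa",), ("abc-table",)],
)

# Filter first (WHERE after FROM), then count the rows in each group.
counts = conn.execute(
    """
    SELECT object AS name, COUNT(*) AS number
    FROM items
    WHERE object LIKE 'abc-%'
    GROUP BY object
    ORDER BY number DESC, name
    """
).fetchall()

for name, number in counts:
    print(name, number)
```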

Sum and percentage on json array elements

My table is like this:
create table alphabet_soup(
id numeric,
index json bigint
);
My data, as (id, json), looks like this: (1, '{('key':1,'value':"A"),('key':2,'value':"C"),('key':3,'value':"C")...(600,"B")}')
How do I count the number of A and the number of B across the json and compute the percentage of occurrence of A or B? I have about 6 different types of values (ABCDEF), but for simplicity I am just looking for a comparison of 3 values.
I am trying to find something to help me calculate the percentage of occurrence of a value from a key-value pair in json. I am using postgres 9.4. I am new to both json and postgres, and I keep landing on the same json functions manual page of postgres over and over.
I have managed to find a sum, but how do I calculate the percentage in a nested select and display the values in order of occurrence, as follows:
value | occurence | %
====================================
A | 300 | 50
B | 198 | 33
C | 102 | 17
The script I am using for the sum is :
select id, index->'key'::key as key
sum(case when (1,index::json->'1')::text = (1,index::json->'2')::text
then 1
else 0
end)/count(id) as res
from
alphabet_soup
group by id;
limit 10;
I get the following error:
column "alphabet_soup.id" must appear in the group by clause or be used in an aggregate function.
Thanks for the comment Patrick. Sorry I forgot to add I am using postgres 9.4
The easiest way to do this is to expand the json document into a regular row set using the json_each_text() function. Every single json document then becomes a set of rows, and you can apply aggregate functions as you would on any other row set. However, since the function returns a set of rows, you need to use it as a row source (section 7.2.1.4) and then select the value field, which holds the category of interest. Note that the function uses a field of the table, through an implicit LATERAL join (section 7.2.1.5).
SELECT id, value
FROM alphabet_soup, json_each_text("index");
which yields something like:
test=# SELECT id, value FROM alphabet_soup, json_each_text("index");
id | value
----+-------
1 | A
1 | C
1 | C
1 | B
To this you can apply regular aggregate functions over the appropriate windows to get the result you are looking for:
SELECT DISTINCT id, value,
count(value) OVER (PARTITION BY id, value) AS occurrence,
count(value) OVER (PARTITION BY id, value) * 100.0 /
count(id) OVER (PARTITION BY id) AS percentage
FROM (
SELECT id, value
FROM alphabet_soup, json_each_text("index") ) sub
ORDER BY id, value;
Which gives a result like:
id | value | occurrence | percentage
----+-------+------------+---------------------
1 | A | 1 | 25.0000000000000000
1 | B | 1 | 25.0000000000000000
1 | C | 2 | 50.0000000000000000
This will work for any number of categories (ABCDEF) and any number of ids.
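
As a cross-check of the arithmetic, the same expand-then-count idea can be sketched in plain Python. This assumes each document is a flat JSON object mapping keys to category letters (the shape json_each_text() expects); the sample document and the function name value_percentages are made up for illustration.

```python
import json
from collections import Counter

# Assumed shape: (id, flat JSON object of key -> category).
alphabet_soup = [(1, '{"1":"A","2":"C","3":"C","4":"B"}')]


def value_percentages(rows):
    """Return (id, value, occurrence, percentage) tuples, ordered by id and value."""
    out = []
    for row_id, doc in sorted(rows):
        # Expanding the document and counting its values corresponds to
        # json_each_text() followed by count(...) partitioned by id and value.
        counts = Counter(json.loads(doc).values())
        total = sum(counts.values())
        for value in sorted(counts):
            out.append((row_id, value, counts[value], counts[value] * 100.0 / total))
    return out


stats = value_percentages(alphabet_soup)
for line in stats:
    print(line)
```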
@Patrick, it was an accident. I am new to stackoverflow and did not realize how it works. I was fiddling around and I found the answer to the question I asked in addition to the first one. Sorry about that!
For fun, I added some more to the code to compare the percentages in the result set:
With q1 as
(SELECT DISTINCT id, value,
count(value) OVER (PARTITION BY id, value) AS occurrence,
count(value) OVER (PARTITION BY id, value) * 100.0 / count(id) OVER(PARTITION BY id) AS percentage
FROM ( SELECT id, value FROM alphabet_soup, json_each_text("index") ) sub
ORDER BY id, value) Select distinct id, value, least(percentage) from q1
Where (least(percentage))>20 Order by id, value;
The output for this is:
id | value | least
----+-------+--------
1 | B | 33
1 | C | 50

MySQL - GROUP_CONCAT if value is not a substring

I have a column called "Permissions" in my table. The permissions are strings which can be:
"r","w","x","rw","wx","rwx","xwr"
etc. Please note the order of characters in the string is not fixed. I want to GROUP_CONCAT() on the "Permissions" column of my table. However this causes very large strings.
Example: "r","wr","wx" group concatenated is "r,wr,wx" but should be "r,w,x" or "rwx". Using distinct() clause doesn't seem to help much. I am thinking that if I could check if a permission value is a substring of the other column then I should not concatenate it, but I don't seem to find a way to accomplish that.
Any column-based approach using solely string functions would also be appreciated.
EDIT:
Here is some sample data:
+---------+
| perm |
+---------+
| r,x,x,r |
| x |
| w,rw |
| rw |
| rw |
| x |
| w |
| x,x,r |
| r,x |
+---------+
The concatenated result should be:
+---------+
| perm |
+---------+
| r,w,x |
+---------+
I don't have control over the source of data and would like not to create new tables ( because of restricted privileges and memory constraints). I am looking for a post-processing step that converts each column value to the desired format.
A good idea would be to normalize your data first.
You could, for example, try it this way (I assume your source table is named Files):
Create a simple table called PermissionCodes with a single string column named Code.
Insert r, w, and x as values into PermissionCodes (three rows in total).
In a subquery, join Files to PermissionCodes on the condition that Code occurs as a substring of Permissions.
Perform your GROUP_CONCAT aggregation on the result of the subquery.
If, for the same logical entries in Files, there are multiple overlapping permission sets (e.g. for some file there is one row with rw and another row with w), limit the subquery to distinct combinations of the Files keys and Code.
Here's a fiddle to demonstrate the idea:
http://sqlfiddle.com/#!9/6685d6/4
You can try something like:
SELECT user_id, GROUP_CONCAT(DISTINCT perm)
FROM Permissions AS p
INNER JOIN (SELECT 'r' AS perm UNION ALL
SELECT 'w' UNION ALL
SELECT 'x') AS x
ON p.permission LIKE CONCAT('%', x.perm, '%')
GROUP BY user_id
You can include any additional permission code in the UNION ALL of the derived table used to JOIN with Permissions table.
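
The join against a small table of permission codes can be tried locally as well. Below is a rough sketch in Python with an in-memory SQLite database; the user_id values and their grouping are invented for the example, and SQLite's || operator replaces MySQL's CONCAT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Permissions (user_id INTEGER, permission TEXT)")
conn.executemany(
    "INSERT INTO Permissions (user_id, permission) VALUES (?, ?)",
    [(1, "r,x,x,r"), (1, "w,rw"), (1, "rw"), (2, "x"), (2, "x,x,r")],
)

# Each row matches every single-letter code it contains; DISTINCT then
# collapses the repeats within each user's group.
rows = conn.execute(
    """
    SELECT p.user_id, GROUP_CONCAT(DISTINCT x.perm)
    FROM Permissions AS p
    INNER JOIN (SELECT 'r' AS perm UNION ALL
                SELECT 'w' UNION ALL
                SELECT 'x') AS x
      ON p.permission LIKE '%' || x.perm || '%'
    GROUP BY p.user_id
    ORDER BY p.user_id
    """
).fetchall()

# The order inside the concatenated string is not guaranteed, so sort it here.
result = {user_id: sorted(perms.split(",")) for user_id, perms in rows}
print(result)
```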