array_agg function on Apache Drill - apache-drill

I need to use the array_agg function as shown below (this is PostgreSQL):
test=# select t2.c_no, array_agg(t4.contents) from table2 as t2 inner join table4 as t4 on t2.c_no = t4.c_no group by t2.c_no;
 c_no |        array_agg
------+--------------------------
    2 | {kkac,aa,akkk,kkacd,kka}
   12 | {abc}
   21 | {kk,kkacaaad}
(3 rows)
But I can't find an array_agg function in the Apache Drill documentation.
Do I need to write a custom array_agg function? Does Apache Drill have a plan to add array_agg?

Union as sub query using MySQL 8

I want to optimize a query that uses a union as a subquery.
I'm not really sure how to construct the query, though.
I'm using MySQL 8.0.12.
Here is the original query:
+-------+----+
| c1    | c2 |
+-------+----+
| 18182 |  0 |
| 18015 |  0 |
+-------+----+
2 rows in set (0.35 sec)
I'm sorry, but the question isn't saved if I paste the SQL query as text and format it using Ctrl+K.
Expected output:
+-------+-----+
| c1    | c2  |
+-------+-----+
| 18182 | 167 |
| 18015 |   0 |
+-------+-----+
As output, I would like the difference in the number of rows between the two tables in the UNION ALL.
I processed this question using the wizard https://stackoverflow.com/questions/ask
Since a parenthesized SELECT can be used almost anywhere an expression can go:
SELECT
ABS( (SELECT COUNT(*) FROM tbl_aaa) -
(SELECT COUNT(*) FROM tbl_bbb) ) AS diff;
Also, MySQL is happy to allow a SELECT without a FROM.
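A minimal sketch of the scalar-subquery approach above, run against SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained). The table names come from the answer; the `id` column, the sample rows, and the helper `row_count_diff` are hypothetical.

```python
import sqlite3


def row_count_diff(conn, table_a, table_b):
    """Return |rows in table_a - rows in table_b| using two scalar subqueries."""
    # Table names cannot be bound as parameters, so they are interpolated;
    # pass trusted identifiers only.
    sql = (
        f"SELECT ABS((SELECT COUNT(*) FROM {table_a}) - "
        f"(SELECT COUNT(*) FROM {table_b})) AS diff"
    )
    return conn.execute(sql).fetchone()[0]


# Hypothetical sample data: 5 rows in tbl_aaa, 2 rows in tbl_bbb.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_aaa (id INTEGER)")
conn.execute("CREATE TABLE tbl_bbb (id INTEGER)")
conn.executemany("INSERT INTO tbl_aaa VALUES (?)", [(i,) for i in range(5)])
conn.executemany("INSERT INTO tbl_bbb VALUES (?)", [(i,) for i in range(2)])

print(row_count_diff(conn, "tbl_aaa", "tbl_bbb"))
```

Note that the outer SELECT has no FROM clause, which SQLite accepts just as MySQL does.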
There are several ways to go about this, including UNION, but I wouldn't recommend it, as it is IMO a bit 'hacky'. Instead, I suggest you use subqueries or CTEs.
With subqueries
SELECT
ABS(c_tbl_aaa.size - c_tbl_bbb.size) as diff
FROM (
SELECT
COUNT(*) as size
FROM tbl_aaa
) c_tbl_aaa
CROSS JOIN (
SELECT
COUNT(*) as size
FROM tbl_bbb
) c_tbl_bbb
With CTEs, also known as WITHs
WITH c_tbl_aaa AS (
SELECT
COUNT(*) as size
FROM tbl_aaa
), c_tbl_bbb AS (
SELECT
COUNT(*) as size
FROM tbl_bbb
)
SELECT
ABS(c_tbl_aaa.size - c_tbl_bbb.size) as diff
FROM c_tbl_aaa
CROSS JOIN c_tbl_bbb
In a practical sense, they are the same. Depending on your needs, you might want to define and join the results, though; in such cases, you could use a single number as a "pseudo id" in the select statement.
Since you only want to know the differences, I used the ABS function, which returns the absolute value of a number.
Let me know if you want a solution with UNIONs anyway.
Edit: As @Rick James pointed out, COUNT(*) should be used in the subqueries to count the number of rows, as COUNT(id_***) will only count the rows with non-null values in that field.
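The difference between COUNT(*) and COUNT(column) can be checked with a small sketch. SQLite (via Python's sqlite3) is used here as a stand-in for MySQL, and the column name `id_aaa` and the sample rows are hypothetical, since the original column name is elided.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical column name; two of the four rows hold NULL in it.
conn.execute("CREATE TABLE tbl_aaa (id_aaa INTEGER)")
conn.executemany(
    "INSERT INTO tbl_aaa VALUES (?)", [(1,), (None,), (3,), (None,)]
)

# COUNT(*) counts rows; COUNT(id_aaa) skips rows where id_aaa IS NULL.
count_all, count_col = conn.execute(
    "SELECT COUNT(*), COUNT(id_aaa) FROM tbl_aaa"
).fetchone()

print(count_all, count_col)
```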

Querying tables in BigQuery

Background
I have a table with 1 column 'data' which contains 'JSON' in BigQuery shown below.
data
{"name":"x","mobile":999,"location":"abc"}
{"name":"x1","mobile":9991,"location":"abc1"}
Now, I want to use GROUP BY:
SELECT
data
FROM
table
GROUP BY
json_extract(data,'$.location')
This query throws an error
expression JSON_EXTRACT([data], '$.location') in GROUP BY is invalid
So, I modify query to
SELECT
data, json_extract(data,'$.location') as l
FROM
table
GROUP BY
l
This query throws error
Expression 'data' is not present in the GROUP BY list
Query
How can we use JSON fields in a GROUP BY clause?
And what are the limitations (in the context of querying) of having columns populated with JSON?
You are grouping by location, but you are not applying an aggregate function to the data field, so the engine doesn't know which data value to return for each group.
Just to illustrate the example I compiled this test query which works using group_concat:
select group_concat(data),location from
(
select * from
(SELECT '{"name":"x","mobile":999,"location":"abc"}' as data,json_extract('{"name":"x","mobile":999,"location":"abc"}','$.location') as location),
(SELECT '{"name":"x","mobile":111,"location":"abc"}' as data,json_extract('{"name":"x","mobile":111,"location":"abc"}','$.location') as location),
(SELECT '{"name":"x1","mobile":9991,"location":"abc1"}' as data,json_extract('{"name":"x1","mobile":9991,"location":"abc1"}','$.location') as location)
) d
group by location
and returns:
+-----+---------------------------------------------------------------------------------------------------+----------+--+
| Row | f0_ | location | |
+-----+---------------------------------------------------------------------------------------------------+----------+--+
| 1 | {"name":"x","mobile":999,"location":"abc"},"{""name"":""x"",""mobile"":111,""location"":""abc""}" | abc | |
+-----+---------------------------------------------------------------------------------------------------+----------+--+
| 2 | {"name":"x1","mobile":9991,"location":"abc1"} | abc1 | |
+-----+---------------------------------------------------------------------------------------------------+----------+--+
BigQuery's Aggregate Functions documented here
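The same rule (every selected column is either grouped on or aggregated) can be sketched outside BigQuery. The example below swaps in SQLite through Python's sqlite3 module; `json_location` is a hypothetical user-defined function registered from Python, not a BigQuery or SQLite built-in, and the `|` separator is chosen only because the JSON documents themselves contain commas.

```python
import json
import sqlite3


def extract_location(doc):
    """Return the 'location' field of a JSON document stored as text."""
    return json.loads(doc)["location"]


def group_by_location(rows):
    """Group JSON documents by location; aggregate the rest of the row."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical helper exposed to SQL under the name json_location.
    conn.create_function("json_location", 1, extract_location)
    conn.execute("CREATE TABLE t (data TEXT)")
    conn.executemany("INSERT INTO t VALUES (?)", [(r,) for r in rows])
    cur = conn.execute(
        "SELECT json_location(data) AS location, "
        "       COUNT(*) AS n, "
        "       group_concat(data, '|') AS docs "
        "FROM t GROUP BY json_location(data) ORDER BY location"
    )
    # Concatenation order inside a group is not guaranteed, so sort it.
    return [(loc, n, sorted(docs.split("|"))) for loc, n, docs in cur]


rows = [
    '{"name":"x","mobile":999,"location":"abc"}',
    '{"name":"x","mobile":111,"location":"abc"}',
    '{"name":"x1","mobile":9991,"location":"abc1"}',
]
result = group_by_location(rows)
for location, n, docs in result:
    print(location, n, docs)
```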
Try below
SELECT location,
GROUP_CONCAT_UNQUOTED(REPLACE(data, ',"location":"' + location + '"', '')) AS data
FROM (
SELECT data,
JSON_EXTRACT_SCALAR(data,'$.location') AS location,
FROM YourTable
)
GROUP BY location

MySQL - displaying AVG on more than just one row

If I run a query that includes an Aggregation function (AVG), is there any way I can get that to display on multiple rows? The query I need would be something like:
SELECT field1, field2, AVG(field2) FROM tMyTable;
The output I need would be something like:
field 1 | field 2 | AVG(field2)
record1 | 1.17 | 1.19
record2 | 1.21 | 1.19
record3 | 1.18 | 1.19
As you can see, I need the average output to be displayed on each and every line. I appreciate that this may be/is an unorthodox approach, however that output format is needed for a charting app that I use.
If there are any methods available then I'd be grateful for your suggestions. Perhaps nesting a second lookup?
SELECT field1,field2,(SELECT AVG(field2) FROM Table) AS AvgFieldTwo
FROM Table
You can also use a CROSS JOIN:
SELECT field1, field2, src.AvgField2
FROM MyTable
CROSS JOIN
(
SELECT avg(field2) AvgField2
FROM MyTable
) src
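Both answers can be tried with a short sketch. SQLite (via Python's sqlite3) stands in for MySQL here; the table and column names and the three sample rows come from the question, while the helper `rows_with_average` and the rounding to two decimals are assumptions made to match the displayed output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tMyTable (field1 TEXT, field2 REAL)")
conn.executemany(
    "INSERT INTO tMyTable VALUES (?, ?)",
    [("record1", 1.17), ("record2", 1.21), ("record3", 1.18)],
)


def rows_with_average(conn):
    """Return every row together with the overall average of field2."""
    sql = """
        SELECT field1, field2, ROUND(src.avg_field2, 2)
        FROM tMyTable
        CROSS JOIN (SELECT AVG(field2) AS avg_field2 FROM tMyTable) AS src
        ORDER BY field1
    """
    return conn.execute(sql).fetchall()


for row in rows_with_average(conn):
    print(row)
```

The derived table `src` has exactly one row, so the CROSS JOIN attaches the same average to every row without changing the row count.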

Mysql and Perl using INTERVAL to calculate a total

Thank you for an example. I'm having trouble figuring out a statement with an interval...
I need to count users over a certain range of dates, but the interval must be entered by the user. Example: from '2010/05/05' to '2010/30/07', if the interval is 1m (month), then I need the total of users for each 1m interval, like this: 2010/05/05 to 2010/06/05 is total users.
So far I got:
SELECT col1, client, COUNT(client) FROM table1, table2 WHERE col1 IN (condition) AND date BETWEEN '2010/05/01' AND '2010/07/30' AND DATE_ADD(CURDATE(),INTERVAL + 1 month) GROUP BY client;
Of course, it calculates the overall total, not the totals per interval.
Also I tried to use Perl
my @data; # data from the database
%date_hash = ($data[1] => $total); # $data[1] is the begin and end dates the user entered
foreach $dates (values %date_hash) {
$date_hash{$dates} = $total;
print "Print hash: $dates $date_hash{$dates} \n";
}
Thank you in advance, :)
One possible SQL-only solution (I will offer an algorithm, not exact SQL code):
Create a temp table INTERVALS and populate it (probably easiest to do from Perl in a loop, though a MySQL loop would be sufficient) with data as follows (from 5/5/2010 to 12/10/2010, 1 month intervals):
| start_period | end_period | interval_number |
===============================================
| 2010/05/05 | 2010/06/04 | 1 |
| 2010/06/05 | 2010/07/04 | 2 |
| ...
| 2010/12/05 | 2010/12/10 | 8 |
Then, run the query joining your table to the INTERVALS temp table via
SELECT client, COUNT(client)
, i.interval_number, i.start_period, i.end_period
FROM table1
JOIN intervals i
ON table1.date >= i.start_period
AND table1.date <= i.end_period
WHERE col1 IN (condition)
GROUP BY client, i.interval_number, i.start_period, i.end_period
Please note that you can select other columns from table1 but only if you group on them as well.
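The algorithm above can be sketched end to end in Python, with SQLite (via sqlite3) standing in for MySQL. The helpers `add_months`, `build_intervals`, and `users_per_interval`, the ISO date strings, and the sample rows are all assumptions for illustration; the sketch also counts users per interval only, rather than per client and interval.

```python
import calendar
import sqlite3
from datetime import date, timedelta


def add_months(d, n):
    """Return d shifted by n months, clamping the day to the month length."""
    index = d.month - 1 + n
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def build_intervals(start, end, months=1):
    """Return (start_period, end_period, interval_number) rows covering start..end."""
    intervals, number, current = [], 1, start
    while current <= end:
        following = add_months(start, number * months)
        last_day = min(following - timedelta(days=1), end)
        intervals.append((current.isoformat(), last_day.isoformat(), number))
        current, number = following, number + 1
    return intervals


def users_per_interval(conn, start, end, months=1):
    """Populate a temp INTERVALS table and count table1 rows per interval."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS intervals "
        "(start_period TEXT, end_period TEXT, interval_number INTEGER)"
    )
    conn.execute("DELETE FROM intervals")
    conn.executemany(
        "INSERT INTO intervals VALUES (?, ?, ?)",
        build_intervals(start, end, months),
    )
    sql = """
        SELECT i.interval_number, i.start_period, i.end_period, COUNT(t.client)
        FROM intervals i
        JOIN table1 t ON t.date >= i.start_period AND t.date <= i.end_period
        GROUP BY i.interval_number, i.start_period, i.end_period
        ORDER BY i.interval_number
    """
    return conn.execute(sql).fetchall()


# Hypothetical sample data; dates are ISO strings so they compare correctly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (client TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("a", "2010-05-10"), ("b", "2010-05-20"), ("c", "2010-06-07")],
)
result = users_per_interval(conn, date(2010, 5, 5), date(2010, 7, 30))
for row in result:
    print(row)
```

Because this is an inner join, intervals with no matching rows are omitted; a LEFT JOIN from intervals would keep them with a count of zero.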

Fetching linked list in MySQL database

I have a MySQL database table with this structure:
table
id INT NOT NULL PRIMARY KEY
data ..
next_id INT NULL
I need to fetch the data in order of the linked list. For example, given this data:
id | next_id
----+---------
1 | 2
2 | 4
3 | 9
4 | 3
9 | NULL
I need to fetch the rows for id=1, 2, 4, 3, 9, in that order. How can I do this with a database query? (I can do it on the client end. I am curious if this can be done on the database side. Thus, saying it's impossible is okay (given enough proof)).
It would be nice to have a termination point as well (e.g. stop after 10 fetches, or when some condition on the row turns true) but this is not a requirement (can be done on client side). I (hope I) do not need to check for circular references.
Some brands of database (e.g. Oracle, Microsoft SQL Server) support extra SQL syntax to run "recursive queries". MySQL did not support any such solution before version 8.0, which added recursive common table expressions (WITH RECURSIVE).
The problem you are describing is the same as representing a tree structure in a SQL database. You just have a long, skinny tree.
There are several solutions for storing and fetching this kind of data structure from an RDBMS. See some of the following questions:
"What is the most efficient/elegant way to parse a flat table into a tree?"
"Is it possible to make a recursive SQL query ?"
Since you mention that you'd like to limit the "depth" returned by the query, you can achieve this while querying the list this way:
SELECT * FROM mytable t1
LEFT JOIN mytable t2 ON (t1.next_id = t2.id)
LEFT JOIN mytable t3 ON (t2.next_id = t3.id)
LEFT JOIN mytable t4 ON (t3.next_id = t4.id)
LEFT JOIN mytable t5 ON (t4.next_id = t5.id)
LEFT JOIN mytable t6 ON (t5.next_id = t6.id)
LEFT JOIN mytable t7 ON (t6.next_id = t7.id)
LEFT JOIN mytable t8 ON (t7.next_id = t8.id)
LEFT JOIN mytable t9 ON (t8.next_id = t9.id)
LEFT JOIN mytable t10 ON (t9.next_id = t10.id);
It'll perform like molasses, and the result will come back all on one row (per linked list), but you'll get the result.
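Where recursive queries are available, the list can be walked in order and the depth capped in a single statement. The sketch below uses SQLite's WITH RECURSIVE through Python's sqlite3 module as a stand-in (an assumption, not the answer's method); the table name `mytable` and the sample data come from the thread, while `fetch_list` and the `chain` CTE are hypothetical names.

```python
import sqlite3


def fetch_list(conn, head_id, max_depth=10):
    """Return ids in linked-list order starting at head_id, at most max_depth rows."""
    sql = """
        WITH RECURSIVE chain(id, next_id, depth) AS (
            -- Anchor: the head of the list.
            SELECT id, next_id, 1 FROM mytable WHERE id = ?
            UNION ALL
            -- Step: follow next_id until it is NULL or the depth cap is hit.
            SELECT m.id, m.next_id, c.depth + 1
            FROM mytable m
            JOIN chain c ON m.id = c.next_id
            WHERE c.depth < ?
        )
        SELECT id FROM chain ORDER BY depth
    """
    return [row[0] for row in conn.execute(sql, (head_id, max_depth))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, next_id INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 2), (2, 4), (3, 9), (4, 3), (9, None)],
)

print(fetch_list(conn, 1))
print(fetch_list(conn, 1, max_depth=3))
```

The depth cap also bounds the walk if the data ever contains a circular reference.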
If what you are trying to avoid is having several queries (one for each node) and you are able to add columns, then you could have a new column that links to the root node. That way you can pull in all the data at once by the root id, but you will still have to sort the list (or tree) on the client side.
So in this example you would have:
id | next_id | root_id
----+---------+---------
1 | 2 | 1
2 | 4 | 1
3 | 9 | 1
4 | 3 | 1
9 | NULL | 1
Of course, the disadvantage of this compared with traditional linked lists or trees is that changing the root costs O(n) writes, where n is the number of nodes, because the root id must be updated on every node. Fortunately, you should always be able to do this in a single UPDATE query unless you are dividing a list/tree in the middle.
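A rough sketch of the root_id idea: one query fetches every node of the list, and the client then follows the next_id pointers. SQLite (via Python's sqlite3) stands in for MySQL; the helper `ordered_list` and the `seen` guard against cycles are assumptions added for illustration.

```python
import sqlite3


def ordered_list(conn, root_id):
    """Fetch all nodes of one list in a single query, then order them client-side."""
    rows = conn.execute(
        "SELECT id, next_id FROM mytable WHERE root_id = ?", (root_id,)
    ).fetchall()
    next_of = dict(rows)  # id -> next_id
    ordered, seen, current = [], set(), root_id
    # Follow the pointers; stop at NULL, a missing node, or a repeated node.
    while current is not None and current in next_of and current not in seen:
        ordered.append(current)
        seen.add(current)
        current = next_of[current]
    return ordered


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, next_id INTEGER, root_id INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 2, 1), (2, 4, 1), (3, 9, 1), (4, 3, 1), (9, None, 1)],
)

print(ordered_list(conn, 1))
```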
This is less a solution and more of a workaround but, for a linear list (rather than the tree Bill Karwin mentioned), it might be more efficient to use a sort column on your list. For example:
CREATE TABLE `schema`.`my_table` (
`id` INT NOT NULL PRIMARY KEY,
`sort_order` INT,
data ..,
INDEX `ix_order` (`sort_order` ASC)
);
Then:
SELECT * FROM `schema`.`my_table` ORDER BY `sort_order`;
This has the disadvantage of slower inserts (you have to reposition all sorted elements past the insertion point) but should be fast for retrieval because the order column is indexed.
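The insert cost mentioned above can be sketched as follows, again with SQLite (via Python's sqlite3) standing in for MySQL. The column name `sort_order` follows the index definition; the helper `insert_at` and the sample rows are hypothetical.

```python
import sqlite3


def insert_at(conn, new_id, position):
    """Insert new_id at the given position, shifting later rows down by one."""
    with conn:  # one transaction: both statements commit or roll back together
        conn.execute(
            "UPDATE my_table SET sort_order = sort_order + 1 WHERE sort_order >= ?",
            (position,),
        )
        conn.execute(
            "INSERT INTO my_table (id, sort_order) VALUES (?, ?)",
            (new_id, position),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, sort_order INTEGER)")
conn.execute("CREATE INDEX ix_order ON my_table (sort_order)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", [(1, 1), (2, 2), (4, 3)])

insert_at(conn, 9, 2)  # place id 9 between ids 1 and 2
ids = [r[0] for r in conn.execute("SELECT id FROM my_table ORDER BY sort_order")]
print(ids)
```

The UPDATE touches every row at or after the insertion point, which is the slower-insert trade-off; the read is a plain indexed ORDER BY.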