I have a field in a table called duration that contains a JSON string like this:
{
"videos": {
"en":"00:03:11",
"es":"00:03:11"
},
"audios": {
"en":"00:00:03",
"es":"00:00:03"
}
}
Is it possible to execute a query that sums all the values under both the videos and audios keys, across all available languages? That is, given this JSON structure, I'd like a query to return:
00:06:28
EDIT:
Don't mind the format of the values for the moment, there are ways to sum datetime values in SQL. What I'm struggling with now is to traverse the values in the JSON to actually sum them.
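Setting the SQL dialect aside, the traversal itself can be sketched outside the database. The Python below is only an illustration of the nested walk, not a SQL answer. The function name total_duration is made up for this example, and it assumes every value is an HH:MM:SS string.

```python
import json
from datetime import timedelta

def total_duration(raw):
    """Sum every HH:MM:SS value under every top-level key (videos, audios, ...)."""
    doc = json.loads(raw)
    total = timedelta()
    for per_language in doc.values():        # "videos", "audios"
        for value in per_language.values():  # "en", "es", ...
            hours, minutes, seconds = (int(part) for part in value.split(":"))
            total += timedelta(hours=hours, minutes=minutes, seconds=seconds)
    # Format back to HH:MM:SS (the hours field may exceed 23).
    secs = int(total.total_seconds())
    return f"{secs // 3600:02d}:{secs % 3600 // 60:02d}:{secs % 60:02d}"

duration = (
    '{"videos": {"en": "00:03:11", "es": "00:03:11"},'
    ' "audios": {"en": "00:00:03", "es": "00:00:03"}}'
)
print(total_duration(duration))  # 00:06:28
```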
I have a bunch of json files which have an array with column names and a separate array for the rows.
I want a dynamic way of retrieving column names and merge them with the rows for each json file.
Been playing around with derived columns and column patterns, but struggling to get it working.
I want the column names from [data.columns.shortText] and the values from the corresponding [data.rows.value], matched by position.
Example format
{
"messages":{
},
"data":{
"columns":[
{
"columnName":"SelectionCriteria1",
"shortText":"Case no."
},
{
"columnName":"SelectionCriteria2",
"shortText":"Period for periodical values",
},
{
"columnName":"SelectionCriteria3",
"shortText":"Location"
},
{
"columnName":"SelectionCriteriaAggregate",
"shortText":"Value"
}
],
"rows":[
[
{
"value":"23523"
},
{
"value":12342349
},
{
"value":"234234",
"code":3342
},
{
"value":234234234
}
]
]
}
}
First, you need to fix your JSON data: there is an extra comma in the second object of columns, and in rows the value field is sometimes an int and sometimes a string, so I got an error when I tried to parse it in ADF.
I don't quite understand why you're trying to merge by position, because there are normally more rows than columns, and if you get 5 rows and 3 columns you will get an error.
Here is my approach to your problem:
The main idea is that I added an index column to both arrays and joined them with an inner join.
Created a source for the data (I used two, but you can use one to simplify your data flow).
Added a Select activity to select the relevant arrays from the data.
Flattened the array (in order to add an index column).
Added an index using a Rank activity (please read up on rank and dense rank and the difference between the two).
Added a Join activity: an inner join on the index column.
Added a Select activity to remove the index column from the result.
Saved the output to a sink.
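The index-then-inner-join idea can be sketched outside ADF as well. The Python below is a rough illustration of the logic only, not a data flow definition. The function name merge_by_position is an assumption of this sketch, and the sample is the question's data with the trailing comma removed.

```python
# Sample from the question, with the extra comma removed.
data = {
    "columns": [
        {"columnName": "SelectionCriteria1", "shortText": "Case no."},
        {"columnName": "SelectionCriteria2", "shortText": "Period for periodical values"},
        {"columnName": "SelectionCriteria3", "shortText": "Location"},
        {"columnName": "SelectionCriteriaAggregate", "shortText": "Value"},
    ],
    "rows": [
        [
            {"value": "23523"},
            {"value": 12342349},
            {"value": "234234", "code": 3342},
            {"value": 234234234},
        ]
    ],
}

def merge_by_position(data):
    # Index the column names: {0: "Case no.", 1: ...}
    names = dict(enumerate(col["shortText"] for col in data["columns"]))
    merged = []
    for row in data["rows"]:
        # Index the cells of this row the same way.
        cells = dict(enumerate(cell["value"] for cell in row))
        # Inner join on the index: keep only positions present on both sides.
        shared = sorted(names.keys() & cells.keys())
        merged.append({names[i]: cells[i] for i in shared})
    return merged

print(merge_by_position(data))
```

Because the join is an inner join, a row with more cells than there are columns simply loses the extra cells instead of raising an error.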
Screenshots (not preserved): the JSON data I worked with, the data flow, and the SelectRows, Flatten, Rank and Join activities.
Please check these links:
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-expressions-usage#mapAssociation
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-map-functions
I have a table mapping departments and teams in MySQL.
I want to retrieve an array of teams for each department.
For example, if I have two departments, depA (teams teamAA, teamAB) and depB (teams teamBA, teamBB, teamBC), I want to obtain the following JSON:
[
{
code: "depA",
teams: ["teamAA", "teamAB"]
},
{
code: "depB",
teams: ["teamBA", "teamBB", "teamBC"]
}
]
I am able to use GROUP_CONCAT to obtain a concatenated string, but I want a JSON array so that the sequelize.js library I am using can implicitly parse the entire data structure into a JS object.
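It can be done with JSON_ARRAYAGG (MySQL 5.7.22 and later). Wrapping GROUP_CONCAT in JSON_ARRAY would instead produce a one-element array holding the comma-joined string. JSON_ARRAYAGG does not accept DISTINCT, so deduplicate in a subquery if the mapping can contain repeats. The table name below is a placeholder:
SELECT `departmentCode` AS `code`, JSON_ARRAYAGG(`teamCode`) AS `teams`
FROM `department_team` GROUP BY `departmentCode`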
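For comparison, the target shape can be spelled out in a few lines of Python operating on plain (departmentCode, teamCode) rows. This is only a sketch of the grouping the query is meant to perform; the function name group_teams and the sample rows are assumptions made for illustration.

```python
import json
from collections import defaultdict

def group_teams(rows):
    """Collect the distinct team codes of each department, keeping first-seen order."""
    teams = defaultdict(list)
    for department, team in rows:
        if team not in teams[department]:  # mirrors DISTINCT
            teams[department].append(team)
    return [{"code": code, "teams": members} for code, members in teams.items()]

rows = [
    ("depA", "teamAA"),
    ("depA", "teamAB"),
    ("depB", "teamBA"),
    ("depB", "teamBB"),
    ("depB", "teamBC"),
]
print(json.dumps(group_teams(rows), indent=2))
```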
I need to find entries with duplicate IPs in a JSON array, compare them by their Date field, and remove the ones with the older date.
Example:
[
{
"IP": "10.0.0.20",
"Date": "2019-09-14T20:00:11.543-03:00"
},
{
"IP": "10.0.0.10",
"Date": "2019-09-17T15:45:16.943-03:00"
},
{
"IP": "10.0.0.10",
"Date": "2019-09-18T15:45:16.943-03:00"
}
]
The output of the operation needs to look like this:
[
{
"IP": "10.0.0.20",
"Date": "2019-09-14T20:00:11.543-03:00"
},
{
"IP": "10.0.0.10",
"Date": "2019-09-18T15:45:16.943-03:00"
}
]
For simplicity's sake, I'll assume the order of the data doesn't matter.
First, if your data isn't already in Python, you can use json.load or json.loads to convert it into a Python object, following the straightforward type mappings.
Then your problem has three parts: comparing date strings as dates, finding the maximum element of a list by that date, and performing this process for each distinct IP address. For these purposes, you can use two of Python's built-in functions and two tools from the standard library.
Python's built-in max and sorted functions (as well as list.sort) support a (keyword-only) key argument, which takes a function that determines the value to compare by. For example, max(d1, d2, key=lambda x: x[0]) compares the two items by their first element (like d1[0] < d2[0]) and returns whichever of d1 and d2 produced the larger key.
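A small sketch of the key argument in action, with made-up tuples d1 and d2 as sample data:

```python
d1 = (3, "apple")
d2 = (1, "zebra")

# Compare by the first element: 3 > 1, so d1 wins.
largest_by_first = max(d1, d2, key=lambda x: x[0])

# Compare by the second element: "zebra" > "apple", so d2 wins.
largest_by_second = max(d1, d2, key=lambda x: x[1])

# sorted accepts the same key argument.
ordered = sorted([d1, d2], key=lambda x: x[0])

print(largest_by_first)   # (3, 'apple')
print(largest_by_second)  # (1, 'zebra')
print(ordered)            # [(1, 'zebra'), (3, 'apple')]
```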
To allow that type of comparison between dates, you can use the datetime.datetime class. If your dates are all in the format specified by datetime.datetime.fromisoformat, you can use that function to turn your date strings into datetimes, which can then be compared to each other. Using that in a function that extracts the dates from the dictionaries gives you the key function you need.
import datetime

def extract_date(item):
    return datetime.datetime.fromisoformat(item['Date'])
Those functions allow you to choose the object from the list with the largest date, but not to keep separate values for different IP addresses. To do that, you can use itertools.groupby, which takes a key function and puts the elements of the input into separate outputs based on that key. However, there are two things you might need to watch out for with groupby:
It only groups elements that are next to each other. For example, if you give it [3, 3, 2, 2, 3], it will group two 3s, then two 2s, then one 3, rather than grouping all three 3s together.
It returns an iterator of key, iterator pairs, so you have to collect the results yourself. The best way to do that may depend on your application, but a basic approach is nested iterations:
from itertools import groupby
for key, values in groupby(data, key_function):
    for value in values:
        print(key, value)
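The adjacency caveat is easy to see with the [3, 3, 2, 2, 3] example. In this sketch each group's iterator is collected into a list, and sorting the input first brings equal elements together:

```python
from itertools import groupby

numbers = [3, 3, 2, 2, 3]

# Without sorting, the trailing 3 ends up in a group of its own.
unsorted_groups = [(key, list(values)) for key, values in groupby(numbers)]

# Sorting first makes equal elements adjacent, so each key appears once.
sorted_groups = [(key, list(values)) for key, values in groupby(sorted(numbers))]

print(unsorted_groups)  # [(3, [3, 3]), (2, [2, 2]), (3, [3])]
print(sorted_groups)    # [(2, [2, 2]), (3, [3, 3, 3])]
```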
With the functions I've mentioned above, it should be relatively straightforward to assemble an answer to your problem.
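One possible way to assemble those pieces is sketched below. The helper names extract_ip and newest_per_ip are assumptions of this sketch, it relies on datetime.datetime.fromisoformat (Python 3.7 or later), and the result comes back ordered by IP string rather than in the original order.

```python
import datetime
from itertools import groupby

def extract_date(item):
    return datetime.datetime.fromisoformat(item["Date"])

def extract_ip(item):
    return item["IP"]

def newest_per_ip(records):
    # groupby only merges neighbours, so sort by IP first.
    by_ip = sorted(records, key=extract_ip)
    # For each IP, keep the record with the latest date.
    return [max(group, key=extract_date) for _, group in groupby(by_ip, key=extract_ip)]

records = [
    {"IP": "10.0.0.20", "Date": "2019-09-14T20:00:11.543-03:00"},
    {"IP": "10.0.0.10", "Date": "2019-09-17T15:45:16.943-03:00"},
    {"IP": "10.0.0.10", "Date": "2019-09-18T15:45:16.943-03:00"},
]
for record in newest_per_ip(records):
    print(record)
```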
Is there a way to get the output of a MySQL query to list rows in the following structure
{
1:{voo:bar,doo:dar},
2:{voo:mar,doo:har}
}
as opposed to
[
{id:1,voo:bar,doo:dar},
{id:2,voo:mar,doo:har}
]
which I then have to loop through to create the desired object?
I should add that within each row I am also concatenating results to form an object, and from what I've experimented with you can't group_concatenate inside a group_concatenation. As follows:
knex('table')
  .select(
    'table.id',
    'table.name',
    knex.raw(
      `CONCAT("{", GROUP_CONCAT(DISTINCT
        '"',table.voo,'"',':','"',table.doo,'"'),
      "}") AS object`
    )
  )
  .groupBy('table.id')
Could GROUP BY be leveraged in any way to achieve this? Generally I'm inexperienced at SQL and don't know what's possible and what's not.
I have an SQL table in which one of the columns contains a JSON array in the following format:
[
{
"id":"1",
"translation":"something here",
"value":"value of something here"
},
{
"id":"2",
"translation":"something else here",
"value":"value of something else here"
},
..
..
..
]
Is there any way to use an SQL query to retrieve columns with the ID as the header and the "value" as the value of the column, instead of returning only one column with the JSON array?
For example, if I run:
SELECT column_with_json FROM myTable
It will return the above array, whereas I want it to return:
1,2
value of something here, value of something else here
If the column is a plain text field, you can't use SQL to retrieve columns from the JSON stored in it: to the database engine, the JSON is just unstructured text.
Some relational databases, like PostgreSQL, have a JSON type and functions that support JSON queries. If that is your case, you should be able to perform the query you want.
Check this for an example of how it works with PostgreSQL:
http://clarkdave.net/2013/06/what-can-you-do-with-postgresql-and-json/
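If the engine offers no JSON support, one fallback is to fetch the text column and pivot it in application code. The Python below is a minimal sketch of that idea; the function name pivot is hypothetical, and the sample string stands in for one value of column_with_json.

```python
import json

def pivot(raw):
    """Turn a JSON array of {id, translation, value} objects into a header row and a value row."""
    items = json.loads(raw)
    header = [item["id"] for item in items]
    values = [item["value"] for item in items]
    return header, values

column_with_json = """
[
  {"id": "1", "translation": "something here", "value": "value of something here"},
  {"id": "2", "translation": "something else here", "value": "value of something else here"}
]
"""
header, values = pivot(column_with_json)
print(",".join(header))   # 1,2
print(", ".join(values))  # value of something here, value of something else here
```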