I am using BigQuery as a cloud data warehouse and Data Studio for visualization.
In BigQuery I have a table with a column named data that contains JSON. I only want to extract what is inside the field "city".
The formula below, which someone gave me, worked to extract what is inside the field "title"; I used it to create a field in Data Studio.
REPLACE(REGEXP_EXTRACT(data, '"title":(.+)","ur'), "\"", "")
I tried multiple ways to reuse this formula for the "city" field, but none of them worked, and I don't understand the code.
What's inside my column data:
{
"address":{
"city":"This is what i want",
"country":"blablabla",
"lineAdresse":"blablabla",
"region":"blablabla",
"zipCode":"blablabla"
},
"contract":"blablabla",
"dataType":"blablabla",
"description":"blablabla",
"endDate":{
"_seconds":1625841747,
"_nanoseconds":690000000
},
"entreprise":{
"denomination":"blabla",
"description":"1",
"logo":"blablabla",
"blabla":"blablabla",
"verified":"false"
},
"id":"16256R8TOUHJG",
"idEntreprise":"blablabla",
"jobType":"blablabla",
"listInfosRh":null,
"listeCandidats":[
],
"field":0,
"field":0,
"field":14,
"field":"1625834547690",
"field":true,
"field":"",
"field":"ref1625834547690",
"skills":[
"field",
"field",
"field"
],
"startDate":{
"_seconds":1625841747,
"_nanoseconds":690000000
},
"status":true,
"title":"this I can extract",
"urlRedirection":"blablabla",
"validated":true
}
If anyone knows the formula to put in Data Studio to extract what's inside "city", and can explain it to me, that would help a lot.
Here's one formula I tried, which gave me a null result:
REPLACE(REGEXP_EXTRACT(data,'"city":/{([^}]*)}/'),"\"","") >>null
I also tried the formula below, but it didn't stop at the city: I got the rest of the address (the region, the zip code) and everything after it:
REPLACE(REGEXP_EXTRACT(data, '"city":(.+)","ur'), "\"", "")
It is possible to parse JSON in a text field by ignoring the hierarchy and only looking for a specific field; in your case the field names are title and city. The pattern r'"city":\s*"([^"]+)' matches the literal "city":, any optional whitespace, the value's opening quote, and then captures everything up to (but not including) the next quote. Please be aware that this approach is not safe for user-entered data: if a value contains an escaped quote, for example "city":"\" hide \"", the expression can no longer extract the city correctly.
select *,
REGEXP_EXTRACT(data, r'"title":\s*"([^"]+)') as title,
REGEXP_EXTRACT(data, r'"city":\s*"([^"]+)') as city
from (
  select ' { "address":{ "city":"This is what i want", "country":"blablabla", "lineAdresse":"blablabla", "region":"blablabla", "zipCode":"blablabla" }, "contract":"blablabla", "dataType":"blablabla", "description":"blablabla", "endDate":{ "_seconds":1625841747, "_nanoseconds":690000000 }, "entreprise":{ "denomination":"blabla", "description":"1", "logo":"blablabla", "blabla":"blablabla", "verified":"false" }, "id":"16256R8TOUHJG", "idEntreprise":"blablabla", "jobType":"blablabla", "listInfosRh":null, "listeCandidats":[ ], "field":0, "field":0, "field":14, "field":"1625834547690", "field":true, "field":"", "field":"ref1625834547690", "skills":[ "field", "field", "field" ], "startDate":{ "_seconds":1625841747, "_nanoseconds":690000000 }, "status":true, "title":"this I can extract", "urlRedirection":"blablabla", "validated":true }' as data
)
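If you can do the extraction in BigQuery itself rather than in a Data Studio calculated field, BigQuery's native JSON functions handle the nesting and quote escaping for you. A minimal sketch, assuming standard SQL (the table name is a placeholder):
select
  JSON_EXTRACT_SCALAR(data, '$.address.city') as city,  -- walks the JSON path and returns the unquoted value
  JSON_EXTRACT_SCALAR(data, '$.title') as title
from `your_project.your_dataset.your_table`;  -- placeholder table name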
I have a SQLite database, and in one of its fields I have stored a complete JSON object. I have to make some JSON select requests. If you look at my JSON below,
the ALL key has a value which is an array. I need to extract some data, such as all comments where the "pod" field is "fb". How do I extract properly when the SQLite JSON value is an array?
select json_extract(data,'$."json"') from datatable ; gives me the entire thing. Then I do
select json_extract(data,'$."json"[0]') but I don't want to do it manually; I want to iterate.
Kindly suggest some source where I can study and work on this.
My JSON:
{
"ALL": [{
"comments": "your site is awesome",
"pod": "passcode",
"originalDirectory": "case1"
},
{
"comments": "your channel is good",
"data": ["youTube"],
"pod": "library"
},
{
"comments": "you like everything",
"data": ["facebook"],
"pod": "fb"
},
{
"data": ["twitter"],
"pod": "tw",
"ALL": [{
"data": [{
"codeLevel": "3"
}],
"pod": "mo",
"pod2": "p"
}]
}
]
}
create table datatable ( path string , data json1 );
insert into datatable values("1" , json('<abovejson in a single line>'));
Simple List
Where your JSON represents a "simple" list of comments, you want something like:
select key, value
from datatable, json_each( datatable.data, '$.ALL' )
where json_extract( value, '$.pod' ) = 'fb' ;
which, using your sample data, returns:
2|{"comments":"you like everything","data":["facebook"],"pod":"fb"}
The use of json_each() returns a row for every element of the input JSON (datatable.data), starting at the path $.ALL (where $ is the top-level, and ALL is the name of your array: the path can be omitted if the top-level of the JSON object is required). In your case, this returns one row for each comment entry.
The fields of this row are documented at 4.13. The json_each() and json_tree() table-valued functions in the SQLite documentation: the two we're interested in are key (very roughly, the "row number") and value (the JSON for the current element). The latter will contain elements called comments and pod, etc.
Because we are only interested in elements where pod is equal to fb, we add a where clause, using json_extract() to get at pod (where $.pod is relative to the value returned by the json_each() function).
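If you only want the comment text rather than the whole JSON object, you can apply json_extract() in the select list as well. A small sketch against the same table:
select json_extract( value, '$.comments' ) as comment
from datatable, json_each( datatable.data, '$.ALL' )
where json_extract( value, '$.pod' ) = 'fb' ;
-- returns: you like everything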
Nested List
If your JSON contains nested elements (something I didn't notice at first), then you need to use the json_tree() function instead of json_each(). Whereas the latter will only iterate over the immediate children of the node specified, json_tree() will descend recursively through all children from the node specified.
To give us some data to work with, I have augmented your test data with an extra element:
create table datatable ( path string , data json1 );
insert into datatable values("1" , json('
{
"ALL": [{
"comments": "your site is awesome",
"pod": "passcode",
"originalDirectory": "case1"
},
{
"comments": "your channel is good",
"data": ["youTube"],
"pod": "library"
},
{
"comments": "you like everything",
"data": ["facebook"],
"pod": "fb"
},
{
"data": ["twitter"],
"pod": "tw",
"ALL": [{
"data": [{
"codeLevel": "3"
}],
"pod": "mo",
"pod2": "p"
},
{
"comments": "inserted by TripeHound",
"data": ["facebook"],
"pod": "fb"
}]
}
]
}
'));
If we simply switch to using json_tree(), then we see that a simple query (with no where clause) will return all elements of the source JSON:
select key, value
from datatable, json_tree( datatable.data, '$.ALL' ) limit 10 ;
ALL|[{"comments":"your site is awesome","pod":"passcode","originalDirectory":"case1"},{"comments":"your channel is good","data":["youTube"],"pod":"library"},{"comments":"you like everything","data":["facebook"],"pod":"fb"},{"data":["twitter"],"pod":"tw","ALL":[{"data":[{"codeLevel":"3"}],"pod":"mo","pod2":"p"},{"comments":"inserted by TripeHound","data":["facebook"],"pod":"fb"}]}]
0|{"comments":"your site is awesome","pod":"passcode","originalDirectory":"case1"}
comments|your site is awesome
pod|passcode
originalDirectory|case1
1|{"comments":"your channel is good","data":["youTube"],"pod":"library"}
comments|your channel is good
data|["youTube"]
0|youTube
pod|library
Because JSON objects are mixed in with simple values, we can no longer simply add where json_extract( value, '$.pod' ) = 'fb', because this produces errors when value does not represent an object. The simplest way around this is to look at the type value returned by json_each()/json_tree(): it will be the string object if the row represents a JSON object (see the above documentation for other values).
Adding this to the where clause (and relying on "short-circuit evaluation" to prevent json_extract() being called on non-object rows), we get:
select key, value
from datatable, json_tree( datatable.data, '$.ALL' )
where type = 'object'
and json_extract( value, '$.pod' ) = 'fb' ;
which returns:
2|{"comments":"you like everything","data":["facebook"],"pod":"fb"}
1|{"comments":"inserted by TripeHound","data":["facebook"],"pod":"fb"}
If desired, we could use json_extract() to break apart the returned objects:
.mode column
.headers on
.width 30 15 5
select json_extract( value, '$.comments' ) as Comments,
json_extract( value, '$.data' ) as Data,
json_extract( value, '$.pod' ) as POD
from datatable, json_tree( datatable.data, '$.ALL' )
where type = 'object'
and json_extract( value, '$.pod' ) = 'fb' ;
Comments Data POD
------------------------------ --------------- -----
you like everything ["facebook"] fb
inserted by TripeHound ["facebook"] fb
Note: If your structure contained other objects, of different formats, it may not be sufficient to simply select for type = 'object': you may have to devise a more subtle filtering process.
I have some code which reads information in from a CSV file. The information is regarding road traffic accidents.
With this code I can target the values using d['Weather Conditions'], which prints each row's weather condition to the console.
That works great, but what I'm actually looking to do is store values of particular types in variables, e.g. so all values equal to 'Fine without high winds' would be stored in an array-like object such as var windy, or something similar. Is there any way I could go about this simply?
Any help would be appreciated. Thanks
Right now, in your code, you are simply printing to the console each weather condition you encounter as you go over the dataset row by row.
If you want to group your data based on the weather conditions, you can do it using d3.nest().
The example below uses some simple JSON data to show how. Using d3.nest(), we set the key to be the weather condition and the values to be the row objects themselves.
After structuring, your data will be grouped; you can check by running the script below.
All accidents with rainy weather will be under the key rainy, and the same for all other weather conditions.
var data = [{
"Weather Condition": "rainy",
"property1": "val1",
"property2": "val2"
}, {
"Weather Condition": "sunny",
"property1": "val1",
"property2": "val2"
}, {
"Weather Condition": "sunny",
"property1": "val1",
"property2": "val2"
}];
// put the snippet below in your d3.csv function
var expensesByName = d3.nest()
.key(function(d) {
return d["Weather Condition"];
})
.entries(data);
console.log(expensesByName);
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.4.11/d3.min.js"></script>
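Once the data is nested, each group is an object with a key and a values array, so you can pull a particular group out into its own variable. A small sketch against the result above (the variable name sunny is just illustrative):
// keep only the group whose key matches, then take its rows
var sunny = expensesByName.filter(function(d) {
  return d.key === "sunny";
})[0].values;
console.log(sunny); // the two rows whose "Weather Condition" is "sunny"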
Is there any functionality to use logical operators in the JSONPaths file used in the COPY command?
For example, I have JSON that can contain a key which can be either
Desc
Or
Description
So in the JSON it would be something like this:
{
"Desc": "Hello",
"City" : "City1",
"Age": "21"
}
{
"Description" : "World",
"City" : "City2",
"Age": "25"
}
I'm using the COPY command to pull the data from the JSON above into my table in Redshift. The table has a column named "description_data", which should store the value of either "Desc" or "Description". So I want my path file to identify the right key using an "OR" condition.
This is the path file that I'm currently using -
{
"jsonpaths": [
"$['Desc']",
"$['City']",
"$['Age']"
]
}
This works fine.
What I'm trying to do is the following (this is where I'm unsure whether there is any syntax or functionality to achieve the objective):
{
"jsonpaths": [
"$['Desc']" or "$['Description']",
"$['City']",
"$['Age']"
]
}
No, Redshift doesn't support this.
You can issue two COPY commands, one with Desc and another with Description, to load the data into two temporary tables. After that, you can merge the two into your final table.
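A minimal sketch of that approach, assuming the data lives in S3 and each COPY points at its own JSONPaths file; the bucket, role, and table names are placeholders:
-- first pass: jsonpaths_desc.json references $['Desc']
copy tmp_desc
from 's3://my-bucket/data/'
iam_role 'arn:aws:iam::123456789012:role/MyRedshiftRole'
json 's3://my-bucket/jsonpaths_desc.json';

-- second pass: jsonpaths_description.json references $['Description']
copy tmp_description
from 's3://my-bucket/data/'
iam_role 'arn:aws:iam::123456789012:role/MyRedshiftRole'
json 's3://my-bucket/jsonpaths_description.json';

-- merge into the final table, skipping the rows where the key was
-- absent and therefore loaded as null
insert into my_table (description_data, city, age)
select description_data, city, age from tmp_desc
where description_data is not null
union all
select description_data, city, age from tmp_description
where description_data is not null;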
I am a newbie to MongoDB. I am experimenting with the various ways of extracting fields from a document inside a collection.
In the JSON document below, I am finding it difficult to extract the part I need.
{
"_id":1,
"dependencies":{
"a":[
"hello",
"hi"
],
"b":[
"Hmmm"
],
"c":[
"Vanilla",
"Strawberry",
"Pista"
],
"d":[
"Carrot",
"Cauliflower",
"Potato",
"Cabbage"
]
},
"productid":"25",
"date":"Thu Jul 30 11:36:49 PDT 2015"
}
I need to display the following output:
c:[
"Vanilla",
"Strawberry",
"Pista"
]
Can anyone please help me solve this?
MongoDB aggregation comes to the rescue to get the result you are looking for.
$project passes along the documents with only the specified fields to the next stage in the pipeline. The specified fields can be existing fields from the input documents or newly computed fields.
db.collection.aggregate( [
{ $project :
{ c: "$dependencies.c", _id : 0 }
}
]).pretty();
As per the output you require, we just need to project (display) the field dependencies.c, so we create a new field c and assign the value of dependencies.c to it.
Also, by default the _id field is displayed along with the result. Since you don't need it, we suppress it by assigning "_id": 0 (or false), so that _id does not appear in the output.
The above query will fetch the result below:
"c" : [
"Vanilla",
"Strawberry",
"Pista"
]
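For comparison, a plain find() with a projection can also return just that nested array, though it keeps the nesting instead of renaming it to a top-level c. A sketch:
db.collection.find(
  {},                               // no filter: match every document
  { "dependencies.c": 1, "_id": 0 } // include only dependencies.c, drop _id
).pretty();
// returns: { "dependencies" : { "c" : [ "Vanilla", "Strawberry", "Pista" ] } }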
I need a little help regarding Lucene index files; I thought maybe some of you guys could help me out.
I have JSON like this:
[
{
"Id": 4476,
"UrlName": null,
"PhoneData": [
{
"PhoneType": "O",
"PhoneNumber": "0065898",
},
{
"PhoneType": "F",
"PhoneNumber": "0065898",
}
],
"Contact": [],
"Services": [
{
"ServiceId": 10,
"ServiceGroup": 2
},
{
"ServiceId": 20,
"ServiceGroup": 1
}
],
}
]
Adding the first two fields is relatively easy:
// add lucene fields mapped to db fields
doc.Add(new Field("Id", sampleData.Id.Value.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.Add(new Field("UrlName", sampleData.UrlName.Value ?? "null" , Field.Store.YES, Field.Index.ANALYZED));
But how can I add PhoneData and Services to the index so that they stay connected to the unique Id?
For indexing JSON objects I would go this way:
Store the whole value under a payload field, named for example $json. This field would be stored but not indexed.
For each (indexable) property (possibly nested), create an indexable field whose name is an XPath-like expression identifying the property, for example PhoneData.PhoneType.
If it is OK for all nested properties to be indexed, then it's simple: just iterate over all of them, generating one such indexable field each (see the sketch after this list).
But if you don't want to index all of them (a more realistic case), knowing which properties are indexable is another problem; in this case you could:
Accept from the client the path expressions of the index fields to be created when storing the document, or
Put JSON Schema into play to describe your data (assuming your JSON records have a common schema), and extend it with a custom property that would allow you to tag which properties are indexable.
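A minimal sketch of the flattening idea in Lucene.NET, using the same Field API as the snippet above (the property names come from your sample; the rest is illustrative, not a fixed schema):
// store the raw JSON as a payload field: stored, but not indexed
doc.Add(new Field("$json", rawJson, Field.Store.YES, Field.Index.NO));

// flatten each nested object into path-named indexable fields; they
// live in the same Lucene document as "Id", so a hit on any of them
// returns the document together with its unique Id
foreach (var phone in sampleData.PhoneData)
{
    doc.Add(new Field("PhoneData.PhoneType", phone.PhoneType, Field.Store.NO, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("PhoneData.PhoneNumber", phone.PhoneNumber, Field.Store.NO, Field.Index.NOT_ANALYZED));
}
foreach (var service in sampleData.Services)
{
    doc.Add(new Field("Services.ServiceId", service.ServiceId.ToString(), Field.Store.NO, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("Services.ServiceGroup", service.ServiceGroup.ToString(), Field.Store.NO, Field.Index.NOT_ANALYZED));
}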
I have created a library that does this (and much more), which may help you.
You can check it at https://github.com/brutusin/flea-db