Bigquery Get json key name - json

I have a BigQuery table that contains a column that contains a JSON string. Within the JSON, there may be a key called "person" or "corp" or "sme". I want to run a query that will return which of the possible keys exist in the JSON and store it in a new column.
Below is sample data from the column 'class'; each value is stored as one long string in BQ. The first-level key name can be 'corp', 'sme', or 'person' (see the examples below).
Example 1
{
"corp": {
"address": {
"city": "London",
"countryCode": "gb",
"streetAddress": [
"Fairlop road"
],
"zip": "e111bn"
},
"cin": 1234567420,
"title": "Demo Corp"
}
}
Example 2
{
"person": {
"address": {
"city": "Madrid",
"countryCode": "es",
"streetAddress": [
"Some street 1"
],
"zip": "z1123ab"
},
"cin": 1234567411,
"title": "Demo Person"
}
}
I've tried using the json_xxx functions, but they require specifying the json_path. I'm interested in fetching the key name itself to create a new column (cust_type) which lists corp, sme, or person for each row.
Example:
row | cust_type
1 | corp
2 | person
This is my first question so pls bear with me! Thnx

You can also use a JavaScript UDF to extract the first-level keys, whatever they are.
CREATE TEMP FUNCTION json_keys(input STRING) RETURNS ARRAY<STRING> LANGUAGE js AS """
return Object.keys(JSON.parse(input))
""";
SELECT json_keys(json_text) AS cust_type
FROM UNNEST([
'{"corp": {"address": {"city": "London","countryCode": "gb","streetAddress": ["Fairlop road"],"zip": "e111bn"},"cin": 1234567420,"title": "Demo Corp"}}',
'{"person": {"address": {"city": "Madrid","countryCode": "es","streetAddress": ["Some street 1"],"zip": "z1123ab"},"cin": 1234567411,"title": "Demo Person"}}'
]) AS json_text;
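Output: one array of first-level keys per row, i.e. ["corp"] for the first row and ["person"] for the second.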

Maybe we can use the JSON_EXTRACT function and look to see if the field exists (is not null). An example test might be:
SELECT CASE
  WHEN JSON_EXTRACT(json_text, '$.corp') IS NOT NULL THEN 'corp'
  WHEN JSON_EXTRACT(json_text, '$.person') IS NOT NULL THEN 'person'
  WHEN JSON_EXTRACT(json_text, '$.sme') IS NOT NULL THEN 'sme'
END AS cust_type
FROM UNNEST([
'{"corp": {"address": {"city": "London","countryCode": "gb","streetAddress": ["Fairlop road"],"zip": "e111bn"},"cin": 1234567420,"title": "Demo Corp"}}',
'{"person": {"address": {"city": "Madrid","countryCode": "es","streetAddress": ["Some street 1"],"zip": "z1123ab"},"cin": 1234567411,"title": "Demo Person"}}'
]) AS json_text;
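For comparison, the same check can be sketched outside BigQuery. This is a minimal Python illustration of the idea behind both answers (parse the string, look at the first-level keys); the names `KNOWN_TYPES` and `cust_type` are made up for this sketch and the sample rows are shortened versions of the examples in the question.

```python
import json

# Key names taken from the question; adjust if more customer types exist.
KNOWN_TYPES = ("corp", "person", "sme")

def cust_type(json_text):
    """Return the first known first-level key in json_text, or None."""
    try:
        obj = json.loads(json_text)
    except (TypeError, ValueError):
        return None  # NULL input or invalid JSON
    if not isinstance(obj, dict):
        return None  # valid JSON, but not an object
    for key in obj:
        if key in KNOWN_TYPES:
            return key
    return None

rows = [
    '{"corp": {"cin": 1234567420, "title": "Demo Corp"}}',
    '{"person": {"cin": 1234567411, "title": "Demo Person"}}',
]
print([cust_type(r) for r in rows])  # ['corp', 'person']
```

Returning None for unparseable input mirrors the CASE expression, which yields NULL when none of the branches match.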

Related

Select part of Json array based on a value from another column in PostgreSQL

I have a table where I store country information in a column and json data in another column.
I'd like to select only part of the json data, basically I'd like to find the country value inside the Json, and return the animals values from key "animals" that are the closest (and on the left side) to the country found in the json.
This is the table "myanimals":
Country | Metadata
US | { "a": 1, "b": 2, "animals": ["dog","cat","mouse"], "region": {"country": "china"}, "animals": ["horse","bear","eagle"], "region": { "country": "us" } }
India | { "a": 20, "b": 40, "animals": ["fish","cat","rat","hamster"], "region": {"country": "india"}, "animals": ["dog","rabbit","fox","fish"], "region": { "country": "poland" } }
Metadata is in json and NOT jsonb.
Using Postgres, I want a query that adds a new column, something like "animals_in_country", showing only the values of the "animals" key that is closest to (and to the left of) the matched country, as follows:
Country | Metadata | animals_in_country
US | { "a": 1, "b": 2, "animals": ["dog","cat","mouse"], "region": {"country": "china"}, "animals": ["horse","bear","eagle"], "region": { "country": "us" } } | ["horse","bear","eagle"]
India | { "a": 20, "b": 40, "animals": ["fish","cat","rat","hamster"], "region": {"country": "india"}, "animals": ["dog","rabbit","fox","fish"], "region": { "country": "poland" } } | ["fish","cat","rat","hamster"]
Here's some pseudo code of what I am trying to achieve (please refer to the table shown above)
- Take the value in "Country", "US", and find the location of the same value in the JSON column
- location found, now search before this key 'country' for the key 'animals'
- Return whole array of values from 'animals'
- should be ["horse","bear","eagle"]
- shouldn't be ["dog","cat","mouse"] (as this one is part of "china" country in the JSON)
NOTE: Although this is dummy data, this is more or less the issue I am solving. And yes, the JSON is showing the same key more than once.
In case you are looking for the first animal in the array, this is your answer.
select a.data->'animals'->0 as first_animal from myanimals a
where a.data->'region'->>'country'='us'
or without filter (for all countries)
select a.data->'region'->>'country' as country,
a.data->'animals'->0 as first_animal from myanimals a
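The pseudo code in the question (find the matching country, then take the "animals" array just before it) can also be sketched outside the database. This is a rough Python illustration, not a PostgreSQL solution: it relies on `json.loads` with `object_pairs_hook=list`, which keeps every key/value pair in document order, including repeated keys. The function name `animals_for_country` is hypothetical, and the case-insensitive comparison is an assumption based on the sample data ("US" vs "us").

```python
import json

def animals_for_country(metadata, country):
    """Return the 'animals' array that precedes the matching region, or None."""
    # With object_pairs_hook=list every JSON object becomes a list of
    # (key, value) pairs, so duplicate keys are not collapsed.
    pairs = json.loads(metadata, object_pairs_hook=list)
    last_animals = None
    for key, value in pairs:
        if key == "animals":
            last_animals = value  # remember the most recent animals array
        elif key == "region":
            region = dict(value)  # the nested object is also a list of pairs
            if str(region.get("country", "")).lower() == country.lower():
                return last_animals
    return None

us_row = ('{ "a": 1, "b": 2, "animals": ["dog","cat","mouse"], '
          '"region": {"country": "china"}, "animals": ["horse","bear","eagle"], '
          '"region": { "country": "us" } }')
print(animals_for_country(us_row, "US"))  # ['horse', 'bear', 'eagle']
```

The order-preserving parse is what makes "closest on the left" well defined here; a plain dictionary parse would keep only the last value for each repeated key.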

Extract data from a JSON file using python

Say I have a JSON entry as follows (the JSON file was generated by fetching data from a Firebase DB):
[{"goal_savings": 0.0, "social_id": "", "score": 0, "country": "BR", "photo": "http://graph.facebook", "id": "", "plates": 3, "rcu": null, "name": "", "email": ".", "provider": "facebook", "phone": "", "savings": [], "privacyPolicyAccepted": true, "currentRole": "RoleType.PERSONAL", "empty_lives_date": null, "userId": "", "authentication_token": "-------", "onboard_status": "ONBOARDING_WIZARD", "fcmToken": ----------", "level": 1, "dni": "", "social_token": "", "lives": 10, "bills": [{"date": "2020-12-10", "role": "RoleType.PERSONAL", "name": "Supermercado", "category": "feeding", "periodicity": "PeriodicityType.NONE", "value": 100.0"}], "payments": [], "goals": [], "goalTransactions": [], "incomes": [], "achievements": [{"created_at":", "name": ""}]}]
How do I extract the content corresponding to 'value', which is inside the 'bills' column? Is there any way to do this?
My Python code is as follows. With it I was only able to get the data within the bills column, but I need only the entry corresponding to 'value' inside bills.
import json

# Load the exported Firebase data; the with-block closes the file afterwards
with open('firebase-dataset.json', 'r') as filedata:
    data = json.load(filedata)

listoffields = []  # one list of bills per entry
for dic in data:
    try:
        listoffields.append(dic['bills'])  # keep this entry's list of bills
    except KeyError:
        pass  # this entry has no 'bills' key
print(listoffields)
The JSON you posted contains misplaced quotes.
I think you are trying to extract the value of the 'value' field within bills.
Try this:
print(listoffields[0][0]['value'])
which will print 100.0 as a str; use float() to use it in calculations.
---edit---
Say the JSON you have contains many JSON objects separated by commas, like
[{ first-entry },{ second-entry },{ third.. }, ....and so on]
and you want the value of each bill in each JSON object.
The code below may work:
bill_value_list = []  # stores the 'value' of each bill
for bill_list in listoffields:
    for bill in bill_list:  # bill is one complete bill dictionary
        bill_value_list.append(float(bill['value']))
print(bill_value_list)
print(sum(bill_value_list))  # do something useful
Paste it after the code you posted (no changes to your code are needed).
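Putting the two steps together, here is a minimal self-contained sketch. It uses a trimmed, valid stand-in for the posted JSON (an assumption, since the original sample has misplaced quotes) and a hypothetical helper `bill_values` that tolerates entries with no bills at all.

```python
import json

# Trimmed stand-in for the Firebase export; only the fields used here are kept.
raw = '''[
  {"country": "BR", "bills": [{"name": "Supermercado", "value": 100.0}]},
  {"country": "BR", "bills": []},
  {"country": "BR"}
]'''

def bill_values(entries):
    """Collect the 'value' of every bill in every entry as floats."""
    values = []
    for entry in entries:
        # .get() returns an empty list when an entry has no 'bills' key
        for bill in entry.get("bills", []):
            if "value" in bill:
                values.append(float(bill["value"]))
    return values

values = bill_values(json.loads(raw))
print(values)       # [100.0]
print(sum(values))  # 100.0
```

Calling float() works whether 'value' arrives as a number or as a numeric string, which covers both readings of the original sample.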

reshape jq nested file and make csv

I've been struggling for the whole day with this JSON file, which I want to turn into a CSV.
It represents the officers attached to company whose number is "OC418979" in the UK Company House API.
I've already truncated the json to contain just 2 objects inside "items".
What I would like to get is a csv like this
OC418979, country_of_residence, officer_role, appointed_on
OC418979, country_of_residence, officer_role, appointed_on
OC418979, country_of_residence, officer_role, appointed_on
OC418979, country_of_residence, officer_role, appointed_on
...
There are 2 extra complications. First, there are 2 types of "officers": some are people and some are companies, so not all keys present for people are present for companies, and vice versa. I'd like these missing entries to be 'null'. Second, there are fields like "name", which contains a comma, and nested objects like "address", which contains several sub-fields (which I guess I could flatten in pandas, though).
{
"total_results": 13,
"resigned_count": 9,
"links": {
"self": "/company/OC418979/officers"
},
"items_per_page": 35,
"etag": "bc7955679916b089445c9dfb4bc597aa0daaf17d",
"kind": "officer-list",
"active_count": 4,
"inactive_count": 0,
"start_index": 0,
"items": [
{
"officer_role": "llp-designated-member",
"name": "BARRICK, David James",
"date_of_birth": {
"year": 1984,
"month": 1
},
"appointed_on": "2017-09-15",
"country_of_residence": "England",
"address": {
"country": "United Kingdom",
"address_line_1": "Old Gloucester Street",
"locality": "London",
"premises": "27",
"postal_code": "WC1N 3AX"
},
"links": {
"officer": {
"appointments": "/officers/d_PT9xVxze6rpzYwkN_6b7og9-k/appointments"
}
}
},
{
"links": {
"officer": {
"appointments": "/officers/M2Ndc7ZjpyrjzCXdFZyFsykJn-U/appointments"
}
},
"address": {
"locality": "Tadcaster",
"country": "United Kingdom",
"address_line_1": "Westgate",
"postal_code": "LS24 9AB",
"premises": "5a"
},
"identification": {
"legal_authority": "UK",
"identification_type": "non-eea",
"legal_form": "UK"
},
"name": "PREMIER DRIVER LIMITED",
"officer_role": "corporate-llp-designated-member",
"appointed_on": "2017-09-15"
}
]
}
What I've been doing is creating new json objects extracting the fields I needed like this:
{officer_address:.items[]?.address, appointed_on:.items[]?.appointed_on, country_of_residence:.items[]?.country_of_residence, officer_role:.items[]?.officer_role, officer_dob:.items[]?.date_of_birth, officer_nationality:.items[]?.nationality, officer_occupation:.items[]?.occupation}
But the query runs for hours - and I am sure there is a quicker way.
Right now I am trying this new approach - creating a json whose root is the company number and as argument a list of its officers.
{(.links.self | split("/")[2]): .items[]}
Using jq, it's easiest to first extract the values from the top-level object that will be shared, and then generate the desired rows. You'll want to limit the number of times you go through the items to at most once.
$ jq -r '(.links.self | split("/")[2]) as $companyCode
    | .items[]
    | [ $companyCode, .country_of_residence, .officer_role, .appointed_on ]
    | @csv
' input.json
OK, you want to scan the list of officers, extract some fields from each one if they are present, and write the result in CSV format.
The first part is to extract the data from the JSON. Assuming you have loaded it into a Python object named data, you have:
print(data['items'][0]['officer_role'], data['items'][0]['appointed_on'],
data['items'][0]['country_of_residence'])
gives:
llp-designated-member 2017-09-15 England
Time to put everything together with the csv module:
import csv
...
with open('output.csv', 'w', newline='') as fd:
    wr = csv.writer(fd)
    for officer in data['items']:
        _ = wr.writerow(('OC418979',
                         officer.get('country_of_residence', ''),
                         officer.get('officer_role', ''),
                         officer.get('appointed_on', '')
                         ))
The get method on a dictionary lets you supply a default value (here the empty string) for when the key is not present, and the csv module ensures that a field containing a comma is enclosed in quotation marks.
With your example input, it gives:
OC418979,England,llp-designated-member,2017-09-15
OC418979,,corporate-llp-designated-member,2017-09-15
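If the literal text null is wanted for missing fields, as the question asks, and the company number should come from the data rather than being hard-coded, the same approach can be adapted. The following is a minimal sketch under those assumptions; `officers_to_csv` and `FIELDS` are names made up for this example, and the sample is a cut-down version of the JSON in the question.

```python
import csv
import io

# Columns requested in the question, in output order.
FIELDS = ("country_of_residence", "officer_role", "appointed_on")

def officers_to_csv(data):
    """Return one CSV row per officer, prefixed with the company number."""
    # "/company/OC418979/officers" splits into ['', 'company', 'OC418979', 'officers']
    company = data["links"]["self"].split("/")[2]
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for officer in data.get("items", []):
        # Missing keys become the literal string "null"
        writer.writerow([company] + [officer.get(f, "null") for f in FIELDS])
    return buf.getvalue()

sample = {
    "links": {"self": "/company/OC418979/officers"},
    "items": [
        {"officer_role": "llp-designated-member", "name": "BARRICK, David James",
         "appointed_on": "2017-09-15", "country_of_residence": "England"},
        {"officer_role": "corporate-llp-designated-member",
         "name": "PREMIER DRIVER LIMITED", "appointed_on": "2017-09-15"},
    ],
}
print(officers_to_csv(sample), end="")
# OC418979,England,llp-designated-member,2017-09-15
# OC418979,null,corporate-llp-designated-member,2017-09-15
```

Each officer is visited exactly once, which avoids the repeated `.items[]` expansion that made the original jq filter slow.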

PostgreSQL, get JSON object field based on the value of a parallel attribute

Suppose we are dealing with a JSON object where there can be multiple child nodes with the same structure, and we want to get the value of attribute B,C,D,etc. where attribute A equals a specific value. Below is an example.
{
"addresses": [{
"type": "home",
"address": "123 fake street",
"zip": "24301"
}, {
"type": "work",
"address": "346 Main street",
"zip": "24352"
}, {
"type": "PO Box",
"address": "PO BOX 132, New York, NY",
"zip": "10001"
}, {
"type": "second",
"address": "1600 Pennsylvania Ave.",
"zip": "90210"
}]}
Is there any JSON operator in PostgreSQL where I can get the zip code, where the address type is "work" or "home"? I am looking at https://www.postgresql.org/docs/current/static/functions-json.html and not finding what I'm looking for.
You need to "unnest" (i.e. normalize) the data, then you can apply a WHERE condition on it:
select t.adr ->> 'zip', t.adr ->> 'address'
from the_table
cross join lateral jsonb_array_elements(the_column -> 'addresses') as t(adr)
where t.adr ->> 'type' in ('work', 'home');
Online example: http://rextester.com/TDB99535
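To make the "unnest, then filter" idea concrete, here is the same logic as a small Python sketch. It is only an illustration of what the query does row by row, not a replacement for it; `zips_for_types` is a hypothetical helper and the document is the example from the question.

```python
import json

doc = json.loads('''{
  "addresses": [
    {"type": "home",   "address": "123 fake street",          "zip": "24301"},
    {"type": "work",   "address": "346 Main street",          "zip": "24352"},
    {"type": "PO Box", "address": "PO BOX 132, New York, NY", "zip": "10001"},
    {"type": "second", "address": "1600 Pennsylvania Ave.",   "zip": "90210"}
  ]
}''')

def zips_for_types(document, wanted):
    """Return (zip, address) for every address whose type is in wanted."""
    return [
        (adr.get("zip"), adr.get("address"))
        for adr in document.get("addresses", [])  # one element per "unnested" row
        if adr.get("type") in wanted              # the WHERE condition
    ]

print(zips_for_types(doc, {"work", "home"}))
# [('24301', '123 fake street'), ('24352', '346 Main street')]
```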

Parsing JSON in SQL Server 2017 (Clearbit API call)

I'm pulling some data into a database on my local server with API calls via Clearbit provider. Everything was OK regarding parsing the data with SQL Server 2017 until I hit a bump.
I will go straight on the example for easier understanding.
This is the example of an API call output in JSON
{
"id": "384dfe0d-5bba-445e-a390-2d946dc84a12",
"name": "Honeywell",
"legalName": "Honeywell International Inc",
"domain": "honeywell.com",
"domainAliases": [
"honeywell.at",
"honeywell.it",
"evohome.info",
"wifithermostat.com",
"emsaviation.com",
"mytotalconnect.com",
"honeywell.nl",
"honeywell.co.za",
"honeywell.com.au",
"honeywell.ca",
"alliedsignal.com",
"emsdss.com",
"primusepic.com",
"alarmnet-me.com",
"lebow.com",
"honeywell.ie",
"honeywell.jp",
"honeywell.com.br",
"trendcontrol.co.uk",
"honeywellforjaguar.co.uk",
"aviaso.com",
"skyforce.co.uk",
"newenglandinstruments.com",
"honeywell.fi",
"alarmnet.com",
"skyconnect.com",
"skyforceuk.com",
"securitex.com",
"missionready.com",
"honeywellaerospace.com",
"formation.com",
"aclon.com",
"electrocorp.com",
"ultrak.com",
"satcom1.com",
"hsmpats.com",
"myaerospace.com",
"emsglobaltracking.com",
"fascocontrols.com",
"honeywellnow.com",
"bendixbrakes.com",
"elmwoodsensors.com",
"ovationselect.com",
"honeywellbusinessaviation.com",
"iflyaspire.com",
"btrinc.com",
"honeywellspecialtymaterials.com",
"magneticsensors.com",
"activeye.com",
"egarrett.com",
"novar-eds.com",
"aviaso.co.uk",
"chadwick-helmuth.com",
"datainstruments.com",
"lebowproducts.com",
"honeywell-produktkatalog.de",
"honeywellforjaguar.com",
"hobbs-corp.com",
"emsgt.com",
"honeywellaes.com",
"honeywellbuildingsolutions.com",
"satcom1.aero",
"honeywell-building-solutions.de",
"lifesafetydistribution.com",
"godirect.com",
"garrettbulletin.com",
"yourhomeexpert.com",
"aerospacetrading.com",
"sensorsystems.com",
"wifithermostat.info",
"honeywell-fachseminare.de",
"hobbscorporation.com",
"kcl.hu",
"honeywell.sk",
"esser.info",
"inertialsensor.com",
"sensotec.com",
"notifier.com",
"honeywellgreer.com",
"smartact.de",
"honeywellfire.com",
"iris-systems.com",
"honeywell.ru",
"lxei.com",
"thermalswitch.com",
"hightempsolutions.com",
"aubetech.com",
"honeywell-haustechnik.de",
"careersathoneywell.com",
"garrettbyhoneywell.com",
"honeywell.in",
"honeywell.cn",
"honeywell.com.mx",
"kcp.com",
"satamatics.com",
"myflite.com"
],
"site": {
"title": "Honeywell",
"h1": null,
"metaDescription": " We are blending products with software solutions to link people and businesses to the information they need to be more efficient, safer and connected. ",
"metaAuthor": null,
"phoneNumbers": [
"+1 877-271-8620",
"+1 800-633-3991",
"+1 877-841-2840",
"+1 480-353-3020",
"+1 973-455-3388",
"+1 973-204-9621",
"+32 2 728 20 45",
"+32 476 20 90 19",
"+44 7794 007289",
"+86 21 2219 6509"
],
"emailAddresses": [
"domains@honeywell.com",
"HoneywellPrivacy@honeywell.com",
"rob.ferris@honeywell.com",
"ilse.schouteden@honeywell.com",
"chris.martin2@honeywell.com",
"Anahi.Espinosa@honeywell.com",
"lydia.lu@honeywell.com",
"madhavi.jha@Honeywell.com",
"Steven.Brecken@Honeywell.com",
"Steve.Brecken@Honeywell.com",
"Eugene.Tan@Honeywell.com"
]
},
"category": {
"sector": "Consumer Discretionary",
"industryGroup": "Automobiles & Components",
"industry": "Automotive",
"subIndustry": "Automotive",
"sicCode": "3714",
"naicsCode": null
},
"tags": [
"Automotive",
"Enterprise",
"B2B",
"Electrical"
],
"description": " We are blending products with software solutions to link people and businesses to the information they need to be more efficient, safer and connected. ",
"foundedYear": 1936,
"location": "115 Tabor Rd, Morris Plains, NJ 07950, USA",
"timeZone": "America/New_York",
"utcOffset": -4,
"geo": {
"streetNumber": "115",
"streetName": "Tabor Road",
"subPremise": null,
"city": "Morris Plains",
"postalCode": "07950",
"state": "New Jersey",
"stateCode": "NJ",
"country": "United States",
"countryCode": "US",
"lat": 40.8358456,
"lng": -74.4771042
},
"logo": "https://logo.clearbit.com/honeywell.com",
"facebook": {
"handle": "293855263965203",
"likes": null
},
"linkedin": {
"handle": "company/honeywell"
},
"twitter": {
"handle": "HoneywellNow",
"id": "257492733",
"bio": "Please visit us over at @Honeywell.",
"followers": 2322,
"following": 1,
"location": "Morris Plains, NJ",
"site": "https:",
"avatar":
},
"crunchbase": {
"handle": "organization/honeywell"
},
"emailProvider": false,
"type": "public",
"ticker": "HON",
"phone": "+1 973-455-2000",
"metrics": {
"alexaUsRank": 6045,
"alexaGlobalRank": 18053,
"googleRank": null,
"employees": 51779,
"employeesRange": "1000+",
"marketCap": 102920000000,
"raised": null,
"annualRevenue": 39302000000,
"fiscalYearEnd": 12
},
"indexedAt": "2017-07-11T23:00:41.115Z",
"tech": [
"crazy_egg",
"google_analytics",
"google_tag_manager",
"asp_net",
"mouseflow",
"marketo",
"go_squared",
"microsoft_exchange_online",
"outlook",
"recaptcha"
],
"parent": {
"domain": null
},
"similarDomains": [
"abb-livingspace.com",
"alerton.com",
"gereports.com",
"honeywellprocess.com",
"honeywelluk.com",
"johnsoncontrols.com",
"jpinstruments.com",
"lenel.com",
"maxitrol.com",
"nucalgon.com",
"schneider-electric.us",
"siemens.com"
]
}
If you look at the example up here you will see "domainAliases": [...]
and that is the part of the JSON I still need to parse.
This is the parse query for SQL that I already have:
SELECT *
, JSON_VALUE(JSONData,'$.name') AS CompanyName
, JSON_VALUE(JSONData,'$.category.sector') AS CategorySector
, JSON_VALUE(JSONData, '$.category.industryGroup') AS CategoryIndustryGroup
, JSON_VALUE(JSONData, '$.category.industry') AS CategoryIndustry
, JSON_VALUE(JSONData, '$.category.subIndustry') AS CategorySubIndustry
, JSON_VALUE(JSONData, '$.category.sicCode') AS CategorySicCode
, JSON_VALUE(JSONData, '$.category.naicsCode') AS CategoryNaicsCode
, JSON_VALUE(JSONData, '$.metrics.employees') AS EmployeesNumber
, JSON_VALUE(JSONData, '$.metrics.employeesRange') AS EmployeesRange
, JSON_VALUE(JSONData, '$.metrics.marketCap') AS MarketCap
, JSON_VALUE(JSONData, '$.metrics.annualRevenue') AS AnnualRevenue
, JSON_QUERY(JSONData, '$.similarDomains') AS SimilarDomains -- an array: JSON_VALUE only returns scalars and would give NULL here
FROM Domains;
I want this data ("domainAliases") to be stored in a different table from the data in the query above (I know that the parse query I already have is only a SELECT, but I also have an UPDATE version of it).
Here is an example picture of how the finished product should look in a new table in the same database. The left column is called Company Name and the second column is called Domain Aliases:
Where is the JSON data stored? It is in a column called JSONData, in the table Domains, in a database called Domainbank. The JSONData datatype is nvarchar(max).
I need the data to be grouped by company name, with the alias domains next to the company name, just as the picture example shows. Keep in mind that I will run this query over 10k+ JSONData values, so the new table will be very large, but as long as it is grouped by company name with all the alias domains it should be fine. Some of the API calls did not return data in the correct format, either because nothing was found or because something else went wrong. So if the query doesn't find anything under "domainAliases": [...], or doesn't find "domainAliases": [...] at all, the company should not appear in the new table.
Short recap: create a new table (let's call it AliasDomains), find the data under "domainAliases": [...], pull the company name out with JSON_VALUE(JSONData,'$.name') AS CompanyName, store the data in the new table as in the picture example earlier in the post, and group by CompanyName.
From your post I am not completely clear on what your question is, but I assume it is how to write a SQL statement that accomplishes the above.
First of all, I'd say you should not worry about GROUP BY in the insert; do the GROUP BY when retrieving data from the table.
Having said that, you can quite easily accomplish what you want with a SELECT from the Domains table together with CROSS APPLY OPENJSON, like so:
INSERT INTO AliasDomains(CompanyName, DomainAliases)
SELECT JSON_VALUE(JSONData, '$.name'), value
FROM Domains
CROSS APPLY OPENJSON (JSONData, '$.domainAliases')
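EDIT: I should probably add that value in the above statement is a column returned by OPENJSON, i.e. it holds each element found at the path you asked for (in this case domainAliases). Because CROSS APPLY only keeps rows for which OPENJSON returns something, companies without any domainAliases do not appear in the result.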
Hope this helps?!
Niels
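As a rough illustration of what the INSERT ... CROSS APPLY OPENJSON statement produces, here is the same flattening written in Python. It is a sketch for reasoning about the result, not part of the SQL Server solution; `alias_rows` is a hypothetical helper, and skipping unparseable documents is an assumption based on the question's remark that some API calls returned bad data.

```python
import json

def alias_rows(json_documents):
    """Yield one (company name, alias) pair per entry in domainAliases."""
    rows = []
    for text in json_documents:
        try:
            doc = json.loads(text)
        except (TypeError, ValueError):
            continue  # malformed API response: skip it
        if not isinstance(doc, dict):
            continue
        aliases = doc.get("domainAliases")
        if not isinstance(aliases, list):
            continue  # key missing or not an array: company is left out
        for alias in aliases:  # an empty array contributes no rows
            rows.append((doc.get("name"), alias))
    return rows

docs = [
    '{"name": "Honeywell", "domainAliases": ["honeywell.at", "honeywell.it"]}',
    '{"name": "NoAliases"}',
    '{"name": "Empty", "domainAliases": []}',
    'not valid json',
]
print(alias_rows(docs))
# [('Honeywell', 'honeywell.at'), ('Honeywell', 'honeywell.it')]
```

Companies with a missing or empty domainAliases array produce no rows, which matches the requirement that they should not appear in the new table.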