I want to design a JSON data model for daily transactions. The database has not been decided yet, but it will most probably be Firebase. One possible data model is:
{
"2022_01_User1":{
"monthStartingBalance":222,
"monthEndingBalance":444,
"expenses":100,
"income":200,
"transactions":[
{
"tId":"t1",
"date":"19/01/2022",
"amount":11,
"account":"ISP"
},
{
"tId":"t3",
"date":"21/01/2022",
"amount":15,
"account":"ISP"
},
{.....},
]
},
"2022_01_User2":{
"monthStartingBalance":222,
"monthEndingBalance":444,
"expenses":100,
"income":200,
"transactions":[
{
"tId":"t2",
"date":"20/01/2022",
"amount":11,
"account":"ISP"
},
{
"tId":"t4",
"date":"24/01/2022",
"amount":15,
"account":"ISP"
},
{.....},
]
}
}
Basically I want to make yyyy_mm_user the primary key, with various properties including an array of transactions.
Another data model could be:
{
"2022_01":{
"user1":{
"monthStartingBalance":222,
"monthEndingBalance":444,
"expenses":100,
"income":200,
"transactions":[
{
"tId":"xxuxx",
"date":"20/01/2022",
"amount":11,
"account":"ISP"
},
{
"tId":"xxuxx1",
"date":"23/01/2022",
"amount":15,
"account":"ISP"
}
]
},
"user2":{
"monthStartingBalance":222,
"monthEndingBalance":444,
"expenses":100,
"income":200,
"transactions":[
{
"tId":"xxuxx",
"date":"21/01/2022",
"amount":11,
"account":"ISP"
},
{
"tId":"xxuxx1",
"date":"26/01/2022",
"amount":15,
"account":"ISP"
}
]
}
}
}
I'm not sure which one to use, or whether there is a better data model altogether.
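For reference, here is roughly how the first model (one record per yyyy_mm_user key, with the transactions embedded as an array) could be written if the database ends up being Cloud Firestore via the firebase_admin Python SDK. This is only a sketch; the collection name "monthlyLedgers" is an assumption for illustration, and with the Realtime Database the same shape would simply live under a path keyed by yyyy_mm_user.
import firebase_admin
from firebase_admin import firestore

firebase_admin.initialize_app()  # reads credentials from GOOGLE_APPLICATION_CREDENTIALS
db = firestore.client()

# Model 1: one document per yyyy_mm_user key, transactions embedded as an array.
db.collection("monthlyLedgers").document("2022_01_User1").set({
    "monthStartingBalance": 222,
    "monthEndingBalance": 444,
    "expenses": 100,
    "income": 200,
    "transactions": [
        {"tId": "t1", "date": "19/01/2022", "amount": 11, "account": "ISP"},
        {"tId": "t3", "date": "21/01/2022", "amount": 15, "account": "ISP"},
    ],
})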
I'm trying to merge subdocument values into a collection, but I can't seem to find a way to do that. I have the following collection in MongoDB 4.0:
[{
"_id": "603f8c970f25800300a6c16e",
"Hash": "vkqsgIPmB4/am4KJkERghDmmCUXEZjrGQxdCF3Fll2brR0YxJSXeTg==",
"Components": [
{
"FieldA": "A-1",
"FieldB": "B-1"
},
{
"FieldA": "A-2",
"FieldB": "B-2"
}
]
},
{
"_id": "609f8c970f25800300a7c16e",
"Hash": "vkqsgIPmB4/am4KJkERghDmmCUXEZjrGQxdCF3Fll2brR0sddggdTs==",
"Components": [
{
"FieldA": "A-3",
"FieldB": "B-3"
},
{
"FieldA": "A-4",
"FieldB": "B-4"
}
]
}]
From this collection I would like to get the following result, where _id comes from the main document and the other fields come from the subdocuments.
[
{
"_id":"603f8c970f25800300a6c16e",
"FieldA":"A-1",
"FieldB":"B-1"
},
{
"_id":"603f8c970f25800300a6c16e",
"FieldA":"A-2",
"FieldB":"B-2"
},
{
"_id":"609f8c970f25800300a7c16e",
"FieldA":"A-3",
"FieldB":"B-3"
},
{
"_id":"609f8c970f25800300a7c16e",
"FieldA":"A-4",
"FieldB":"B-4"
}
]
Thanks in advance!
You can do it like this:
$unwind - to unwind the Components array
$project - to project data in the required format
db.collection.aggregate([
{
"$unwind": "$Components"
},
{
"$project": {
"FieldA": "$Components.FieldA",
"FieldB": "$Components.FieldB"
}
}
])
Working example
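If you are running this from Python rather than the mongo shell, the same pipeline can be passed to pymongo unchanged. A minimal sketch, assuming a local server and placeholder database/collection names:
from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["mydb"]["mycollection"]  # assumed names

pipeline = [
    {"$unwind": "$Components"},
    {"$project": {
        "FieldA": "$Components.FieldA",
        "FieldB": "$Components.FieldB",
    }},
]

# Each result keeps the parent _id plus one component's FieldA/FieldB.
for doc in coll.aggregate(pipeline):
    print(doc)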
I have a rather nested JSON object below, and I am trying to calculate the user (i.e. 'profileId') with the most events (i.e. the length of the 'parameters' key).
I have the code below to get the length of the parameters, but I am now trying to make that calculation correct for each record; the way I have it set up now assigns the same value to every record. I looked into pandas window functions (https://pandas.pydata.org/docs/user_guide/window.html) but am having trouble getting to the correct outcome.
response = response.json()
df = pd.json_normalize(response['items'])
df['calcfield'] = len(df["events"].iloc[0][0].get('parameters'))
the output of df['arrayfield'] is below:
[
{
"type":"auth",
"name":"activity",
"parameters":[
{
"name":"api_name",
"value":"admin"
},
{
"name":"method_name",
"value":"directory.users.list"
},
{
"name":"client_id",
"value":"722230783769-dsta4bi9fkom72qcu0t34aj3qpcoqloq.apps.googleusercontent.com"
},
{
"name":"num_response_bytes",
"intValue":"7158"
},
{
"name":"product_bucket",
"value":"GSUITE_ADMIN"
},
{
"name":"app_name",
"value":"Untitled project"
},
{
"name":"client_type",
"value":"WEB"
}
]
}
] }, {
"kind":"admin#reports#activity",
"id":{
"time":"2022-05-05T23:58:48.914Z",
"uniqueQualifier":"-4002873813067783265",
"applicationName":"token",
"customerId":"C02f6wppb"
},
"etag":"\"5T53xK7dpLei95RNoKZd9uz5Xb8LJpBJb72fi2HaNYM/9DTdB8t7uixvUbjo4LUEg53_gf0\"",
"actor":{
"email":"nancy.admin#hyenacapital.net",
"profileId":"100230688039070881323"
},
"ipAddress":"54.80.168.30",
"events":[
{
"type":"auth",
"name":"activity",
"parameters":[
{
"name":"api_name",
"value":"gmail"
},
{
"name":"method_name",
"value":"gmail.users.messages.list"
},
{
"name":"client_id",
"value":"927538837578.apps.googleusercontent.com"
},
{
"name":"num_response_bytes",
"intValue":"2"
},
{
"name":"product_bucket",
"value":"GMAIL"
},
{
"name":"app_name",
"value":"Zapier"
},
{
"name":"client_type",
"value":"WEB"
}
]
ORIGINAL JSON BLOB I READ IN
{
"kind":"admin#reports#activities",
"etag":"\"5g8\"",
"nextPageToken":"A:1651795128914034:-4002873813067783265:151219070090:C02f6wppb",
"items":[
{
"kind":"admin#reports#activity",
"id":{
"time":"2022-05-05T23:59:39.421Z",
"uniqueQualifier":"5526793068617678141",
"applicationName":"token",
"customerId":"cds"
},
"etag":"\"jkYcURYoi8\"",
"actor":{
"email":"blah#blah.net",
"profileId":"1323"
},
"ipAddress":"107.178.193.87",
"events":[
{
"type":"auth",
"name":"activity",
"parameters":[
{
"name":"api_name",
"value":"admin"
},
{
"name":"method_name",
"value":"directory.users.list"
},
{
"name":"client_id",
"value":"722230783769-dsta4bi9fkom72qcu0t34aj3qpcoqloq.apps.googleusercontent.com"
},
{
"name":"num_response_bytes",
"intValue":"7158"
},
{
"name":"product_bucket",
"value":"GSUITE_ADMIN"
},
{
"name":"app_name",
"value":"Untitled project"
},
{
"name":"client_type",
"value":"WEB"
}
]
}
]
},
{
"kind":"admin#reports#activity",
"id":{
"time":"2022-05-05T23:58:48.914Z",
"uniqueQualifier":"-4002873813067783265",
"applicationName":"token",
"customerId":"df"
},
"etag":"\"5T53xK7dpLei95RNoKZd9uz5Xb8LJpBJb72fi2HaNYM/9DTdB8t7uixvUbjo4LUEg53_gf0\"",
"actor":{
"email":"blah.blah#bebe.net",
"profileId":"1324"
},
"ipAddress":"54.80.168.30",
"events":[
{
"type":"auth",
"name":"activity",
"parameters":[
{
"name":"api_name",
"value":"gmail"
},
{
"name":"method_name",
"value":"gmail.users.messages.list"
},
{
"name":"client_id",
"value":"927538837578.apps.googleusercontent.com"
},
{
"name":"num_response_bytes",
"intValue":"2"
},
{
"name":"product_bucket",
"value":"GMAIL"
},
{
"name":"client_type",
"value":"WEB"
}
]
}
]
}
]
}
Use:
df.groupby('actor.profileId')['events'].apply(lambda x: [len(x.iloc[i][0]['parameters']) for i in range(len(x))])
which returns, for each profileId, the list of parameter counts. Output for the sample data:
actor.profileId
1323 [7]
1324 [7]
Name: events, dtype: object
It's not entirely clear what you are asking, and df['arrayfield'] isn't in the example you provided. However, if you look at the events column after json_normalize, you can use the following line to pull out the length of each parameters key. The blob you gave as an example was assigned to response...
df = pd.json_normalize(response['items'])
df['calcfield'] = df['events'].str[0].str.get('parameters').str.len()
Because each parameters key has 7 elements, it's tough to say whether this is what you really want.
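Another way to get a per-user count, assuming the response has the structure of the original blob above, is to normalize straight down to the parameters level and group by profileId. A sketch, not tested against your full data:
import pandas as pd

# One row per parameter, carrying the actor's profileId along as metadata.
params = pd.json_normalize(
    response["items"],
    record_path=["events", "parameters"],
    meta=[["actor", "profileId"]],
)

counts = params.groupby("actor.profileId").size()
print(counts.idxmax(), counts.max())  # profileId with the most parameters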
In my MongoDB (exported from a JSON file) I have a database "dab" with a structure like this:
id:"1"
datetime:"2020-05-08 5:09:56"
name:"namea"
lat:55.826738
lon:45.0423412
analysis:"[{"0":0.36965591924860347},{"5":0.10391287134268598},{"10":0.086884394..."
I'm using that database for Spark analysis via the MongoDB Spark Connector.
My problem is the "analysis" field - I need the average of all values across every interval ("0", "5", "10", ..., "1000"), so I have to sum 0.36965591924860347 + 0.10391287134268598 + 0.086884394 + ..., divide by the number of intervals (I have 200 intervals in every column), and finally multiply the result by 100.
My solution would be this one:
db.collection.aggregate([
{
$set: {
analysis: {
$map: {
input: "$analysis",
in: { $objectToArray: "$$this" }
}
}
}
},
{
$set: {
analysis: {
$map: {
input: "$analysis",
in: { $first: "$$this.v" }
}
}
}
},
{ $set: { average: { $multiply: [ { $avg: "$analysis" }, 100 ] } } }
])
Mongo playground
You can use $reduce on that array, sum the values, divide by the number of elements, and then multiply by 100.
db.collection.aggregate([
{
"$addFields": {
"average": {
"$multiply": [
{
"$divide": [
{
"$reduce": {
"input": "$analysis",
"initialValue": 0,
"in": {
"$let": {
"vars": {
"sum": "$$value",
"data": "$$this"
},
"in": {
"$add": [
"$$sum",
{
"$arrayElemAt": [
{
"$arrayElemAt": [
{
"$map": {
"input": {
"$objectToArray": "$$data"
},
"as": "m",
"in": [
"$$m.k",
"$$m.v"
]
}
},
0
]
},
1
]
}
]
}
}
}
}
},
{
"$size": "$analysis"
}
]
},
100
]
}
}
}
])
You can test the code here
But this code has one problem: you store the data in documents, and MongoDB doesn't have a function like get(document, $$k). The new MongoDB v5.0 has $getField, but it still accepts only constants, not variables; in your case we can't do getField(doc, "5").
So we pay the cost of converting each document to an array.
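Since you are already reading the collection from Python/Spark anyway, a client-side equivalent may be simpler than either pipeline. A sketch with pymongo, assuming "analysis" really is stored as an array of single-key documents (as the pipelines above assume) rather than as a JSON string, and with a placeholder collection name:
from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["dab"]["measurements"]  # assumed collection name

for doc in coll.find({}, {"analysis": 1}):
    # Each element of "analysis" is a one-key document like {"5": 0.1039...};
    # take its single value, average them, then multiply by 100.
    values = [next(iter(interval.values())) for interval in doc["analysis"]]
    average = sum(values) / len(values) * 100
    print(doc["_id"], average)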
Here is the sample JSON:
[
{
"_id": "123456789",
"YEAR": "2019",
"VERSION": "2019.Version",
"QUESTION_GROUPS": [
{
"QUESTIONS": [
{
"QUESTION_NAME": "STATE_CODE",
"QUESTION_VALUE": "MH"
},
{
"QUESTION_NAME": "COUNTY_NAME",
"QUESTION_VALUE": "IN"
}
]
},
{
"QUESTIONS": [
{
"QUESTION_NAME": "STATE_CODE",
"QUESTION_VALUE": "UP"
},
{
"QUESTION_NAME": "COUNTY_NAME",
"QUESTION_VALUE": "IN"
}
]
}
]
}
]
The query I am using:
db.collection.find({},
{
"QUESTION_GROUPS.QUESTIONS.QUESTION_NAME": "STATE_CODE"
})
My requirement is to retrieve all QUESTION_VALUE entries whose QUESTION_NAME equals STATE_CODE.
Thanks in Advance.
If I understand you correctly, what you are trying to do is something like:
db.collection.find(
{
"QUESTION_GROUPS.QUESTIONS.QUESTION_NAME": "STATE_CODE"
},
{
"QUESTION_GROUPS.QUESTIONS.QUESTION_VALUE": 1
})
Attention: you will get ALL the "QUESTION_VALUE" for ANY document which has a QUESTION_GROUPS.QUESTIONS.QUESTION_NAME with that value.
Attention 2: You will also get the _id; it is included by default.
If you would like to avoid those issues, you may need to use aggregations and unwind "QUESTION_GROUPS" -> "QUESTIONS". This way you can skip both the irrelevant results and the _id field.
It sounds like you want to unwind the arrays and grab only the question values back
Try this
db.collection.aggregate([
{
$unwind: "$QUESTION_GROUPS"
},
{
$unwind: "$QUESTION_GROUPS.QUESTIONS"
},
{
$match: {
"QUESTION_GROUPS.QUESTIONS.QUESTION_NAME": "STATE_CODE"
}
},
{
$project: {
"QUESTION_GROUPS.QUESTIONS.QUESTION_VALUE": 1
}
}
])
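If what you ultimately want in the application is just a flat list of the state codes, you can also lift the value to the top level in the final $project. A sketch from pymongo, with assumed database/collection names:
from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["mydb"]["questions"]  # assumed names

pipeline = [
    {"$unwind": "$QUESTION_GROUPS"},
    {"$unwind": "$QUESTION_GROUPS.QUESTIONS"},
    {"$match": {"QUESTION_GROUPS.QUESTIONS.QUESTION_NAME": "STATE_CODE"}},
    # Drop _id and expose the value as a top-level field.
    {"$project": {"_id": 0, "stateCode": "$QUESTION_GROUPS.QUESTIONS.QUESTION_VALUE"}},
]

state_codes = [doc["stateCode"] for doc in coll.aggregate(pipeline)]
print(state_codes)  # ["MH", "UP"] for the sample document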
From the perspective of a consumer, is there any value in abstracting resource attributes to make the fields self-describing? Or should the documentation handle it?
The idea is that each attribute would be wrapped in a more complex object that provides the fieldId, the dataType, and the value, making each field more descriptive.
In addition, the web service would include another endpoint to further describe each field.
So, instead of the following:
{
"id":123,
"type":"person",
"attributes":{
"name":"John Smith",
"dateOfBirth":"2000-01-01",
"ssn":123456789
}
}
The json would look like this:
{
"id":123,
"type":"person",
"attributes":[
{
"fieldId":"name",
"dataType":"string",
"value":"John Smith"
},
{
"fieldId":"dateOfBirth",
"dataType":"date",
"value":"2000-01-01"
},
{
"fieldId":"ssn",
"dataType":"integer",
"value":123456789
}
],
"relationships":{
"dataType":{
"links":{
"related":{
"href":"http://acme.com/ws/dataTypes/"
}
},
"data":[
{
"id":"string",
"type":"dataType"
},
{
"id":"date",
"type":"dataType"
},
{
"id":"integer",
"type":"dataType"
}
]
},
"field":{
"links":{
"related":{
"href":"http://acme.com/ws/fields/"
}
},
"data":[
{
"id":"name",
"type":"field"
},
{
"id":"dateOfBirth",
"type":"field"
},
{
"id":"ssn",
"type":"field"
}
]
}
}
}
And then the linked field and dataType resources would give a description and/or format:
{
"id":"ssn",
"type":"field",
"attributes":{
"valueType":"string",
"description":"Social security in the xxx-xx-xxxx format."
},
"links":{
"self":{
"href":"http://acme.com/ws/fields/ssn",
"meta":{
"httpMethod":"GET"
}
}
}
}
{
"id":"date",
"type":"dataType",
"attributes":{
"valueType":"string",
"description":"yyyy-MM-dd"
},
"links":{
"self":{
"href":"http://acme.com/ws/dataTypes/date",
"meta":{
"httpMethod":"GET"
}
}
}
}
To answer this: "From the perspective of a consumer, is there any value in abstracting resource attributes to make the fields self-describing? Or should the documentation handle it?"
Based on experience evaluating multiple APIs, an API should send only the required data. There is no point carrying descriptions in the response; that should be taken care of by the documentation.
Also consider the extra amount of data you are sending just to describe the fields.
In addition, the frontend (say, JavaScript) would need to parse that larger object; save time by sending only the required data.
Consider the bandwidth taken by this:
{
"id":123,
"type":"person",
"attributes":{
"name":"John Smith",
"dateOfBirth":"2000-01-01",
"ssn":123456789
}
}
as compared to this much larger payload:
{
"id":123,
"type":"person",
"attributes":[
{
"fieldId":"name",
"dataType":"string",
"value":"John Smith"
},
{
"fieldId":"dateOfBirth",
"dataType":"date",
"value":"2000-01-01"
},
{
"fieldId":"ssn",
"dataType":"integer",
"value":123456789
}
],
"relationships":{
"dataType":{
"links":{
"related":{
"href":"http://acme.com/ws/dataTypes/"
}
},
"data":[
{
"id":"string",
"type":"dataType"
},
{
"id":"date",
"type":"dataType"
},
{
"id":"integer",
"type":"dataType"
}
]
},
"field":{
"links":{
"related":{
"href":"http://acme.com/ws/fields/"
}
},
"data":[
{
"id":"name",
"type":"field"
},
{
"id":"dateOfBirth",
"type":"field"
},
{
"id":"ssn",
"type":"field"
}
]
}
}
}
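To put a rough number on that difference, you can simply compare the serialized sizes of the two shapes. A sketch; the actual numbers depend on whitespace and your real fields, and the relationships block is omitted here, which only widens the gap:
import json

compact = {
    "id": 123,
    "type": "person",
    "attributes": {
        "name": "John Smith",
        "dateOfBirth": "2000-01-01",
        "ssn": 123456789,
    },
}

verbose = {
    "id": 123,
    "type": "person",
    "attributes": [
        {"fieldId": "name", "dataType": "string", "value": "John Smith"},
        {"fieldId": "dateOfBirth", "dataType": "date", "value": "2000-01-01"},
        {"fieldId": "ssn", "dataType": "integer", "value": 123456789},
    ],
}

print(len(json.dumps(compact)), "bytes vs", len(json.dumps(verbose)), "bytes")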
From the consumer's perspective, provide only the required data in the response and the descriptions in the documentation.
And don't make a separate call to provide more details; it will be very hard to maintain if you ever change versions.