Grouping nested document - json

I'm new to Elasticsearch querying, and this question may be a basic one, but I would appreciate any help. Is there a way to query only the events (field "l") in the following sample JSON user session?
{
  "dvs_t": 103492673,
  "l": [
    {
      "e": "SessionInfo",
      "p": {
        "Device": "samsung GT-P6800",
        "SessionNumber": "36"
      },
      "ts": 103279627
    },
    {
      "e": "InAppPurchaseCompleted",
      "p": {
        "ItemID": "sdbundle_stars_10",
        "TimePlayed_Total": "3 - 3.25 Hours"
      },
      "ts": 103318595
    }
  ],
  "osv": "4.1.2",
  "request": "ANME",
  "srv_ver": "0.2"
}
For instance, can I somehow:
- count the number of InAppPurchaseCompleted events in the session?
- count the number of InAppPurchaseCompleted events in the sessions, grouped by the session parameter request or any other parameter?

Filters in Elasticsearch allow you to search (really quickly) on a specific set of data: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-bool-filter.html
If you only want to return a count, you can use the count search type: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#count
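As a rough illustration of the grouped count, here is a minimal sketch of a search body written as a Python dict. It assumes that "l" is mapped as a nested field and that "l.e" and "request" are indexed as exact (not analyzed) values; the question does not confirm either. The aggregation names by_request, events and matching are arbitrary labels chosen for this sketch. On Elasticsearch versions where the count search type is no longer available, "size": 0 plays the same role of suppressing the hits.

```python
import json


def purchases_by_request(event_name="InAppPurchaseCompleted"):
    """Build a search body that counts matching nested events per `request` value."""
    return {
        "size": 0,  # return aggregation results only, no hits
        "aggs": {
            "by_request": {
                # one bucket per distinct value of the session field "request"
                "terms": {"field": "request"},
                "aggs": {
                    "events": {
                        # step into the nested objects stored under "l" (assumed mapping)
                        "nested": {"path": "l"},
                        "aggs": {
                            "matching": {
                                # keep only events whose name matches
                                "filter": {"term": {"l.e": event_name}}
                            }
                        },
                    }
                },
            }
        },
    }


body = purchases_by_request()
print(json.dumps(body, indent=2))
```

Under those assumptions, the doc_count of each "matching" bucket would be the number of InAppPurchaseCompleted events for that request value.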

Related

Using Power Query to extract data from nested arrays in JSON

I'm relatively new to Power Query, but I'm pulling in this basic structure of JSON from a web api
{
"report": "Cost History",
"dimensions": [
{
"time": [
{
"name": "2019-11",
"label": "2019-11",
…
},
{
"name": "2019-12",
"label": "2019-12",
…
},
{
"name": "2020-01",
"label": "2020-01",
…
},
…
]
},
{
"Category": [
{
"name": "category1",
"label": "Category 1",
…
},
{
"name": "category2",
"label": "Category 2",
…
},
…
]
}
],
"data": [
[
[
40419.6393798211
],
[
191.44
],
…
],
[
[
2299.652439184997
],
[
0.0
],
…
]
]
}
I actually have 112 categories and 13 "times". I figured out how to do multiple queries to turn the times into column headers and the categories into row labels (I think). But the data section is eluding me. Because each item is a list within a list, I'm not sure how to expand it all out. Each object in the data array will have 112 numbers, and there will be 13 objects, if that all makes sense.
So ultimately I want to make it look like
            2019-11  2019-12  2020-01  ...
Category 1  40419    2299
Category 2  191      0
...
First time asking a question on here, so hopefully this all makes sense and is clear. Thanks in advance for any help!
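The reshaping described above can be sketched outside Power Query to make the indexing explicit. This is a Python illustration, not M code. It assumes the outer index of "data" is the time and the inner index is the category, as the description of 13 objects with 112 numbers each suggests; the function name pivot and the trimmed sample are made up for the sketch.

```python
def pivot(report):
    """Return (header, rows) with one row per category and one column per time."""
    # Each entry of "dimensions" is a one-key object; merge them into one dict.
    dims = {}
    for dimension in report["dimensions"]:
        dims.update(dimension)
    times = [t["label"] for t in dims["time"]]
    categories = [c["label"] for c in dims["Category"]]

    header = [""] + times
    rows = []
    for ci, category in enumerate(categories):
        # data[ti][ci] is a one-element list that wraps the number
        values = [report["data"][ti][ci][0] for ti in range(len(times))]
        rows.append([category] + values)
    return header, rows


# Trimmed-down version of the JSON from the question (two times, two categories).
sample = {
    "dimensions": [
        {"time": [{"name": "2019-11", "label": "2019-11"},
                  {"name": "2019-12", "label": "2019-12"}]},
        {"Category": [{"name": "category1", "label": "Category 1"},
                      {"name": "category2", "label": "Category 2"}]},
    ],
    "data": [
        [[40419.6393798211], [191.44]],
        [[2299.652439184997], [0.0]],
    ],
}

header, rows = pivot(sample)
for row in [header] + rows:
    print(row)
```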
I am also researching this exact thing and looking for a solution. In Power Query, a nested array is displayed as a List, and there is a function to extract its values by choosing a separator character. Extracting the values produces this step:
= Table.TransformColumns(#"Filtered Rows", {"aligned_to_ids", each Text.Combine(List.Transform(_, Text.From), ","), type text})
However, the problem I'm trying to solve is when the nested JSON has multiple values per item, so that each list item is a record. When these lists are extracted with the same kind of step, an error is raised:
= Table.TransformColumns(#"Extracted Values1", {"collaborators", each Text.Combine(List.Transform(_, Text.From), ","), type text})
Expression.Error: We cannot convert a value of type Record to type Text.
Details:
Value=
id=15890
goal_id=323
role_id=15
Type=[Type]
It seems the multiple values are not handled and PQ does not recognise the underlying structure to enable the columns to be expanded.
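The error message suggests that each list item is a record with several fields, so converting the item itself to text fails. One way to think about it is to pick a single field from each record before joining. The idea is sketched below in Python rather than M; join_field is a hypothetical helper, and the second record is invented for illustration (the first uses the values shown in the error details).

```python
def join_field(records, field, sep=","):
    """Join one chosen field from each record into a single delimited string."""
    return sep.join(str(record[field]) for record in records)


collaborators = [
    {"id": 15890, "goal_id": 323, "role_id": 15},  # values from the error details
    {"id": 15891, "goal_id": 323, "role_id": 15},  # hypothetical second record
]

ids = join_field(collaborators, "id")
print(ids)
```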

replace "key" name in whole JSON python for bulk data in efficient way

I am pushing data to another system, but before pushing I have to change the key names throughout the JSON. The JSON may contain 200, 10,000, or 250,000 records.
sample JSON:
{
"insert": "table",
"contacts": [
{
"testName": "testname",
"ContactID": 212121
},
{
"testName": "testname",
"ContactID": 2146354564
},
{
"testName": "testname",
"ContactID": 12312
},
{
"testName": "testname",
"ContactID": 211221
},
{
"testName": "testname",
"ContactID": 10218550
}
]
}
I need to change the keys in the contacts array. There may be a large number of contacts, so I need to do this efficiently and with minimal complexity.
The above JSON to be converted as below
{
"insert": "table",
"contacts": [
{
"name": "testname",
"phone": 212121
},
{
"name": "testname",
"phone": 2146354564
},
{
"name": "testname",
"phone": 12312
},
{
"name": "testname",
"phone": 211221
},
{
"name": "testname",
"phone": 10218550
}
]
}
Here is my attempt using a loop:
ini_dict = request.data
contact_data = ini_dict['contacts']
for i in contact_data:
    i['name'] = i.pop('testName')
print(contact_data)
Please suggest how I can change the key names efficiently for bulk data, for example 50,000 entries in contacts. I expect a for loop to cause a performance issue, so please let me know an efficient way to achieve this.
I don't know how fast you need it to be, nor how you are choosing to store your JSON. One simple solution is to keep it as a string and replace all the instances of your attribute names.
# Something like this, using a JSON string.
# str.replace returns a new string, so the result has to be reassigned.
jsonstring = jsonstring.replace('"testName":', '"name":')
jsonstring = jsonstring.replace('"ContactID":', '"phone":')
If you want to do this in bulk, you may need to create some batch process to fetch multiple existing records and make the changes at once. I have done this before with the Java equivalent of https://pypi.org/project/JayDeBeApi/, but that was more for modifying existing records in a database.
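If the payload is already parsed into Python objects, another option is a single pass with a key map. This is a minimal sketch, not a benchmarked solution; KEY_MAP and rename_contact_keys are names made up for the example. The work is linear in the number of contacts, which is generally the best achievable because every record has to be touched at least once.

```python
import json

# Old key -> new key. Keys that are not listed are kept unchanged.
KEY_MAP = {"testName": "name", "ContactID": "phone"}


def rename_contact_keys(payload, key_map=KEY_MAP):
    """Return a copy of payload whose contact dicts use the renamed keys."""
    renamed = [
        {key_map.get(key, key): value for key, value in contact.items()}
        for contact in payload["contacts"]
    ]
    return {**payload, "contacts": renamed}


payload = json.loads("""
{"insert": "table",
 "contacts": [{"testName": "testname", "ContactID": 212121},
              {"testName": "testname", "ContactID": 2146354564}]}
""")

result = rename_contact_keys(payload)
print(json.dumps(result))
```

Unlike the string replacement, this only touches dictionary keys, so a value that happens to contain the text testName is left alone.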

How to index multidimensional arrays in couchdb

I have a multidimensional array that I want to index with CouchDB (really using Cloudant). I have users which have a list of the teams that they belong to. I want to search to find every member of that team. So, get me all the User objects that have a team object with id 79d25d41d991890350af672e0b76faed. I tried to make a json index on "Teams.id", but it didn't work because it isn't a straight array but a multidimensional array.
User
{
"_id": "683be6c086381d3edc8905dc9e948da8",
"_rev": "238-963e54ab838935f82f54e834f501dd99",
"type": "Feature",
"Kind": "Profile",
"Email": "gc#gmail.com",
"FirstName": "George",
"LastName": "Castanza",
"Teams": [
{
"id": "79d25d41d991890350af672e0b76faed",
"name": "First Team",
"level": "123"
},
{
"id": "e500c1bf691b9cfc99f05634da80b6d1",
"name": "Second Team Name",
"level": ""
},
{
"id": "4645e8a4958421f7d843d9b34c4cd9fe",
"name": "Third Team Name",
"level": "123"
}
],
"LastTeam": "79d25d41d991890350af672e0b76faed"
}
This is a lot like my response at Cloudant Selector Query but here's the deal, applied to your question:
The easiest way to run this query is using "Cloudant Query" (or "Mango", as it's called in the forthcoming CouchDB 2.0 release) -- and not the traditional MapReduce view indexing system in CouchDB. (This blog covers the differences: https://cloudant.com/blog/mango-json-vs-text-indexes/ and this one is an overview: https://developer.ibm.com/clouddataservices/2015/11/24/cloudant-query-json-index-arrays/).
Here's what your CQ index should look like:
{
  "index": {
    "fields": [
      {"name": "Teams.[].id", "type": "string"}
    ]
  },
  "type": "text"
}
And what the subsequent query looks like:
{
  "selector": {
    "Teams": {"$elemMatch": {"id": "79d25d41d991890350af672e0b76faed"}}
  },
  "fields": [
    "_id",
    "FirstName",
    "LastName"
  ]
}
You can try it yourself in the "Query" section of the Cloudant dashboard or via curl with something like this:
curl -H "Content-Type: application/json" -X POST -d '{"selector":{"Teams":{"$elemMatch":{"id":"79d25d41d991890350af672e0b76faed"}}},"fields":["_id","FirstName","LastName"]}' https://broberg.cloudant.com/teams_test/_find
That database is world-readable, so you can see the sample documents I created in there here: https://broberg.cloudant.com/teams_test/_all_docs?include_docs=true
Dig the Seinfeld theme :D
You simply need to loop through the Teams array and emit a view entry for each of the teams.
function (doc) {
  if (doc.Kind === "Profile") {
    for (var i = 0; i < doc.Teams.length; i++) {
      var team = doc.Teams[i];
      emit(team.id, [doc.FirstName, doc.LastName]);
    }
  }
}
You can then query for all profiles with a specific team id by keying on the team id like this
.../view?key="79d25d41d991890350af672e0b76faed"
giving
{"total_rows":7,"offset":2,"rows":[
{"id":"0d15041f43b43ae07e8faa737f00032c","key":"79d25d41d991890350af672e0b76faed","value":["Adam","Alpha"]},
{"id":"68779729be3610fd8b52b22574000ae8","key":"79d25d41d991890350af672e0b76faed","value":["Bob","Bravo"]},
{"id":"9f97f1565f03aebae9ca73e207001ee1","key":"79d25d41d991890350af672e0b76faed","value":["Chuck","Charlie"]}
]}
or you can include the actual profiles in the result by adding &include_docs=true to the query.
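To make the view's behaviour concrete, the sketch below imitates it in plain Python: one index entry is produced per (profile, team) pair, and a lookup by team id returns every profile that emitted that key. This only simulates the idea and does not talk to CouchDB; map_profile, build_index and the sample documents are hypothetical.

```python
from collections import defaultdict


def map_profile(doc):
    """Yield (key, value) pairs, mirroring the emit() calls of the map function."""
    if doc.get("Kind") == "Profile":
        for team in doc.get("Teams", []):
            yield team["id"], [doc["FirstName"], doc["LastName"]]


def build_index(docs):
    """Group the emitted rows by key, roughly as a view index would."""
    index = defaultdict(list)
    for doc in docs:
        for key, value in map_profile(doc):
            index[key].append({"id": doc["_id"], "key": key, "value": value})
    return index


docs = [
    {"_id": "a", "Kind": "Profile", "FirstName": "Adam", "LastName": "Alpha",
     "Teams": [{"id": "t1"}, {"id": "t2"}]},
    {"_id": "b", "Kind": "Profile", "FirstName": "Bob", "LastName": "Bravo",
     "Teams": [{"id": "t1"}]},
    {"_id": "c", "Kind": "Other"},  # skipped: not a profile
]

index = build_index(docs)
rows = index["t1"]  # comparable to querying the view with key="t1"
print(rows)
```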

Access deeper elements of a JSON using postgresql 9.4

I want to be able to access deeper elements stored in a json in the field json, stored in a postgresql database. For example, I would like to be able to access the elements that traverse the path states->events->time from the json provided below. Here is the postgreSQL query I'm using:
SELECT
    data#>>'{userId}' AS user,
    data#>>'{region}' AS region,
    data#>>'{priorTimeSpentInApp}' AS priorTimeSpentInApp,
    data#>>'{userAttributes, "Total Friends"}' AS totalFriends
FROM game_json
WHERE game_name LIKE 'myNewGame'
LIMIT 1000
and here is an example record from the json field
{
"region": "oh",
"deviceModel": "inHouseDevice",
"states": [
{
"events": [
{
"time": 1430247045.176,
"name": "Session Start",
"value": 0,
"parameters": {
"Balance": "40"
},
"info": ""
},
{
"time": 1430247293.501,
"name": "Mission1",
"value": 1,
"parameters": {
"Result": "Win ",
"Replay": "no",
"Attempt Number": "1"
},
"info": ""
}
]
}
],
"priorTimeSpentInApp": 28989.41467999999,
"country": "CA",
"city": "vancouver",
"isDeveloper": true,
"time": 1430247044.414,
"duration": 411.53,
"timezone": "America/Cleveland",
"priorSessions": 47,
"experiments": [],
"systemVersion": "3.8.1",
"appVersion": "14312",
"userId": "ef617d7ad4c6982e2cb7f6902801eb8a",
"isSession": true,
"firstRun": 1429572011.15,
"priorEvents": 69,
"userAttributes": {
"Total Friends": "0",
"Device Type": "Tablet",
"Social Connection": "None",
"Item Slots Owned": "12",
"Total Levels Played": "0",
"Retention Cohort": "Day 0",
"Player Progression": "0",
"Characters Owned": "1"
},
"deviceId": "ef617d7ad4c6982e2cb7f6902801eb8a"
}
That SQL query works, except that it doesn't give me any return values for totalFriends (e.g. data#>>'{userAttributes, "Total Friends"}' as totalFriends). I assume that part of the problem is that events falls within a square bracket (I don't know what that indicates in the json format) as opposed to a curly brace, but I'm also unable to extract values from the userAttributes key.
I would appreciate it if anyone could help me.
I'm sorry if this question has been asked elsewhere. I'm so new to postgresql and even json that I'm having trouble coming up with the proper terminology to find the answers to this (and related) questions.
You should definitely familiarize yourself with the basics of JSON and with the JSON functions and operators in Postgres.
In the second source pay attention to the operators -> and ->>.
General rule: use -> to get a json object, ->> to get a json value as text.
Using these operators you can rewrite your query in the way which returns correct value of 'Total Friends':
select
    data->>'userId' as user,
    data->>'region' as region,
    data->>'priorTimeSpentInApp' as priorTimeSpentInApp,
    data->'userAttributes'->>'Total Friends' as totalFriends
from game_json
where game_name like 'myNewGame';
Json objects in square brackets are elements of a json array.
Json arrays may have many elements.
The elements are accessed by an index.
Json arrays are indexed from 0 (the first element of an array has an index 0).
Example:
select
    data->'states'->0->'events'->1->>'name'
from game_json
where game_name like 'myNewGame';
-- returns "Mission1"
This did help me
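For readers more used to general-purpose languages, the operator chains in the queries above correspond to ordinary key and index lookups. The Python sketch below uses a trimmed copy of the example record; it is only an analogy for how the path is walked, not a replacement for the SQL.

```python
import json

# Trimmed copy of the example record from the question.
record = json.loads("""
{
  "userId": "ef617d7ad4c6982e2cb7f6902801eb8a",
  "states": [
    {"events": [
      {"time": 1430247045.176, "name": "Session Start"},
      {"time": 1430247293.501, "name": "Mission1"}
    ]}
  ],
  "userAttributes": {"Total Friends": "0"}
}
""")

# data->'userAttributes'->>'Total Friends' : object key, then object key
total_friends = record["userAttributes"]["Total Friends"]

# data->'states'->0->'events'->1->>'name' : keys and zero-based array indexes
second_event_name = record["states"][0]["events"][1]["name"]

# Every value along the path states -> events -> time
event_times = [event["time"]
               for state in record["states"]
               for event in state["events"]]

print(total_friends, second_event_name, event_times)
```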

Handling Incredibly large JSON Document in CouchDB

I'm new to NoSQL databases and I'm having a hard time figuring out how to handle a very large JSON document that could amount to over 20MB on my local drive. This structure will definitely grow over time, and I worry about the speed of queries and about having to search deep through the nesting of the returned JSON object just to get a string out. My JSON is deeply nested, for example:
{
"exams": {
"exam1": {
"year": {
"math": {
"questions": [
{
"question_text": "first question",
"options": [
"option1",
"option2",
"option3",
"option4",
"option5"
],
"answer": 1,
"explaination": "explain the answer"
},
{
"question_text": "second question",
"options": [
"option1",
"option2",
"option3",
"option4",
"option5"
],
"answer": 1,
"explaination": "explain the answer"
},
{
"question_text": "third question",
"options": [
"option1",
"option2",
"option3",
"option4",
"option5"
],
"answer": 1,
"explaination": "explain the answer"
}
]
},
"english": {same structure as above}
},
"1961": {}
},
"exam2": {},
"exam3": {},
"exam4": {}
}
}
In the main application, question objects are created and appended based on type of exam, year, and subject making the JSON document huge over time. How can I re-model this so as to avoid slow queries in the future?
Dominic is right. You need to start dividing the documents and storing them as separate documents.
The next question is how to recompose the document after it's been split.
Considering you're using Couch, I would recommend doing this at the application layer. A good starting point would be to create exam documents and store them in their own database. Then have a document (exams) in another database that has pointers to the exam documents.
You can retrieve the exams document and get exams one by one as needed. This could be especially useful with paging since most people will only want to see the most recent exams.
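A minimal sketch of that split, done at the application layer in Python. It assumes one small document per exam, year and subject, which is finer-grained than "one document per exam" and is only one possible choice; the _id scheme, the type field and the year "1960" are made up for the example. Storing the resulting documents in CouchDB is left out.

```python
def split_exams(big_doc):
    """Split the nested exams document into small documents plus a pointer document."""
    docs = []
    index = {"_id": "exams", "exam_ids": []}
    for exam, years in big_doc["exams"].items():
        for year, subjects in years.items():
            for subject, body in subjects.items():
                doc_id = f"{exam}:{year}:{subject}"  # hypothetical id scheme
                docs.append({
                    "_id": doc_id,
                    "type": "exam_subject",
                    "exam": exam,
                    "year": year,
                    "subject": subject,
                    "questions": body.get("questions", []),
                })
                index["exam_ids"].append(doc_id)
    return index, docs


big_doc = {
    "exams": {
        "exam1": {
            "1960": {
                "math": {"questions": [{"question_text": "first question", "answer": 1}]},
                "english": {"questions": []},
            },
            "1961": {},
        },
        "exam2": {},
    }
}

index, docs = split_exams(big_doc)
print(index["exam_ids"])
```

The application would then load the small "exams" pointer document first and fetch individual documents by _id only when they are needed, which also fits the paging idea mentioned above.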