A regex expression that can remove data from a json object - json

I'd like to be able to selectively remove elements from a json schema. Imagine a json object that contains a larger but similar array of users like this
[{
"users": [{
"firstName": "Nancy",
"socialSecurityNumber": "123-45-6789",
"sex": "Female",
"id": "1234",
"race": "Smith",
"lastName": "Logan"
}, {
"firstName": "Charles",
"socialSecurityNumber": "321-54-9876",
"sex": "Male",
"id": "3456",
"race": "White",
"lastName": "Clifford"
}],
I'd like to strip the socialSecurityNumber element from the json schema using a regex expression. What would a regex expression to remove
"socialSecurityNumber": "whatever value",
look like where the value of the data pair could be any string?
I cannot be certain of the position of the data pair and whether it would have a trailing comma.

Try replacing the following regular expression with empty:
"socialSecurityNumber": "(\d|\-)",
It can go wrong if this info is split in 2 lines, or if the SSN is the last user field, because there will be no comma after it.
Anyway, after the replacing operation, check if there are any string
"socialSecurityNumber"
to confirm this can be used. If there are still strings that weren't replaced, then you will need a JSON parser to correctly eliminate this information.

Related

Efficient way to reformat or split a string that has JSON records

I have this string that consists of a set of JSON entries concatenated together like the following
val docs = """
{"name": "Bilbo Baggins", "age": 50}{"name": "Gandalf", "age": 1000}{"name": "Thorin", "age": 195}{"name": "Balin", "age": 178}{"name": "Kíli", "age": 77} """
I'd like to convert it to something like this (basically a comma after each record):
{"name": "Bilbo Baggins", "age": 50}, {"name": "Gandalf", "age": 1000}, {"name": "Thorin", "age": 195}
So that I can save these records to my mongodb collection.
The way it is done on the SparkConnector documentation was by using split("[\r\n]+") which matches linebreaks. That approach is efficient but not what I want because I am using the circe Library to format JSON records and this library adds linebreaks to the Lists in these records and thus the string would be broken by the above split.
What I'm doing now is that when forming the docs string, I'm adding "###" between each two entries and later splitting them using this expression.
is this the right way to do this? or is there a smarter way? it is possible to create a regular expression that would match the brackets and their contents and be used to split the string and return the result the way I want it?
You can add the commas to make valid json using regex like this:
val s = """{"name": "Bilbo Baggins", "age": 50}{"name": "Gandalf", "age": 1000}"""
s.replaceAll("""(\{.*?\})""", """$1,""").dropRight(1)
Output:
res151: String = "{"name": "Bilbo Baggins", "age": 50},{"name": "Gandalf", "age": 1000}"
The \{ matches a literal bracket, the .*? does a non-greedy match of any characters before the next \} and the (...) is a capture group which we refer to in the replace string as $1.

json formats - which one to use?

What is the difference between these two JSON formats? Which format should I use?
[{
"employeeid": "12345",
"firstname": "joe",
"lastname": "smith",
"favoritefruit": "apple"
}, {
"employeeid": "45678",
"firstname": "paul",
"lastname": "johnson",
"favoritefruit": "orange"
}]
OR
[
["employeeid", "firstname", "lastname", "favoritefruit"],
["12345", "joe", "smith", "apple"],
["45678", "paul", "johnson", "orange"]
]
Definately first one. It will create array of employee object while second one will create array of array of objects which will be more difficult to parse in most of language.
It depends on the context.
The first is much easer to parse if you want to create employee objects to work with.
The second may be better if you need to work on the "raw" data only. Furthermore the second is much shorter. That's not important for small or medium datasets, but could be important for example if you need to transfer large sets of employee data.

Access deeper elements of a JSON using postgresql 9.4

I want to be able to access deeper elements stored in a json in the field json, stored in a postgresql database. For example, I would like to be able to access the elements that traverse the path states->events->time from the json provided below. Here is the postgreSQL query I'm using:
SELECT
data#>> '{userId}' as user,
data#>> '{region}' as region,
data#>>'{priorTimeSpentInApp}' as priotTimeSpentInApp,
data#>>'{userAttributes, "Total Friends"}' as totalFriends
from game_json
WHERE game_name LIKE 'myNewGame'
LIMIT 1000
and here is an example record from the json field
{
"region": "oh",
"deviceModel": "inHouseDevice",
"states": [
{
"events": [
{
"time": 1430247045.176,
"name": "Session Start",
"value": 0,
"parameters": {
"Balance": "40"
},
"info": ""
},
{
"time": 1430247293.501,
"name": "Mission1",
"value": 1,
"parameters": {
"Result": "Win ",
"Replay": "no",
"Attempt Number": "1"
},
"info": ""
}
]
}
],
"priorTimeSpentInApp": 28989.41467999999,
"country": "CA",
"city": "vancouver",
"isDeveloper": true,
"time": 1430247044.414,
"duration": 411.53,
"timezone": "America/Cleveland",
"priorSessions": 47,
"experiments": [],
"systemVersion": "3.8.1",
"appVersion": "14312",
"userId": "ef617d7ad4c6982e2cb7f6902801eb8a",
"isSession": true,
"firstRun": 1429572011.15,
"priorEvents": 69,
"userAttributes": {
"Total Friends": "0",
"Device Type": "Tablet",
"Social Connection": "None",
"Item Slots Owned": "12",
"Total Levels Played": "0",
"Retention Cohort": "Day 0",
"Player Progression": "0",
"Characters Owned": "1"
},
"deviceId": "ef617d7ad4c6982e2cb7f6902801eb8a"
}
That SQL query works, except that it doesn't give me any return values for totalFriends (e.g. data#>>'{userAttributes, "Total Friends"}' as totalFriends). I assume that part of the problem is that events falls within a square bracket (I don't know what that indicates in the json format) as opposed to a curly brace, but I'm also unable to extract values from the userAttributes key.
I would appreciate it if anyone could help me.
I'm sorry if this question has been asked elsewhere. I'm so new to postgresql and even json that I'm having trouble coming up with the proper terminology to find the answers to this (and related) questions.
You should definitely familiarize yourself with the basics of json
and json functions and operators in Postgres.
In the second source pay attention to the operators -> and ->>.
General rule: use -> to get a json object, ->> to get a json value as text.
Using these operators you can rewrite your query in the way which returns correct value of 'Total Friends':
select
data->>'userId' as user,
data->>'region' as region,
data->>'priorTimeSpentInApp' as priotTimeSpentInApp,
data->'userAttributes'->>'Total Friends' as totalFriends
from game_json
where game_name like 'myNewGame';
Json objects in square brackets are elements of a json array.
Json arrays may have many elements.
The elements are accessed by an index.
Json arrays are indexed from 0 (the first element of an array has an index 0).
Example:
select
data->'states'->0->'events'->1->>'name'
from game_json
where game_name like 'myNewGame';
-- returns "Mission1"
select
data->'states'->0->'events'->1->>'name'
from game_json
where game_name like 'myNewGame';
This did help me

Map Reduce to parse JSON data in hadoop 2.2

Hello I have a JSON in the following format.I need to parse this in the map function to get the gender information of all the records.
[
{
"SeasonTicket" : false,
"name" : "Vinson Foreman",
"gender" : "male",
"age" : 50,
"email" : "vinsonforeman#cyclonica.com",
"annualSalary" : "$98,501.00",
"id" : 0
},
{
"SeasonTicket": true,
"name": "Genevieve Compton",
"gender": "female",
"age": 28,
"email": "genevievecompton#cyclonica.com",
"annualSalary": "$46,881.00",
"id": 1
},
{
"SeasonTicket": false,
"name": "Christian Crawford",
"gender": "male",
"age": 53,
"email": "christiancrawford#cyclonica.com",
"annualSalary": "$53,488.00",
"id": 2
}
]
I have tried using JSONparser but am not able to get through the JSON structure.I have been advised to use JAQL and pig but cannot do so.
Any help would be appreciated.
What I understand is that you have a huge file with an array of JSONs. Of this, you need to read the same to a mapper and emit say <id : gender>. The challenge is that JSON falls across to multiple lines.
In this is the case, I would suggest you to change the default delimiter to "}" instead of "\n".
In this case, you will be able to get parts of the JSON into the map method as value. You can discard the key ie. byte offset and do slight re-fractor on the value like removing off unwanted [ ] or , and adding chars like "}" and then parse the remaining string.
This solution works because there is no nesting within JSON and } is a valid JSON end delimiter as per the given example.
For changing the default delimiter, just set the property textinputformat.record.delimiter to "}"
Please check out this example.
Also check this jira.

jqGrid JSON notation on objects

there!
I´ve one column in my jqGrid that is empty.
But i checked the object on chrome console and thats fine.
colModel definition
colModel:[
{name:'id',index:'id', width:55,editable:false,editoptions:{readonly:true,size:10},hidden:true},
{name:'firstName',index:'firstName', width:100,searchoptions: { sopt: ['eq', 'ne', 'cn']}},
{name:'lastName',index:'lastName', width:100,editable:true, editrules:{required:true}, editoptions:{size:10}},
{name:'books[0].nome',index:'books[0].nome', width:100,editable:true, editrules:{required:true}, editoptions:{size:10}},
{"formatter":"myfunction", formatoptions:{baseLinkUrl:'/demo/{firstName}|view-icon'}}
]
JSON response
{
"total": "10",
"page": "1",
"records": "3",
"rows": [
{
"id": 1,
"firstName": "John",
"lastName": "Smith",
"books": [{"nome": "HeadFirst"}]
},
{
"id": 2,
"firstName": "Jane",
"lastName": "Adams",
"books": [{"nome": "DalaiLama"}]
},
{
"id": 35,
"firstName": "Jeff",
"lastName": "Mayer",
"books": [{"nome": "Bobymarley"}]
}
]
}
chrome console inspect object
rowdata.books[0].nome
"HeadFirst"
Any one know where theres are possibles trick?
Tks!
You should use as the value of name property of colModel only the names which can be used as property name in JavaScript and as CSS id names. So the usage of name:'books[0].nome' is not good idea.
To solve your problem you can use jsonmap. For example you can use dotted name conversion:
{name: 'nome', jsonmap: 'books.0.nome', ...
In more complex cases you can use functions as the value of jsonmap. For example
{name: 'nome', jsonmap: function (item) {
return item.books[0].nome;
}, ...
You can find some more code examples about the usage of jsonmap in other old answers: here, here, here, here, here.
name is intended to be a unique name for the row, not a reference to a JSON object. From the jqGrid colModel options documentation:
Set the unique name in the grid for the column. This property is required. As well as other words used as property/event names, the reserved words (which cannot be used for names) include subgrid, cb and rn.
You can also observe how .name is used within grid.base.js - for example:
var nm = {},
...
nm = $t.p.colModel[i].name;
...
res[nm] = $.unformat.call($t,this,{rowId:ind.id, colModel:$t.p.colModel[i]},i);
Anyway, to get back to your question I think you will have better luck by passing down the book name directly - as strings and not objects - and referencing it by name as something like bookName.