Delete/ignore unwanted elements in JSON

I want to delete/ignore certain elements in the following JSON record:
{"_scroll_id":"==","timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":6908915,"max_score":null,"hits":[{"_index":"abc_v1","_type":"composite_request_response_v1","_id":"123","_score":1.0,"_source":{"response":{"testResults":{"docsisResults":{"devices":[{"upstreamSection":{"upstreams":[]},"fluxSection":{"fluxInfo":[{}]}}],"events":[]},"mocaResults":{"statuses":[]}}}},"sort":[null,1.0]}]}},
I have the records in the above format. I wish to delete the highlighted part of each record. Can someone guide me on ways I can accomplish that? Is there any way I can achieve this using Hive/Pig/Linux/Python?

There is a JSON SerDe in Hive; see https://cwiki.apache.org/confluence/display/Hive/Json+SerDe
With it you can define only the columns you need in the table definition, put your file in the table location, and then select only the defined columns. Alternatively, you can pre-process/transform your files before loading them using Java + Jackson (a library to serialize or map Java objects to JSON and vice versa); this gives you maximum flexibility, though it is not as simple as using the JSON SerDe.
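Since the question also mentions Python, here is a minimal pre-processing sketch along the same lines. The keys listed in UNWANTED are placeholders, because the "highlighted part" of the record is not reproduced here; replace them with the elements you actually want removed, and adjust the file names to suit.

import json

# Placeholder keys; swap in the elements you actually want to drop
UNWANTED = {"docsisResults", "mocaResults"}

def strip_keys(obj, unwanted):
    # Recursively remove unwanted keys from dicts nested inside dicts/lists
    if isinstance(obj, dict):
        return {k: strip_keys(v, unwanted) for k, v in obj.items() if k not in unwanted}
    if isinstance(obj, list):
        return [strip_keys(item, unwanted) for item in obj]
    return obj

with open("records.json") as src, open("records_clean.json", "w") as dst:
    for line in src:                       # one JSON record per line
        line = line.strip().rstrip(",")    # records in the sample end with a trailing comma
        if not line:
            continue
        record = json.loads(line)
        dst.write(json.dumps(strip_keys(record, UNWANTED)) + "\n")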

Related

Spark from_avro 2nd argument is a constant string, any way to obtain schema string from some column of each record?

Suppose we are developing an application that pulls Avro records from a source
stream (e.g. Kafka/Kinesis/etc.), parses them into JSON, and then further processes that
JSON with additional transformations. Further assume these records can have a
varying schema (which we can look up and fetch from a registry).
We would like to use Spark's built-in from_avro function, but it is pretty clear that
from_avro wants you to hard-code a fixed schema into your code. It doesn't seem
to allow the schema to vary row by incoming row.
That sort of makes sense if you are parsing the Avro into Spark's internal row format, since one would need
a consistent structure for the dataframe. But what if we wanted something like
from_avro which grabbed the bytes from some column in the row, grabbed the string
representation of the Avro schema from some other column in the row, and then parsed that Avro
into a JSON string?
Does such a built-in method exist? Or is such functionality available in a third-party library?
Thanks!
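No such per-row variant of from_avro ships with Spark as far as I know, but the same effect can be approximated with a plain Python UDF. The sketch below assumes a dataframe with a binary column avro_payload and a string column schema_str (both names are made up), uses the fastavro package, and assumes the payload is a bare schemaless Avro record rather than an Avro container file.

import json
from functools import lru_cache
from io import BytesIO

import fastavro
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

@lru_cache(maxsize=128)
def _parsed_schema(schema_str):
    # Cache parsed schemas so repeated schema strings aren't re-parsed on every row
    return fastavro.parse_schema(json.loads(schema_str))

def avro_bytes_to_json(payload, schema_str):
    # Decode one schemaless Avro record using the schema carried in the same row
    if payload is None or schema_str is None:
        return None
    record = fastavro.schemaless_reader(BytesIO(bytes(payload)), _parsed_schema(schema_str))
    return json.dumps(record, default=str)

avro_to_json = udf(avro_bytes_to_json, StringType())

# Assumed column names:
# df = df.withColumn("json", avro_to_json("avro_payload", "schema_str"))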

Neo4j: insert complicated JSON with relationships between nodes

This is going to be a little complex.
I am trying to save a JSON document with a nested array structure. The JSON I am trying to save is here:
JSON LINK
Is it possible to save the above JSON with a Cypher query? I previously tried the py2neo library for Python, but it is based on model definitions, and the nested JSON above has somewhat dynamic keys.
What I actually tried is shown below:
query = '''
CREATE (part:Part PARTJSON)
MERGE (part) - [:LINKED_TO] - (general:General GENERALJSON)
MERGE (general) - [:LINKED_TO] - (bom:Bom BOMJSON )
MERGE (general) - [:LINKED_TO] - (generaldata:GeneralData GENERALDATAJSON )
.......
'''
Is it possible to write a Cypher query that saves it in one go?
If so, any ideas would be helpful for other Neo4j users facing the same roadblock.
Thanks in advance.
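One common pattern is to pass each sub-object of the JSON as a Cypher parameter and assign it to a node with SET node = $map. Below is a rough sketch using the official neo4j Python driver; the connection details, file name, and top-level keys (part, general, bom, generaldata) are assumptions, since the actual JSON is only available behind the link above. Each parameter must be a flat map, because Neo4j node properties cannot hold nested objects; deeper nesting needs its own nodes and relationships.

import json
from neo4j import GraphDatabase

# Connection details and file name are placeholders
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
CREATE (part:Part) SET part = $part
MERGE (part)-[:LINKED_TO]->(general:General) SET general = $general
MERGE (general)-[:LINKED_TO]->(bom:Bom) SET bom = $bom
MERGE (general)-[:LINKED_TO]->(gd:GeneralData) SET gd = $generaldata
"""

with open("part.json") as fh:
    data = json.load(fh)

with driver.session() as session:
    # Each parameter below is assumed to be a flat map of primitive values
    session.run(
        query,
        part=data["part"],
        general=data["general"],
        bom=data["bom"],
        generaldata=data["generaldata"],
    )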

Python: Dump JSON Data Following Custom Format

I'm working on some Python code for my local billiard hall and I'm running into problems with JSON encoding. When I dump my data into a file I obviously get all the data in a single line. However, I want my data to be dumped into the file following a format that I choose. For example (I had to use a picture to get the point across):
My custom JSON format
I've looked up questions on custom JSONEncoders, but it seems they all deal with datatypes that aren't JSON serializable. I never found a solution for my specific need, which is having everything laid out in the manner that I want. Basically, I want each list element on a separate row, but all of the items of a dict on the same row. Do I need to write my own custom encoder, or is there some other approach I should take? Thanks!
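json.dump alone won't produce that layout, but a small wrapper can. Below is a sketch of one interpretation of the format described (one list element per line, each dict kept on a single line); the sample data and file name are made up, since the original example is only shown in the linked picture.

import json

def dump_custom(items, fp, indent=4):
    # Write the top-level list with one element per line; json.dumps with
    # no indent keeps each dict on a single row.
    fp.write("[\n")
    pad = " " * indent
    for i, item in enumerate(items):
        sep = "," if i < len(items) - 1 else ""
        fp.write(pad + json.dumps(item) + sep + "\n")
    fp.write("]\n")

# Illustrative data only
scores = [
    {"player": "Alice", "points": 57, "table": 3},
    {"player": "Bob", "points": 42, "table": 1},
]

with open("scores.json", "w") as fh:
    dump_custom(scores, fh)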

Non-technical terms on Elasticsearch, Logstash and Kibana

I have a doubt. I know that Logstash lets us input CSV/log files and filter them using separators and columns, and that it outputs to Elasticsearch so the data can be used by Kibana. However, after writing the conf file, do I need to specify the index pattern by using a command like:
curl -XPUT 'http://localhost:5601/test' -d ...
I know that when you have a JSON file, you have to define the mapping, etc. Do I need to do this step for CSV files and other non-JSON files? Sorry for asking; I just need to clear up my doubt.
When you insert documents into a new elasticsearch index, a mapping is created for you. This may not be a good thing, as it's based on the initial value of each field. Imagine a field that normally contains a string, but the initial document contains an integer - now your mapping is wrong. This is a good case for creating a mapping.
If you insert documents through logstash into an index named logstash-YYYY-MM-DD (the default), logstash will apply its own mapping. It will use any pattern hints you gave it in grok{}, e.g.:
%{NUMBER:bytes:int}
and it will also make a "raw" (not analyzed) version of each string, which you can access as myField.raw. This may also not be what you want, but you can make your own mapping and provide it as an argument in the elasticsearch{} output stanza.
You can also make templates, which elasticsearch will apply when an index pattern matches the template definition.
So, you only need to create a mapping if you don't like the default behaviors of elasticsearch or logstash.
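If you do want that control, creating the index with an explicit mapping before any documents arrive is enough. Here is a rough sketch with the elasticsearch Python client; the index name, field names, and types are illustrative, and the exact mapping syntax varies with the Elasticsearch version (this form is for 7.x and later).

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Field names and types are examples; match them to your CSV columns
mapping = {
    "mappings": {
        "properties": {
            "clientip": {"type": "keyword"},
            "bytes": {"type": "integer"},
            "message": {"type": "text"},
        }
    }
}

# Create the index up front so field types don't depend on the first document indexed
es.indices.create(index="test", body=mapping)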
Hope that helps.

Insert a Coldfusion struct into a database

If I wanted to save a contact form submission to the database, how could I insert the form scope as the submission? It's been some time since I used ColdFusion.
The contact forms vary depending on what part of the site it was submitted from, so it needs to scale and handle a form with 5 fields or one with 10 fields. I just want to store the data in a blob table.
The most space-efficient way, and the least complicated to turn back into its original shape, is serializeJSON. Besides that, you could use something like key:value|key:value, or an XML representation of your struct.
cfwddx is also an alternative.
I don't know of a way to store a native structure in a database, but have you thought about using JSON to represent your object as key-value pairs and then parsing it back into a native structure after retrieving it from the database?
There are tags/functions out there that will help you with the encoding and decoding into JSON:
cfJSON Tag
CF8 Serialize JSON
CF8 Deserialize JSON
If you can't normalize the form fields into proper table(s), you can try storing them:
in XML (SQL Server supports XML pretty well), or
in JSON (in a plain varchar field), or
using ObjectLoad() & ObjectSave() (CF9 only) to store it as a blob.
IIRC there are ways to get object load/save functionality in pre-CF9 by tapping into Java. http://www.riaforge.org/ or http://cflib.org/ might have it.