I create JSON like this to extract any table (the table name is decided "randomly" at runtime and held in the variable iv_table_name):
FIELD-SYMBOLS <itab> TYPE STANDARD TABLE.
DATA ref_itab TYPE REF TO data.
DATA(iv_table_name) = 'SCARR'.
CREATE DATA ref_itab TYPE STANDARD TABLE OF (iv_table_name).
ASSIGN ref_itab->* TO <itab>.
SELECT *
INTO TABLE <itab>
FROM (iv_table_name).
DATA results_json TYPE TABLE OF string.
DATA sub_json TYPE string.
DATA(lo_json_writer) = cl_sxml_string_writer=>create( type = if_sxml=>co_xt_json ).
CALL TRANSFORMATION id
SOURCE result = <itab>
RESULT XML lo_json_writer.
cl_abap_conv_in_ce=>create( )->convert(
EXPORTING
input = lo_json_writer->get_output( )
IMPORTING
data = sub_json ).
The result variable sub_json looks like this:
{"RESULT":
[
{"MANDT":"220","AUFNR":"0000012", ...},
{"MANDT":"220","AUFNR":"0000013", ...},
...
]
}
Is there a way to avoid the surrounding dictionary and get the result like this?
[
{"MANDT":"220","AUFNR":"0000012", ...},
{"MANDT":"220","AUFNR":"0000013", ...},
...
]
Background:
I used this:
sub_json = /ui2/cl_json=>serialize( data = <lt_result> pretty_name = /ui2/cl_json=>pretty_mode-low_case ).
But the performance of /ui2/cl_json=>serialize( ) is not good.
If you really want to use it just as a tool for extracting table records, you could write your own transformation in transaction STRANS. It could look like this; let's name it Z_JSON_TABLE_CONTENTS (create it with type XSLT):
<xsl:transform version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:sap="http://www.sap.com/sapxsl"
>
<xsl:output method="text" encoding="UTF-8" />
<xsl:strip-space elements="*"/>
<xsl:template match="RESULT">
[
<xsl:for-each select="*">
{
<xsl:for-each select="*">
"<xsl:value-of select="local-name()" />": "<xsl:value-of select="text()" />"<xsl:if test="position() != last()">,</xsl:if>
</xsl:for-each>
}<xsl:if test="position() != last()">,</xsl:if>
</xsl:for-each>
]
</xsl:template>
</xsl:transform>
Then you could use it like this:
REPORT ZZZ.
FIELD-SYMBOLS <itab> TYPE STANDARD TABLE.
DATA ref_itab TYPE REF TO data.
DATA(iv_table_name) = 'SCARR'.
CREATE DATA ref_itab TYPE STANDARD TABLE OF (iv_table_name).
ASSIGN ref_itab->* TO <itab>.
SELECT *
INTO TABLE <itab>
FROM (iv_table_name).
DATA results_json TYPE TABLE OF string.
DATA sub_json TYPE string.
DATA g_string TYPE string.
DATA(g_document) = cl_ixml=>create( )->create_document( ).
DATA(g_ref_stream_factory) = cl_ixml=>create( )->create_stream_factory( ).
DATA(g_ostream) = g_ref_stream_factory->create_ostream_cstring( g_string ).
CALL TRANSFORMATION Z_JSON_TABLE_CONTENTS
SOURCE result = <itab>
RESULT XML g_ostream.
DATA(g_json_parser) = new /ui5/cl_json_parser( ).
g_json_parser->parse( g_string ).
I've got no answer as to whether it's possible to omit the initial "RESULT" tag with pure sXML, but my opinion is no.
Now, there's the solution following the KISS principle:
REPLACE ALL OCCURRENCES OF REGEX '^\{"RESULT":|\}$' IN sub_json WITH ``.
There's also this alternative form (slightly slower):
sub_json = replace( val = sub_json regex = '^\{"RESULT":|\}$' with = `` occ = 0 ).
ADDENDUM about performance:
I measured that for a string of 880K characters, the following code, with the exact number of positions to remove (10 leading characters and 1 trailing character), is 6 times faster than the regex (this could vary with the ABAP kernel version), though it may not be noticeable compared to the rest of the program:
SHIFT sub_json LEFT BY 10 PLACES CIRCULAR.
REPLACE SECTION OFFSET strlen( sub_json ) - 11 OF sub_json WITH ``.
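If the exact prefix length is not known upfront (e.g. the wrapper node name varies), the offsets can be derived from the prefix instead of hard-coding them. A minimal sketch of that idea (lv_prefix is a hypothetical helper variable):
" derive the number of positions to remove from the wrapper prefix
DATA(lv_prefix) = `{"RESULT":`.
SHIFT sub_json LEFT BY strlen( lv_prefix ) PLACES CIRCULAR.
REPLACE SECTION OFFSET strlen( sub_json ) - strlen( lv_prefix ) - 1 OF sub_json WITH ``.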
Just a bit of manual work and voila!
DATA(writer) = CAST if_sxml_writer( cl_sxml_string_writer=>create( type = if_sxml=>co_xt_json ) ).
DATA(components) =
CAST cl_abap_structdescr( cl_abap_typedescr=>describe_by_name( iv_table_name ) )->components.
writer->open_element( name = 'object' ).
LOOP AT <itab> ASSIGNING FIELD-SYMBOL(<line>).
LOOP AT components ASSIGNING FIELD-SYMBOL(<fs_comp>).
ASSIGN COMPONENT <fs_comp>-name OF STRUCTURE <line> TO FIELD-SYMBOL(<fs_val>).
writer->open_element( name = 'str' ).
writer->write_attribute( name = 'name' value = CONV string( <fs_comp>-name ) ).
writer->write_value( CONV string( <fs_val> ) ).
writer->close_element( ).
ENDLOOP.
ENDLOOP.
writer->close_element( ).
DATA(xml_json) = CAST cl_sxml_string_writer( writer )->get_output( ).
sub_json = cl_abap_codepage=>convert_from( source = xml_json codepage = `UTF-8` ).
No surrounding list and no dictionary. If you want each line in a separate dictionary, it is easily adjustable.
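For instance, a minimal sketch of that adjustment (untested; it replaces the element-writing part above and assumes the same writer and components): wrap the rows in an 'array' element and open a fresh 'object' per row, which yields a JSON array of dictionaries:
writer->open_element( name = 'array' ).
LOOP AT <itab> ASSIGNING FIELD-SYMBOL(<row>).
  " one JSON object per table line
  writer->open_element( name = 'object' ).
  LOOP AT components ASSIGNING FIELD-SYMBOL(<comp>).
    ASSIGN COMPONENT <comp>-name OF STRUCTURE <row> TO FIELD-SYMBOL(<val>).
    writer->open_element( name = 'str' ).
    writer->write_attribute( name = 'name' value = CONV string( <comp>-name ) ).
    writer->write_value( CONV string( <val> ) ).
    writer->close_element( ).
  ENDLOOP.
  writer->close_element( ).
ENDLOOP.
writer->close_element( ).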
If you use the ID transformation, then whatever node name you give in the CALL TRANSFORMATION is added by default. You cannot skip this, but you can remove it in one of the following ways:
REPLACE: strip the leading {"RESULT": (with a regex or a plain REPLACE FIRST OCCURRENCE of the literal text) and then the final closing brace }, the way you did.
FIND: you can simply use the statement below; the capture group (the bracketed array) overwrites sub_json:
FIND REGEX '(\[.*\])' IN sub_json SUBMATCHES sub_json.
I am trying to parse out the languages for the different profiles that are stored in a JSON field named "data". They are stored in their own array within that field, like:
"languages": ["EN", "BN", "HI"]
I can call the whole array by using:
data->>'languages' as languages
but I would like to split it out into
language1 = "EN"
language2 = "BN"
language3 = "HI"
I think the best solution, if possible, would be to return the whole language array but exclude "EN" from it, e.g.:
"languages": ["BN", "HI"]
You can use the - operator to remove the EN element:
select (data -> 'languages') - 'EN' as languages
from the_table;
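If you instead want the separate-column variant from the question, individual array elements can be addressed by index with ->>. A minimal sketch, assuming data is of type jsonb (column and table names as in the question):
select data -> 'languages' ->> 0 as language1,
       data -> 'languages' ->> 1 as language2,
       data -> 'languages' ->> 2 as language3
from the_table;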
As a simplified example, consider this table with two fields. One is a string and the other is XML.
SELECT TOP (1) [Source]
, OrderParameter
FROM [Rops].[dbo].[PreOrder]
Source="MediaConversions"
OrderParameter="<?xml version="1.0" encoding="utf-16"?>"
Now I want to query the table and have the results as json, but also have the XML converted as json in one go.
SELECT TOP (1) [Source]
, OrderParameter
FROM [Rops].[dbo].[PreOrder]
for json path;
results in
[{"Source":"MediaConversions","OrderParameter":"<ParameterList
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema" />"}]
But I want it to be converted into:
[{"Source":"MediaConversions","OrderParameter":{ "ParameterList": [
"x": 1, "y": 10] }
]
How to add "for json" to have the xml converted?
SELECT TOP (1) [Source]
, select OrderParameter for json path????
FROM [Rops].[dbo].[PreOrder]
for json path;
It looks like you want to pull out the inner text of the ParameterList node inside the XML. You can use .value and XQuery for this:
SELECT TOP (1) [Source]
, OrderParameter = (
SELECT
x = x.PL.value('(x/text())[1]','int'),
y = x.PL.value('(y/text())[1]','int')
FROM (VALUES( CAST(OrderParameter AS xml) )) v(OrderParameter)
CROSS APPLY v.OrderParameter.nodes('ParameterList') x(PL)
FOR JSON PATH, ROOT('ParameterList')
)
FROM [Rops].[dbo].[PreOrder]
FOR JSON PATH;
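Assuming the stored XML is of the form <ParameterList><x>1</x><y>10</y></ParameterList> (the sample values here are made up), this should produce roughly:
[{"Source":"MediaConversions","OrderParameter":{"ParameterList":[{"x":1,"y":10}]}}]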
I want to read arbitrary ABAP data into an iXML document object which contains the JSON-XML representation of these data.
The only way I see is a double application of the id transformation which is not very efficient:
data(lo_aux1) = cl_sxml_string_writer=>create( if_sxml=>co_xt_json ).
call transformation id
source data = ls_some_abap_data
result xml lo_aux1.
data(lv_aux2) = lo_aux1->get_output( ).
data(lo_result) = cl_ixml=>create( )->create_document( ).
call transformation id
source xml lv_aux2
result xml lo_result.
Now lo_result is an iXML DOM representation of the ABAP data in the JSON-XML format, as required. Is it possible to obtain it in a more direct way?
Note: I am not interested in result objects of the sXML family, as I want to manipulate / extend the resulting JSON-XML document with the usual XML DOM methods, which is impossible for an sXML writer object (sXML writers are so simple they can only write everything they have into an output object, but don't allow editing of parts of the object that they already contain).
I am sitting at a proxy and want to enrich the incoming JSON payload with some ABAP data before passing it to the endpoint. Strategy: parse the incoming JSON into a JSON-XML doc, read the (complex) ABAP data into a second XML doc, then add XML subtrees of the second to the first, before finally producing the result JSON from the first JSON-XML doc.
I don't see the need for iXML here at all. Converting a complex ABAP structure into XML just to merge it with JSON is redundant. Let's assume you received some JSON data from a web service:
{
"main":{
"PASSENGERS":[
{
"NAME":"Horst",
"TITLE":"Herr",
"AGE":30
},
{
"NAME":"Jutta",
"TITLE":"Frau",
"AGE":35
},
{
"NAME":"Ingo",
"TITLE":"Herr",
"AGE":31
}
]
}
}
And you want to enrich each passenger's data with flight data from the SFLIGHT table. You can manipulate the nodes and attributes with cl_sxml_string_writer like this:
DATA(lv_json) = CONV string( '{"main": {"PASSENGERS":[ {"NAME":"Horst","TITLE":"Herr","AGE":30}, {"NAME":"Jutta","TITLE":"Frau","AGE":35}, {"NAME":"Ingo","TITLE":"Herr","AGE":31} ]}}' ).
DATA open_element TYPE REF TO if_sxml_open_element.
DATA value TYPE REF TO if_sxml_value_node.
DATA(o_json) = cl_abap_codepage=>convert_to( lv_json ).
DATA(reader) = cl_sxml_string_reader=>create( o_json ).
SELECT DISTINCT connid, fldate, planetype FROM sflight INTO TABLE @DATA(lt_flight).
DATA(xml) = cl_abap_codepage=>convert_to( lv_json ).
DATA(out) = cl_demo_output=>new( )->begin_section( 'Original JSON' )->write_xml( xml ).
DATA(writer) = CAST if_sxml_writer( cl_sxml_string_writer=>create( ) ).
open_element = writer->new_open_element( name = 'flights' nsuri = reader->nsuri ).
writer->write_node( open_element ).
DATA(i) = 1.
DO.
DATA(node) = reader->read_next_node( ).
IF node IS INITIAL.
EXIT.
ENDIF.
IF node IS INSTANCE OF if_sxml_value_node.
DATA(value_node) = CAST if_sxml_value_node( node ).
value_node->set_value( to_upper( value_node->get_value( ) ) ).
ENDIF.
writer->write_node( node ).
IF node->type = if_sxml_node=>co_nt_element_open.
DATA(op) = CAST if_sxml_open_element( node ).
CHECK op->qname-name = 'object' AND op->get_attributes( ) IS INITIAL.
open_element = writer->new_open_element( name = 'flight' nsuri = reader->nsuri ).
open_element->set_attribute( name = 'FLIGHT_DATE'
value = | { lt_flight[ i ]-fldate } | ).
open_element->set_attribute( name = 'PLANE_TYPE'
value = | { lt_flight[ i ]-planetype } | ).
writer->write_node( open_element ).
value = writer->new_value( ).
value->set_value( | { lt_flight[ i ]-connid } | ).
writer->write_node( value ).
writer->write_node( writer->new_close_element( ) ).
i = i + 1.
ENDIF.
ENDDO.
writer->write_node( writer->new_close_element( ) ).
out->next_section( 'Modified JSON'
)->write_xml(
CAST cl_sxml_string_writer( writer )->get_output( )
)->display( ).
DATA(result_json) = CAST cl_sxml_string_writer( writer )->get_output( ).
The result ends up in the result_json variable, and you can push it further wherever you want.
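Note that get_output( ) returns an xstring; if a character string is needed downstream, it can be converted, for example (result_string is a hypothetical name):
DATA(result_string) = cl_abap_codepage=>convert_from( result_json ). " xstring -> string, UTF-8 by default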
Here is the resulting document, with the passenger values capitalized and extended with flight nodes that contain the flight number and are attributed with the flight date and plane type:
<flights>
<object>
<flight FLIGHT_DATE=" 20160610 " PLANE_TYPE=" A380‑800 "> 0002 </flight>
<object name="main">
<array name="PASSENGERS">
<object>
<flight FLIGHT_DATE=" 20160712 " PLANE_TYPE=" A380‑800 "> 0002 </flight>
<str name="NAME">HORST</str>
<str name="TITLE">HERR</str>
<num name="AGE">30</num>
</object>
<object>
<flight FLIGHT_DATE=" 20160813 " PLANE_TYPE=" A380‑800 "> 0002 </flight>
<str name="NAME">JUTTA</str>
<str name="TITLE">FRAU</str>
<num name="AGE">35</num>
</object>
<object>
<flight FLIGHT_DATE=" 20160914 " PLANE_TYPE=" A380‑800 "> 0002 </flight>
<str name="NAME">INGO</str>
<str name="TITLE">HERR</str>
<num name="AGE">31</num>
</object>
</array>
</object>
</object>
</flights>
I have this JSON file in a data lake that looks like this:
{
"id":"398507",
"contenttype":"POST",
"posttype":"post",
"uri":"http://twitter.com/etc",
"title":null,
"profile":{
"#class":"PublisherV2_0",
"name":"Company",
"id":"2163171",
"profileIcon":"https://pbs.twimg.com/image",
"profileLocation":{
"#class":"DocumentLocation",
"locality":"Toronto",
"adminDistrict":"ON",
"countryRegion":"Canada",
"coordinates":{
"latitude":43.7217,
"longitude":-31.432},
"quadKey":"000000000000000"},
"displayName":"Name",
"externalId":"00000000000"},
"source":{
"name":"blogs",
"id":"18",
"param":"Twitter"},
"content":{
"text":"Description of post"},
"language":{
"name":"English",
"code":"en"},
"abstracttext":"More Text and links",
"score":{}
}
}
In order to call the data into my application, I have to turn the JSON into a string using this code:
DECLARE @input string = @"/MSEStream/{*}.json";
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
@allposts =
EXTRACT
jsonString string
FROM @input
USING Extractors.Text(delimiter:'\b', quoting:true);
@extractedrows = SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS er FROM @allposts;
@result =
SELECT er["id"] AS postID,
er["contenttype"] AS contentType,
er["posttype"] AS postType,
er["uri"] AS uri,
er["title"] AS Title,
er["acquisitiondate"] AS acquisitionDate,
er["modificationdate"] AS modificationDate,
er["publicationdate"] AS publicationDate,
er["profile"] AS profile
FROM @extractedrows;
OUTPUT @result
TO "/ProcessedQueries/all_posts.csv"
USING Outputters.Csv();
This outputs the JSON into a readable .csv file, and when I download the file all data is displayed properly. My problem is when I need to get the data inside profile. Because the JSON is now a string, I can't seem to extract any of that data and put it into a variable to use. Is there any way to do this, or do I need to look into other options for reading the data?
You can use JsonTuple on the profile string to further extract the specific properties you want. An example of U-SQL code to process nested JSON is provided at this link: https://github.com/Azure/usql/blob/master/Examples/JsonSample/JsonSample/NestedJsonParsing.usql.
You can use JsonTuple on the profile column to further extract specific nodes.
E.g., use JsonTuple to get all the child nodes of the profile node and then extract specific values, as you did in your code.
@childnodesofprofile =
SELECT
Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(profile) AS childnodes_map
FROM @result;
@values =
SELECT
childnodes_map["name"] AS name,
childnodes_map["id"] AS id
FROM @childnodesofprofile;
Alternatively, if you are interested in specific values, you can also pass parameters to the JsonTuple function to get the specific nodes you want. The code below gets the locality node from the recursively nested nodes (as described by the "$..locality" construct):
@locality =
SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(profile, "$..locality").Values AS locality
FROM @result;
Other constructs supported by JsonTuple:
JsonTuple(json, "id", "name") // field names
JsonTuple(json, "$.address.zip") // nested fields
JsonTuple(json, "$..address") // recursive children
JsonTuple(json, "$[?(#.id > 1)].id") // path expression
JsonTuple(json) // all children
Hope this helps.
We are attempting to create a schema to load a massive JSON structure into Hive. We are having a problem, however, in that some fields have leading underscores in their names. At the root level this is fine, but we have not found a way to make this work for nested fields.
Sample JSON:
{
"_id" : "319FFE15FF908EDD86B7FDEADBEEFBD8D7284128841B14AA6A966923C268DF39",
"SomeThing" :
{
"_SomeField" : 22,
"AnotherField" : 2112,
"YetAnotherField": 1
}
. . . etc . . . .
Using a schema as follows:
create table testSample
(
id string,
something struct
<
somefield:int,
anotherfield:bigint,
yetanotherfield:int
>
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
with serdeproperties
(
"mapping.id" = "_id",
"mapping.somefield" = "_somefield"
);
This schema builds OK. However, after loading the sample above, the value of "somefield" (the nested one with the leading underscore) is always null (all the other values exist and are correct).
We've been trying a lot of syntax combinations, but to no avail.
Does anyone know the trick to map a nested field with a leading underscore in its name?
Cheers!
Answering my own question here: there is no trick because you can't.
However, there's an easy work-around: you can tell Hive to treat the names as literals upon creating the schema. If you do this, you will also need to query using the same literal syntax. In the above example, it would look like:
`_something` struct<rest_of_definitions>
without any special serde properties for it.
Then use the same quoting in the query:
select stuff.`_something` from sometable;
e.g., schema:
create table testSample
(
id string,
something struct
<
`_somefield`:int,
anotherfield:bigint,
yetanotherfield:int
>
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
with serdeproperties("mapping.id" = "_id");
for an input JSON like:
{
"_id": "someuid",
"something":
{
"_somefield": 1,
"anotherfield": 2,
"yetanotherfield": 3
}
}
with a query like:
select something.`_somefield`
from testSample
where something.anotherfield = 2;
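For the sample input above, this query should return 1 (the value of _somefield).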