I'm new to Solr and I'm trying out its functionality. I come from the RDBMS world and was wondering how Solr would perform with my data.
I created a new core:
$ bin/solr create -c test
and successfully loaded a JSON file using:
$ bin/post -c test file.json
The first record of file.json looks like this:
{"attr":"01234"}
but Solr stores it as:
{"attr":1234}
I began defining a Data Import Handler, following this tutorial (YouTube video), in order to store my data correctly, and found that JSON can't be processed by DIH. I'm stuck at the definition of data-config.xml, because the tutorial covers XML files using the XPathEntityProcessor, but I can't find a JSON or even a CSV processor (I can easily produce a CSV version of file.json, so loading CSV or JSON is the same for me). The official documentation is a bit of a mess and doesn't provide many useful examples. The only processors that might handle JSON and CSV documents are LineEntityProcessor and PlainTextEntityProcessor (official documentation).
This other link from the Solr Wiki states:
Goals
...
Make it possible to plugin any kind of datasource (ftp,scp etc) and any other format of user choice (JSON,csv etc)
so I guess it is really possible, but HOW?
I found a similar question posted in 2014 that no one answered here, so I was wondering whether in 2016, with the newer versions of Solr, there is a well-known solution to this problem.
So the question is: how to import JSON and CSV documents using a specific data schema?
UPDATE
Executing http://localhost:8983/solr/test/dihupdate?command=full-import doesn't trigger any error, but it doesn't load any documents either. Here are the various XML files located in the core directory:
solrconfig.xml
...
<schemaFactory class="ClassicIndexSchemaFactory" />
...
<requestHandler name="/dihupdate" class="org.apache.solr.handler.dataimport.DataImportHandler" startup="lazy">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
...
schema.xml
...
<field name="id" type="long" indexed="true" stored="true" required="true" multiValued="false" />
<field name="attr1" type="string" indexed="true" stored="true" required="true" multiValued="true" />
<field name="date" type="date" indexed="true" stored="false" multiValued="true" />
<field name="attr2" type="string" indexed="true" stored="true" multiValued="true" />
<field name="attr3" type="string" indexed="true" stored="true" multiValued="true" />
<field name="attr4" type="int" indexed="false" stored="true" multiValued="true" />
<uniqueKey>id</uniqueKey>
...
data-config.xml
<dataConfig>
  <dataSource type="FileDataSource" />
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            fileName="test.json"
            rootEntity="false"
            dataSource="null"
            recursive="true"
            baseDir="/path/to/data/"/>
  </document>
</dataConfig>
In the Solr distribution, there is a films example (in example/films) that shows how to index JSON and takes advantage of both exact field definitions and type auto-detection. The instructions (README.txt) also show the results you'll get if you skip one of the steps.
I suggest you experiment with that and then apply that knowledge to your own use case.
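As a rough sketch of that walk-through (treat example/films/README.txt as authoritative, since the exact steps and field types vary by Solr version):
$ bin/solr create -c films
$ bin/post -c films example/films/films.json
Note that the README has you declare the name field explicitly through the Schema API before posting, precisely so that type auto-detection doesn't guess wrong on it.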
Defining the schema can be done in schema.xml in your conf directory - this is the traditional way of setting up the expected format for documents (Defining Fields). If you're using the "Managed Schema" mode, which is the current default, you'll have to switch to the classic schema factory. You can then define the fields in your schema.xml by following the example schema, or any resource available on the web that describes how the schema.xml file is structured (you define a field type, then fields that use that field type).
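For instance, a minimal sketch of that classic pattern (the string type and its solr.StrField class ship with the example schema; the field name attr is just illustrative):
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<field name="attr" type="string" indexed="true" stored="true"/>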
The other option is the managed schema - this is the default in the most recent releases, and this schema is manipulated through the API that Solr offers. On startup it reads the initial schema from schema.xml (if present), but after that you'll have to modify it through the API or the Admin interface. This API is described (with examples) at the Schema API page in the Solr guide.
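As a hedged sketch, adding a string field to the test core through the Schema API would look like this (the field name is illustrative):
$ curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field": {"name":"attr", "type":"string", "indexed":true, "stored":true}
}' http://localhost:8983/solr/test/schema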
Using a StrField (which is what the field type string uses) to store 012345 would result in Solr storing just the literal value, 012345, without converting it to an integer. That's probably a good place to start.
Related
I am on a Red Hat system and I have multiple XML files, generated from various SOAP requests, in a format that is not compatible with MySQL's LoadXML function. I need to load the data into MySQL tables, with one table per type of XML file, depending on the data received via the SOAP XML API.
A sample of one of the files follows, but each file will have a different number of columns and different column names. I am trying to find a way to convert them to a compatible format in the most generic way possible, so that I don't have to create a customized solution for each API request/response.
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <dbd:DataRetrievalRequestResponse xmlns:dbd="dbd.v1">
      <DataObjects>
        <ObjectSelect>
          <mdNm>controller-ac</mdNm>
          <meNm>WALL-EQPT-A</meNm>
        </ObjectSelect>
        <DataInstances>
          <DataInstance>
            <instanceId>DSS1</instanceId>
            <Attribute>
              <name>Name</name>
              <value>DSS1</value>
            </Attribute>
            <Attribute>
              <name>Operational Mode</name>
              <value>mode-fast</value>
            </Attribute>
            <Attribute>
              <name>Rate - Down</name>
              <value>1099289</value>
            </Attribute>
            <Attribute>
              <name>Rate - Up</name>
              <value>1479899</value>
            </Attribute>
          </DataInstance>
          <DataInstance>
            <instanceId>DSS2</instanceId>
            <Attribute>
              <name>Name</name>
              <value>DSS2</value>
            </Attribute>
            <Attribute>
              <name>Operational Mode</name>
              <value>mode-fast</value>
            </Attribute>
            <Attribute>
              <name>Rate - Down</name>
              <value>1299433</value>
            </Attribute>
            <Attribute>
              <name>Rate - Up</name>
              <value>1379823</value>
            </Attribute>
          </DataInstance>
        </DataInstances>
      </DataObjects>
    </dbd:DataRetrievalRequestResponse>
  </soap:Body>
</soap:Envelope>
Of course I want the data to be entered into a MySQL table (with columns like id, Name, etc.), with a row for each unique instance:
Name  Operational Mode  Rate - Down  Rate - Up
DSS1  mode-fast         1099289      1479899
DSS2  mode-fast         1299433      1379823
Do I need to create an XSLT and preprocess this XML data from the command line, prior to running LoadXML, to get it into a format that MySQL's LoadXML function will accept? That would not be a problem, but I am not familiar with XSLT transformations.
Is there a way to reformat the above XML to straight CSV (preferred), or to another XML format that is compatible, such as the examples given in the MySQL documentation for LoadXML?
<row>
  <field name='column1'>value1</field>
  <field name='column2'>value2</field>
</row>
I tried doing LOAD DATA INFILE and using the ExtractValue function, but some of the values have spaces in them, and the delimiter for ExtractValue is hard-coded to a single space. This makes it unusable as a workaround.
Your question is very general (which is fine!) so my answer is also quite general.
Firstly, it's certainly true that XSLT is an ideal generic tool for problems of this sort. I have absolutely no doubt that every one of your SOAP messages could be coerced into a suitable form, using an XSLT that's customised for each type of message, while still remaining structurally very similar, which is what you'd want if you're new to XSLT.
I'm not sure how familiar you are with XPath, XML, XML namespaces, etc, but I think the task here is simple enough to tackle, and if you do have any tricky XPath expressions to write you can always come back to StackOverflow and ask for help.
From what you've said it sounds like you're confident that each SOAP message can be mapped to a single table. I'm going to suggest an XSLT pattern that would be customisable for each type of SOAP message, where you have an xsl:for-each statement that iterates over each row, and within that you create a row element and populate it with fields.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- indent the output, for ease of reading -->
  <xsl:output indent="yes"/>
  <!-- process the document -->
  <xsl:template match="/">
    <!-- create the root element of the output -->
    <resultset>
      <!-- create each row of the output, by iterating over the
           repeating elements in the SOAP message -->
      <xsl:for-each select="//DataInstance">
        <row>
          <!-- create each field -->
          <!-- This field is defined individually, and the value
               is produced by evaluating the 'instanceId' xpath
               relative to the current DataInstance -->
          <field name="id"><xsl:value-of select="instanceId"/></field>
          <!-- these fields can be generated with a loop -->
          <xsl:for-each select="Attribute">
            <field name="{name}"><xsl:value-of select="value"/></field>
          </xsl:for-each>
        </row>
      </xsl:for-each>
    </resultset>
  </xsl:template>
</xsl:stylesheet>
Result of this, run over your sample SOAP message:
<resultset>
  <row>
    <field name="id">DSS1</field>
    <field name="Name">DSS1</field>
    <field name="Operational Mode">mode-fast</field>
    <field name="Rate - Down">1099289</field>
    <field name="Rate - Up">1479899</field>
  </row>
  <row>
    <field name="id">DSS2</field>
    <field name="Name">DSS2</field>
    <field name="Operational Mode">mode-fast</field>
    <field name="Rate - Down">1299433</field>
    <field name="Rate - Up">1379823</field>
  </row>
</resultset>
If you can follow this general pattern, you should be able to write a custom XSLT for every kind of SOAP message in your collection. You will just need to modify the various XPath expressions in the stylesheet:
//DataInstance means "every DataInstance".
instanceId means "the instanceId element that's a child of the current ('context') element".
name means "the name element that's a child of the current element".
value means "the value element that's a child of the current element".
In the example SOAP message you gave, the Attribute element maps to a field, so all those elements could be copied generically, with another xsl:for-each, but for your other documents you may have to just define each field element individually, as I did for the id element in my answer.
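As a rough end-to-end sketch (assuming the stylesheet above is saved as transform.xsl, xsltproc is available, a table with matching columns already exists, and local_infile is enabled on the server):
$ xsltproc transform.xsl response.xml > rows.xml
mysql> LOAD XML LOCAL INFILE 'rows.xml'
    ->   INTO TABLE myTable
    ->   ROWS IDENTIFIED BY '<row>';
Column names containing spaces, such as Rate - Down, would have to exist verbatim (backtick-quoted) in the table definition for LOAD XML to match them.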
I am working with MuleSoft Anypoint Studio and need to convert a JSON payload to XML. During this conversion, every field that is null needs to be excluded. Some values are not sent via the POST request, so I expect not to see them in the resulting XML file, but they are there. For example, the Value field is not sent in the JSON POST request, so it becomes null in Mule and should not appear in the XML file, yet it is still written as <Value/>. I am mainly having problems with the Object to JSON transformer.
I have tried configuring a custom mapper
<spring:beans>
  <spring:bean id="Bean" name="NonNullMapper" class="org.codehaus.jackson.map.ObjectMapper">
    <spring:property name="SerializationInclusion">
      <spring:value type="org.codehaus.jackson.map.annotate.JsonSerialize.Inclusion">NON_NULL</spring:value>
    </spring:property>
  </spring:bean>
</spring:beans>
But that didn't really work. I also tried
<spring:beans>
  <spring:bean id="jacksonObjectMapper" class="org.codehaus.jackson.map.ObjectMapper" />
  <spring:bean class="org.springframework.beans.factory.config.MethodInvokingFactoryBean">
    <spring:property name="targetObject" ref="jacksonObjectMapper" />
    <spring:property name="targetMethod" value="configure" />
    <spring:property name="arguments">
      <spring:list>
        <spring:value>WRITE_NULL_MAP_VALUES</spring:value>
        <spring:value>false</spring:value>
      </spring:list>
    </spring:property>
  </spring:bean>
</spring:beans>
That didn't work either, as I get an error which I couldn't manage to fix:
More than one object of type class org.codehaus.jackson.map.ObjectMapper registered but only one expected
I am working with
Mule 3.9.0
Anypoint Studio 6.4
com.fasterxml.jackson and in some places org.codehaus.jackson
I would really appreciate any help or some hint.
Given that this is in Mule, you can use DataWeave instead to transform the payload. Setting the XML writer property skipNullOn could give the desired result. https://docs.mulesoft.com/mule-user-guide/v/3.9/dataweave-formats#skip-null-on
Example
%dw 1.0
%output application/xml skipNullOn="everywhere"
---
payload
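For the XML writer in DataWeave 1.0, skipNullOn accepts elements, attributes or everywhere; the sketch above uses everywhere so that null values are dropped wherever they occur, but check the linked documentation for the exact options in your version.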
I'm trying to fill the autocomplete field in Orbeon (version 2016.1) with suggestions which I receive as JSON.
The JSON I get looks like:
{"status":"success","code":200,"data":{"streets":[{"name":"Street One","id":"1"},{"name":"Street Two","id":"2"},{"name":"Street Three","id":"3"}]}}
I know that the Resource URI should point to my web service (could that URI, or the arguments I need to send, be encoded?), but I don't know how the Items, Label and Value fields should be configured in this case (the label would be the name from the JSON and the value should point to the code from the JSON, of course).
I referred to https://doc.orbeon.com/xforms/submission-json.html but haven't quite managed to get what I'm after.
Can someone help?
Thanks in advance.
Masa
In particular, with your specific JSON, the corresponding XML will look as follows. In general, see the section Seeing the converted XML for how you can create a form in Form Builder that allows you to see what the converted XML is for any JSON.
<json type="object">
  <status>success</status>
  <code type="number">200</code>
  <data type="object">
    <streets type="array">
      <_ type="object">
        <name>Street One</name>
        <id>1</id>
      </_>
      <_ type="object">
        <name>Street Two</name>
        <id>2</id>
      </_>
      <_ type="object">
        <name>Street Three</name>
        <id>3</id>
      </_>
    </streets>
  </data>
</json>
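Given that structure, a plausible configuration for the autocomplete (hedged, since the exact XPath context depends on how your form binds the service response) would be:
Items: //streets/_
Label: name
Value: id
That is, Items iterates over each _ element under streets, and Label and Value are evaluated relative to each item.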
I'm using Spring/AngularJS and, to prevent the JSON vulnerability, I'm trying to prefix all JSON array responses with ")]}',\n" - see reference.
I was able to add the prefix with:
<mvc:annotation-driven>
  <mvc:message-converters>
    <bean id="mappingJackson2HttpMessageConverter" class="org.springframework.http.converter.json.MappingJackson2HttpMessageConverter">
      <property name="jsonPrefix" value=")]}',\n" />
    </bean>
  </mvc:message-converters>
</mvc:annotation-driven>
But the problem is that it prefixes all JSON responses with ")]}',\n", and I only need to prefix the JSON arrays. Is there a way to set the prefix only for JSON array responses? Thanks.
Instead of adding a prefix, which basically makes your response invalid JSON, consider returning an object instead of an array. This mitigates the attack vector as well.
{"d": [1, 2, 3, 4]}
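A minimal sketch of that idea in a Spring MVC controller (all names here are illustrative, not from your project):
import java.util.Arrays;
import java.util.List;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
public class NumbersController {

    // Hypothetical wrapper: Jackson serializes it as {"d":[...]}
    public static class Wrapper {
        private final List<Integer> d;
        public Wrapper(List<Integer> d) { this.d = d; }
        public List<Integer> getD() { return d; }
    }

    @RequestMapping("/numbers")
    @ResponseBody
    public Wrapper numbers() {
        return new Wrapper(Arrays.asList(1, 2, 3, 4));
    }
}
Since the top-level JSON value is now an object rather than an array, the old array-constructor attack has nothing to execute, and the )]}',\n prefix becomes unnecessary.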
What is today the easiest, most automated way to import complex XML (external, coming from an API, including an .xsd schema) into a relational or any other database? I understand there should be a (semi-)automatic way to do this for every database; I just haven't found it yet.
This also raises the question of why XML is used at all to complicate relational data. If the API data comes from a relational database and, for most users of the API, ends up in one, why isn't it usually transferred as rows as well? Comparing tables with XML/JSON/YAML: a table requires less storage if the data is at all related - arguably more efficient than compression.
:)
You're looking for a universal tool. However, the concept of XML is fundamentally incompatible with the concept of a (relational) database.
The universal tool you're looking for would have to cover at least three fundamental types of operation:
a) The XML is defined as a projection of a table/row/field concept
<xml>
  <table name='myTable'>
    <row id='1'>
      <field name='myField1' type='string'>myValue1</field>
      <field name='myField2' type='date'>01-01-1901</field>
      <field name='myField3' type='number'>123456</field>
    </row>
  </table>
</xml>
b) The XML is to be stored in one XML field in one row of a table
Id  Name  Date        XML
--  ----  ----------  -------------------------
1   MyEx  01-01-2001  <myObject>
                        <myAttribute name='class'>example</myAttribute>
                      </myObject>
c) The XML is a projection of a parent/child relationship in the database
<xml>
  <order number='123'>
    <customer id='1001'>myCustomer</customer>
    <orderDate>01-01-2001</orderDate>
    <address>wherever to go</address>
    <orderDetails>
      <orderProduct code='P01'>
        <name>myProduct</name>
        <amount>15</amount>
        <listPrice>$14.00</listPrice>
      </orderProduct>
    </orderDetails>
  </order>
</xml>
In each and every case, the tool must let you specify whether you are importing one such object or many, and it must be able to transform the presentation of values into acceptable storage formats.
None of this is impossible, but it is important to check which of these capabilities your selected tool actually supports.
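For what it's worth, shape (a) above is already close to what some databases accept natively; a hedged sketch for MySQL (assuming a table myTable whose columns match the field names):
LOAD XML LOCAL INFILE 'data.xml'
  INTO TABLE myTable
  ROWS IDENTIFIED BY '<row>';
Shapes (b) and (c) are where generic tooling usually ends and per-schema mapping begins.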