Is it possible to define optional fields in a Smooks CSV reader?

I want to read CSV entries with a Smooks CSV reader.
But my problem is how to define a field as optional.
<?xml version="1.0"?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd" xmlns:csv="http://www.milyn.org/xsd/smooks/csv-1.2.xsd">
<csv:reader fields="field1,field2,OPTIONAL_FIELD3,$ignore$+" />
</smooks-resource-list>
Is there any way to define such a configuration?
The data I want to read looks like this:
123,4,opt1
456,7
If field 3 is declared as a normal field in the configuration, every line must end with a ',' like this:
456,7,
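A generic workaround, independent of Smooks, is to normalize the data before the reader sees it: pad short rows to the full column count so the trailing field is always present (possibly empty). A minimal Python sketch, with file names as placeholders:

import csv

NUM_FIELDS = 3  # field1, field2, OPTIONAL_FIELD3

with open("input.csv", newline="") as src, open("padded.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        # Pad short rows with empty strings so every line has all columns
        writer.writerow(row + [""] * (NUM_FIELDS - len(row)))

The padded file can then be fed to the reader configuration above unchanged.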

Related

Skip faulty lines when using Solr's csv handler

I want to parse a CSV file using the Solr handler. The problem is that my file might contain problematic lines (lines with unescaped encapsulators). When Solr finds such a line, it fails with the following message and stops:
<str name="msg">CSVLoader: input=null, line=1941,can't read line: 1941
values={NO LINES AVAILABLE}</str><int name="code">400</int>
I understand that in that case the parser cannot fix the problematic line, and that is OK for me. I just want to skip the faulty line and continue with the rest of the file.
I tried using the TolerantUpdateProcessorFactory in my processor chain but the result was the same.
I use Solr 6.5.1, and the curl command I am trying looks something like this:
curl '<path>/update?update.chain=tolerant&maxErrors=10&commit=true&fieldnames=<my fields are provided>,&skipLines=1' --data-binary @my_file.csv -H 'Content-type:application/csv'
Finally this is what I put in my solrconfig.xml
<updateRequestProcessorChain name="tolerant">
<processor class="solr.TolerantUpdateProcessorFactory">
<int name="maxErrors">10</int>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
I would suggest that you pre-process and clean the data using UpdateRequestProcessors.
This is a mechanism to transform the documents that are submitted to Solr for indexing.
Read more about UpdateRequestProcessors.
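If the tolerant chain still aborts, a pragmatic pre-processing step is to drop the unparseable lines before uploading. A rough sketch of such a filter, assuming '"' is the encapsulator and fields contain no embedded newlines:

# Keep only lines with a balanced (even) number of quote characters
with open("my_file.csv") as src, open("clean.csv", "w") as dst:
    for line in src:
        if line.count('"') % 2 == 0:
            dst.write(line)
        # else: the line has an unescaped encapsulator; skip (or log) it

The cleaned file can then be posted with the same curl command.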

LOAD XML LOCAL INFILE ampersand issue

I want to import XML data which contains ampersands into MySQL.
The import stops as soon as it encounters a row containing a raw ampersand (&). Admittedly this is not correct XML, but that is what I am working with.
I have tried replacing the raw ampersands with &amp; - this appears in the database as the raw text (not the equivalent ASCII ampersand).
I have tried replacing the raw ampersands with \& - this stops the import routine from running further.
Can you suggest how I can get the raw ampersand into the database using LOAD XML LOCAL INFILE?
Sample raw XML follows:
<?xml version="1.0" ?>
<REPORT>
<CLA>
<PLOT>R&S</PLOT>
<VAL>100.10</VAL>
</CLA>
<CLA>
<PLOT>G&N</PLOT>
<VAL>200.20</VAL>
</CLA>
</REPORT>
Admittedly this is not correct xml but that is what I am working with.
No, it's not that it's incorrect XML. It is not XML at all, because it is not well-formed.
You have two ways forward:
Fix the data by treating it as text to turn it into XML (replace the raw & with &amp;).
Load the data into the database using a non-XML data type.
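For the first option, a pre-processing pass that escapes only the bare ampersands (leaving any entities that are already correct alone) is usually enough before LOAD XML LOCAL INFILE. A sketch in Python, with file names as placeholders:

import re

with open("report.xml") as src:
    text = src.read()

# Replace '&' only where it does not already start an entity such as &amp; or &#38;
fixed = re.sub(r'&(?!(?:[a-zA-Z]+|#[0-9]+|#x[0-9a-fA-F]+);)', '&amp;', text)

with open("report_fixed.xml", "w") as dst:
    dst.write(fixed)

The fixed file is well-formed XML and can then be loaded with LOAD XML LOCAL INFILE.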

CSV rendering of page doesn't handle newline character within same field

Moqui supports a superb feature to render any page in different formats. If I render the following code in CSV format and the description field contains a newline (i.e., Enter key) character, the form-list shows it correctly, but in CSV it breaks the current row. I think it should not behave like this.
<form-list name="communicationDetail" list="communicationDetailList">
<field name="communicationEventId"><default-field><display/></default-field></field>
<field name="description"><default-field><display/></default-field></field>
</form-list>
Please help me with how that newline character can be handled while rendering data in CSV.
Thanks in advance :-)
CSV output for an XML Screen is done with the DefaultScreenMacros.csv.ftl file. There is a macro at the top of the file called csvValue that does minimal encoding. More could certainly be done there to match whatever parser you are using will work with.
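For reference, RFC 4180-style CSV keeps embedded newlines by wrapping the field in double quotes (and doubling any quotes inside it), which is what most parsers expect; Python's csv module does this automatically. A small illustration of the target encoding, with dummy values:

import csv, io, sys

buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL)
# The second field contains a newline; the writer quotes the field instead of breaking the row
writer.writerow(["COM10001", "first line\nsecond line"])
sys.stdout.write(buf.getvalue())

If you would rather drop the newlines entirely, replacing them with a space in the csvValue macro is the simpler fix.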

Solr 4.7.1 uploading CSV Document is missing mandatory uniqueKey field id

I'm new to Solr (4.7.1). I unzipped the Solr distribution and copied the schemaless example to its own directory. I then used start.jar, passing -Dsolr.solr.home pointing to the new location. Jetty came up and everything appears to be working on that front.
Now I wanted to upload/update a csv file to it. Here's what I used:
curl http://localhost:8983/solr/update/csv --data-binary @c:\solrschemaless\data.csv -H "Content-type:text/csv; charset=utf-8"
I received the following:
<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">0</int>
</lst>
<lst name="error">
<str name="msg">Document is missing mandatory uniqueKey field: id</str>
<int name="code">400</int>
</lst>
</response>
The CSV file has a column named XXXXID. I changed it to "id", "id_s", and "id_i", but I am still getting the same error. There are a lot of posts on SO and elsewhere, but thus far I haven't seen one for the schemaless model.
EDIT: I reduced my CSV file down to this:
id,Contact,Address,Focus,Type
2,97087,1170,NULL,1
and I'm still getting the same error message about the missing mandatory uniqueKey.
I'm on Windows 8.
Any ideas?
Figured it out. For some reason the column id cannot be the first column in the list. When I change
id,Contact,Address,Focus,Type
2,97087,1170,NULL,1
to this
foo,id,Contact,Address,Focus,Type
bar,2,97087,1170,NULL,1
it works.
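A plausible explanation (an assumption, not confirmed by the poster): on Windows the file often starts with a UTF-8 byte-order mark, which gets glued onto the first column name, so Solr sees "\ufeffid" instead of "id"; whatever column comes first is mangled, which is why moving id out of the first position helps. A quick strip-the-BOM sketch in Python:

# utf-8-sig strips a leading BOM if present; plain utf-8 keeps it
with open("data.csv", encoding="utf-8-sig") as src:
    content = src.read()

with open("data_nobom.csv", "w", encoding="utf-8") as dst:
    dst.write(content)

Re-uploading the rewritten file with id as the first column would confirm the hypothesis.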

XML output from MySQL query

Is there any chance of getting the output from a MySQL query directly as XML?
I'm referring to something like MSSQL has with the SQL-XML plugin, for example:
SELECT * FROM table WHERE 1 FOR XML AUTO
returns text (or the xml data type in MSSQL, to be precise) which contains an XML markup structure generated according to the columns in the table.
With SQL-XML there is also an option of explicitly defining the output XML structure like this:
SELECT
1 AS tag,
NULL AS parent,
emp_id AS [employee!1!emp_id],
cust_id AS [customer!2!cust_id],
region AS [customer!2!region]
FROM table
FOR XML EXPLICIT
which generates an XML code as follows:
<employee emp_id='129'>
<customer cust_id='107' region='Eastern'/>
</employee>
Do you have any clues how to achieve this in MySQL?
Thanks in advance for your answers.
The mysql command-line client can output XML directly, using the --xml option, which is available at least as far back as MySQL 4.1.
However, this doesn't allow you to customize the structure of the XML output. It will output something like this:
<?xml version="1.0"?>
<resultset statement="SELECT * FROM orders" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<row>
<field name="emp_id">129</field>
<field name="cust_id">107</field>
<field name="region">Eastern</field>
</row>
</resultset>
And you want:
<?xml version="1.0"?>
<orders>
<employee emp_id="129">
<customer cust_id="107" region="Eastern"/>
</employee>
</orders>
The transformation can be done with XSLT using a script like this:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="resultset">
<orders>
<xsl:apply-templates/>
</orders>
</xsl:template>
<xsl:template match="row">
<employee emp_id="{field[@name='emp_id']}">
<customer
cust_id="{field[@name='cust_id']}"
region="{field[@name='region']}"/>
</employee>
</xsl:template>
</xsl:stylesheet>
This is obviously way more verbose than the concise MSSQL syntax, but on the other hand it is a lot more powerful and can do all sorts of things that wouldn't be possible in MSSQL.
If you use a command-line XSLT processor such as xsltproc or saxon, you can pipe the output of mysql directly into the XSLT program. For example:
mysql -e 'select * from table' -X database | xsltproc script.xsl -
The article "Using XML with MySQL" seems to be a good place to start; it covers various ways to get from a MySQL query to XML.
From the article:
use strict;
use DBI;
use XML::Generator::DBI;
use XML::Handler::YAWriter;
my $dbh = DBI->connect ("DBI:mysql:test",
"testuser", "testpass",
{ RaiseError => 1, PrintError => 0});
my $out = XML::Handler::YAWriter->new (AsFile => "-");
my $gen = XML::Generator::DBI->new (
Handler => $out,
dbh => $dbh
);
$gen->execute ("SELECT name, category FROM animal");
$dbh->disconnect ();
Do you have any clue how to achieve this in MySQL?
Yes: go by foot and build the XML yourself with CONCAT strings. Try
SELECT concat('<orders><employee emp_id="', emp_id, '"><customer cust_id="', cust_id, '" region="', region, '"/></employee></orders>') FROM table
I took this from a 2009 answer (How to convert a MySQL DB to XML?) and it still seems to work. It is not very handy, and if you have large trees per item they will all end up in one concatenated value of the root item, but it works; see this test with dummy values:
SELECT concat('<orders><employee emp_id="', 1, '"><customer cust_id="', 2, '" region="', 3, '"/></employee></orders>') FROM DUAL
gives
<orders><employee emp_id="1"><customer cust_id="2" region="3"/></employee></orders>
With "manual coding" you can get to this structure.
<?xml version="1.0"?>
<orders>
<employee emp_id="1">
<customer cust_id="2" region="3" />
</employee>
</orders>
I checked this with a larger tree per root item and it worked, but I had to run additional Python code on the output to remove the surplus opening and closing tags that are generated when you have intermediate nodes in an XML path. It can be done with backward-looking lists together with entries in a temporary set; I got it working, but an object-oriented approach would be more professional. I simply coded it to drop the last x items from the list as soon as a new head item was found, plus some other tricks for nested branches. It worked.
I puzzled out a regex that finds the text between tags:
import re
from xml.sax.saxutils import escape

string = " <some tag><another tag>test string</another tag></some tag>"
pattern = r'(?:^\s*)?(?:(?:<[^\/]*?)>)?(.*?)?(?:(?:<\/[^>]*)>)?'
p = re.compile(pattern)
val = r''.join(p.findall(string))
val_escaped = escape(val)
if val_escaped != val:
    string = string.replace(val, val_escaped)  # str.replace returns a new string; reassign it
This regex helps you access the text between the tags. If you are allowed to use CDATA, it is easiest to use it everywhere. Just make the content CDATA (character data) already in MySQL:
<Title><![CDATA[', t.title, ']]></Title>
Then you will not have any issues anymore, except for very strange characters like U+001A, which you should replace already in MySQL. You then do not need to care about escaping and replacing the rest of the special characters at all. This worked for me on a one-million-line XML file with heavy use of special characters.
Still, you should validate the file against the required XML schema using Python's xmlschema module. It will alert you when you are not allowed to use the CDATA trick.
If you need fully UTF-8 formatted content without CDATA, which may often be the requirement, you can get there even with a one-million-line file by validating the XML output step by step against the XML schema file (the target .xsd). It is a bit of fiddly work, but it can be done with some patience.
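The validation step itself is short. With the xmlschema package it looks roughly like this (file names are placeholders):

import xmlschema

schema = xmlschema.XMLSchema("orders.xsd")
# is_valid() gives a quick yes/no; iter_errors() walks every violation without stopping
if not schema.is_valid("orders.xml"):
    for error in schema.iter_errors("orders.xml"):
        print(error)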
Replacements are possible with:
MySQL, using REPLACE()
Python, using string.replace()
Python, using a regex replace (though I did not need it in the end; it would look like re.sub(re.escape(val), 'xyz', i))
string.encode(encoding='UTF-8', errors='strict')
Mind that encoding as UTF-8 is the most powerful step; it could even stand in for the three replacement approaches above. Mind also that it turns the text into bytes (b'...'), so you can then write it to a file only in binary mode using 'wb'.
At the end of it all, you may open the XML output in a normal browser like Firefox for a final check and watch the XML at work, or check it in VS Code/VSCodium with an XML extension. These checks were not strictly needed in my case; the xmlschema module showed everything very well. Note also that VS Code/VSCodium can handle XML problems quite easily and still show a tree when Firefox cannot; therefore, you will need a validator or a browser to see all XML errors.
Quite a huge project can be completed using this XML-building-with-MySQL approach; in the end it produced a triply nested XML tree with many repeating tags inside parent nodes, all built from a two-dimensional MySQL result set.
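If a scripting step is acceptable, an alternative to the CONCAT-plus-cleanup route is to build the tree with an XML library, which handles both the escaping and the open/close bookkeeping. A minimal Python sketch; the connection settings and column names are assumptions based on the earlier examples:

import mysql.connector
import xml.etree.ElementTree as ET

cnx = mysql.connector.connect(user="testuser", password="testpass", database="test")
cur = cnx.cursor()
cur.execute("SELECT emp_id, cust_id, region FROM orders")

root = ET.Element("orders")
for emp_id, cust_id, region in cur:
    emp = ET.SubElement(root, "employee", emp_id=str(emp_id))
    ET.SubElement(emp, "customer", cust_id=str(cust_id), region=str(region))

# ElementTree escapes &, < and > in text and attribute values automatically
ET.ElementTree(root).write("orders.xml", encoding="utf-8", xml_declaration=True)
cur.close()
cnx.close()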