Read CSV file escaping more than one character in Mulesoft

I have a CSV file with a header and these values:
"20000160";"20000160";"177204930";"Zusammendruck ""Blumen"" nk 01.03.07";"2021";"01";"EUR";"599.000";"599,000";"599.00";"599,00 EUR";"EUR";"0.00";"0,00 EUR";"0.00";"0,00 EUR";"EUR"
"20000000";"20000000";"1013";"Einschreiben";"2021";"01";"EUR";"0.000";"0,000";"22.80";"22,80 EUR";"EUR";"0.00";"0,00 EUR";"0.00";"0,00 EUR";"EUR"
"20000000";"20000000";"1018";"Rückschein";"2021";"01";"EUR";"0.000";"0,000";"6.60";"6,60 EUR";"EUR";"0.00";"0,00 EUR";"0.00";"0,00 EUR";"EUR"
"8003325905";"8003325905";"233800118";"Prof.Services: Datenmanagement;Pauschale";"2021";"01";"EUR";"0.000";"0,000";"600.00";"600,00 EUR";"EUR";"0.00";"0,00 EUR";"108.00";"108,00 EUR";"EUR"
I configured the File Read connector to handle the escaped quotes in "Zusammendruck ""Blumen"" nk 01.03.07", and it is working:
<file:read doc:name="Read CSV, Set MIME Type" doc:id="bb378f83-d0ea-4951-8253-8253953ed9e7" path="${outputCSV}" outputMimeType='application/csv; streaming=true; quote="\""; separator=";"; escape="\""' outputEncoding="UTF-8" />
But I also have to handle the ; inside "Prof.Services: Datenmanagement;Pauschale" so it parses correctly. I tried to configure the pattern as escape="\"|;" but I got a warning:
WARN 2021-12-20 16:59:34,604 [[MuleRuntime].uber.26: [test].upload.BLOCKING #27454a45] [processor: ; event: eb625e31-61b5-11ec-a30d-00090ffe0001] org.mule.weave.v2.model.service.DefaultLoggingService$: Option escape="|; expects a value of length 1 but got "|;. Only the first character is going to be used and the rest is going to be ignored.
How can I read and parse data correctly, considering the example data?

The default CSV parser in DataWeave has some limitations. For this reason I have developed a Mule module based on Apache Commons CSV. You can find it on GitHub (https://github.com/rbutenuth/csv-module); the dependency is available on Maven Central, so you don't need to compile it yourself.
It can parse your CSV with the following settings:
<csv:config name="Csv_Config" doc:name="Csv Config" recordSeparator="\n" withHeaderLine="false" escape="">
    <csv:columns>
        <csv:column columnName="column_01" type="TEXT" />
        <csv:column columnName="column_02" type="TEXT" />
        <!-- columns 3 to 16 omitted -->
        <csv:column columnName="column_17" type="TEXT" />
    </csv:columns>
</csv:config>
I have used column_01 to column_17 as column names, as I could not guess meaningful names from the content.
You can also achieve this in your case with the default DataWeave reader, using the following settings; once the quote character is declared, separators inside quoted fields are handled automatically, so only the quote itself needs to be listed as the escape:
application/csv separator=";",quoteValues=true,quote="\"",escape="\""
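For reference, a sketch of those properties applied to the File Read connector from the question (doc:id omitted; note that ; is no longer listed in the escape option):
<file:read doc:name="Read CSV, Set MIME Type" path="${outputCSV}"
    outputMimeType='application/csv; streaming=true; separator=";"; quote="\""; escape="\""'
    outputEncoding="UTF-8" />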

Related

How to pass Quarkus List configuration via environment variable

Is there a convenient way to pass lists as environment variables for Quarkus configuration, other than comma separation?
MY_FOO=val1,val2,val3
Comma separation works fine (even if it does not look so nice for long lists). But if you have to pass a list of entries where each entry has commas in it, it won't work.
I'm thinking of something similar to Spring configuration, where you can pass list entries with an index as a postfix:
MY_FOO_0_ = val1
MY_FOO_1_ = val2
MY_FOO_2_ = val3
Quarkus uses MicroProfile and SmallRye Config for this, and you can achieve the desired result using indexed properties:
# MicroProfile Config - Collection Values
my.collection=dog,cat,turtle
# SmallRye Config - Indexed Property
my.indexed.collection[0]=dog
my.indexed.collection[1]=cat
my.indexed.collection[2]=turtle
From the documentation:
A call to Config#getValues("my.collection", String.class), will automatically create and convert a List that contains the values dog, cat and turtle. A call to Config#getValues("my.indexed.collection", String.class) returns the exact same result.
Following the rules for conversion in environment variables, you would then pass the environment variables as
MY_INDEXED_COLLECTION_0_=dog
MY_INDEXED_COLLECTION_1_=cat
MY_INDEXED_COLLECTION_2_=turtle
and access them with
ConfigProvider.getConfig().getValues("my.indexed.collection", String.class);
Documentation on indexed properties: https://smallrye.io/smallrye-config/2.11.1/config/indexed-properties/
Documentation on environment variables: https://smallrye.io/smallrye-config/2.11.1/config/environment-variables
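A minimal sketch of reading the indexed collection, assuming the properties above are on the classpath (the class name is made up for illustration):
import java.util.List;

import org.eclipse.microprofile.config.Config;
import org.eclipse.microprofile.config.ConfigProvider;

public class IndexedCollectionDemo {
    public static void main(String[] args) {
        Config config = ConfigProvider.getConfig();
        // getValues resolves my.indexed.collection[0], [1], [2] into one list
        List<String> animals = config.getValues("my.indexed.collection", String.class);
        animals.forEach(System.out::println); // dog, cat, turtle
    }
}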
If your problem is passing a comma (,) inside one item of your array, I believe this will help you.
Below is an example where I pass the items of an array separated by commas, and one specific item of the array (AllowedRemoteAddresses) itself contains commas.
Config variable inside your application
@ConfigProperty(name = "quickfix")
List<String> quickfixSessionSettings;
application.properties
In the Eclipse MicroProfile Config properties file, I just have to put double backslashes before the comma:
quickfix=[default],\
# Sessions,\
[session],\
BeginString=FIX.4.4,\
SenderCompID=EXEC,\
TargetCompID=BANZAI,\
ConnectionType=acceptor,\
StartTime=00:00:00,\
EndTime=00:00:00,\
# Aceptor,\
SocketAcceptPort=9880,\
# Logging,\
ScreenLogShowHeartBeats=Y,\
# Store,\
# FileStorePath=target/data/store,\
JdbcStoreMessagesTableName=messages,\
JdbcStoreSessionsTableName=sessions,\
JdbcLogHeartBeats=Y,\
JdbcLogIncomingTable=messages_log_incoming,\
JdbcLogOutgoingTable=messages_log_outgoing,\
JdbcLogEventTable=event_log,\
JdbcSessionIdDefaultPropertyValue=not_null,\
AllowedRemoteAddresses=localhost\\,127.0.0.1\\,172.0.0.2
List<String> result inside application:
[default]
# Sessions
[session]
BeginString=FIX.4.4
SenderCompID=EXEC
TargetCompID=BANZAI
ConnectionType=acceptor
StartTime=00:00:00
EndTime=00:00:00
# Aceptor
SocketAcceptPort=9880
# Logging
ScreenLogShowHeartBeats=Y
# Store
# FileStorePath=target/data/store
JdbcStoreMessagesTableName=messages
JdbcStoreSessionsTableName=sessions
JdbcLogHeartBeats=Y
JdbcLogIncomingTable=messages_log_incoming
JdbcLogOutgoingTable=messages_log_outgoing
JdbcLogEventTable=event_log
JdbcSessionIdDefaultPropertyValue=not_null
AllowedRemoteAddresses=localhost,127.0.0.1,172.0.0.2
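For completeness, a minimal sketch of consuming the injected list (the class and method names are made up for illustration; the doubly escaped commas arrive as literal commas inside the AllowedRemoteAddresses entry):
import java.util.List;

import javax.enterprise.context.ApplicationScoped;

import org.eclipse.microprofile.config.inject.ConfigProperty;

@ApplicationScoped
public class QuickfixSettingsDumper {

    @ConfigProperty(name = "quickfix")
    List<String> quickfixSessionSettings;

    // Prints one configuration line per list entry, as in the listing above
    void dump() {
        quickfixSessionSettings.forEach(System.out::println);
    }
}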
YAML deployment file
In the YAML deployment file, I just have to put one backslash before the comma:
environment:
- QUARKUS_DATASOURCE_JDBC_URL=jdbc:postgresql://postgresql-qfj:5432/postgres?currentSchema=exchange
- QUARKUS_DATASOURCE_USERNAME=postgres
- QUARKUS_DATASOURCE_PASSWORD=postgres
- QUICKFIX=[default],
[session],
BeginString=FIX.4.4,
SenderCompID=EXEC,
TargetCompID=BANZAI,
ConnectionType=acceptor,
StartTime=00:00:00,
EndTime=00:00:00,
SocketAcceptPort=9880,
ScreenLogShowHeartBeats=Y,
JdbcStoreMessagesTableName=messages,
JdbcStoreSessionsTableName=sessions,
JdbcLogHeartBeats=Y,
JdbcLogIncomingTable=messages_log_incoming,
JdbcLogOutgoingTable=messages_log_outgoing,
JdbcLogEventTable=event_log,
JdbcSessionIdDefaultPropertyValue=not_null,
AllowedRemoteAddresses=127.0.0.1\,172.0.0.2\,172.0.0.3\,broker-back-end
List<String> result inside application:
[default]
[session]
BeginString=FIX.4.4
SenderCompID=EXEC
TargetCompID=BANZAI
ConnectionType=acceptor
StartTime=00:00:00
EndTime=00:00:00
SocketAcceptPort=9880
ScreenLogShowHeartBeats=Y
JdbcStoreMessagesTableName=messages
JdbcStoreSessionsTableName=sessions
JdbcLogHeartBeats=Y
JdbcLogIncomingTable=messages_log_incoming
JdbcLogOutgoingTable=messages_log_outgoing
JdbcLogEventTable=event_log
JdbcSessionIdDefaultPropertyValue=not_null
AllowedRemoteAddresses=127.0.0.1,172.0.0.2,172.0.0.3,broker-back-end

Is there a way to programmatically set a dataset's schema from a .csv

As an example, I have a .csv in the Excel dialect, which escapes quotes by doubling them (like the Python csv module's doubleQuote setting).
For example, consider the row below:
"XX ""YYYYYYYY"", ZZZZZZ ""QQQQQQ""","JJJJ ""MMMM"", RRRR ""TTTT""",1234,RRRR,60,50
I would want the schema to then become:
[
    'XX "YYYYYYYY", ZZZZZZ "QQQQQQ"',
    'JJJJ "MMMM", RRRR "TTTT"',
    1234,
    'RRRR',
    60,
    50
]
Is there a way to set the schema of a dataset in a programmatic/automated fashion?
While you can do this in code, Foundry's dataset app can also do this natively. This means you can skip writing the code (which is nice), and it can also save a step in your pipeline (which might save you some runtime).
After uploading the files to a dataset, press "Edit schema" on the dataset, then apply parser settings that match the file; for your data that means setting both the quote and the escape character to ". Press "Save and validate" and the dataset should end up with the correct schema.
Starting with this example:
Dataset<Row> dataset = files
.sparkSession()
.read()
.option("inferSchema", "true")
.csv(csvDataset);
output.getDataFrameWriter(dataset).write();
Add the header, quote, and escape options; setting the escape character to the quote character makes Spark treat a doubled quote as a literal quote, which is exactly the Excel dialect described in the question:
Dataset<Row> dataset = files
.sparkSession()
.read()
.option("inferSchema", "true")
.option("header", "true")
.option("quote", "\"")
.option("escape", "\"")
.csv(csvDataset);
output.getDataFrameWriter(dataset).write();
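If inferSchema does not give you the types you want, you can also set the schema explicitly. A minimal sketch using the Spark Java API, with hypothetical column names standing in for the real header (only the six fields from the example row are modeled):
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Hypothetical column names; replace them with the real header names.
StructType schema = new StructType()
        .add("title", DataTypes.StringType)
        .add("subtitle", DataTypes.StringType)
        .add("id", DataTypes.IntegerType)
        .add("code", DataTypes.StringType)
        .add("high", DataTypes.IntegerType)
        .add("low", DataTypes.IntegerType);

Dataset<Row> dataset = files
        .sparkSession()
        .read()
        .schema(schema) // explicit schema instead of inferSchema
        .option("header", "true")
        .option("quote", "\"")
        .option("escape", "\"")
        .csv(csvDataset);
output.getDataFrameWriter(dataset).write();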

Mule 4 Pipe Delimited text file to JSON

I need to convert a pipe-delimited text file to JSON in Mule 4. I tried with the code below.
<file:listener doc:name="Fetch Input from Geometry SFTP" doc:id="1151a602-6748-43b7-b491-2caedf6f7010" directory="C:\Vijay\My Projects\AM-Mexico\US\AIM-65\Input" autoDelete="true" outputMimeType="text/csv; separator=|" recursive="false">
    <non-repeatable-stream />
    <scheduling-strategy>
        <fixed-frequency frequency="10" timeUnit="SECONDS"/>
    </scheduling-strategy>
</file:listener>
<ee:transform doc:name="Transform Message" doc:id="79dcddf9-34a0-4005-88b4-3a395544be8c">
    <ee:message>
        <ee:set-payload><![CDATA[%dw 2.0
output application/json
input payload text/csv
---
payload]]></ee:set-payload>
    </ee:message>
</ee:transform>
When I execute the code, I get the exception below.
Message : "Internal execution exception while executing the script this is most probably a bug.
Caused by:
java.nio.channels.ClosedChannelException
at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:147)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at $java.io.InputStream$$FastClassByCGLIB$$31b19c4e.invoke(<generated>)
at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.mule.extension.file.common.api.stream.AbstractFileInputStream.lambda$createLazyStream$0(AbstractFileInputStream.java:41)
at $java.io.InputStream$$EnhancerByCGLIB$$55e4687e.read(<generated>)
at org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:98)
at org.mule.weave.v2.io.DefaultSeekableStream.readUntil(SeekableStream.scala:193)
at org.mule.weave.v2.io.DefaultSeekableStream.delegate$lzycompute(SeekableStream.scala:202)
at org.mule.weave.v2.io.DefaultSeekableStream.delegate(SeekableStream.scala:201)
at org.mule.weave.v2.io.DefaultSeekableStream.seek(SeekableStream.scala:234)
at org.mule.weave.v2.io.SeekableStream.resetStream(SeekableStream.scala:17)
at org.mule.weave.v2.io.SeekableStream.resetStream$(SeekableStream.scala:16)
at org.mule.weave.v2.io.DefaultSeekableStream.resetStream(SeekableStream.scala:186)
at org.mule.weave.v2.model.values.BinaryValue$.getBytesFromSeekableStream(BinaryValue.scala:84)
at org.mule.weave.v2.model.values.BinaryValue$.getBytes(BinaryValue.scala:68)
at org.mule.weave.v2.model.values.BinaryValue.equals(BinaryValue.scala:26)
at org.mule.weave.v2.model.values.BinaryValue.equals$(BinaryValue.scala:25)
at org.mule.weave.v2.module.pojo.reader.JavaBinaryValue.equals(JavaBinaryValue.scala:11)
at org.mule.weave.v2.model.values.wrappers.DelegateValue.equals(DelegateValue.scala:38)
at org.mule.weave.v2.model.values.wrappers.DelegateValue.equals$(DelegateValue.scala:37)
at org.mule.weave.v2.model.values.wrappers.LazyValue.equals(DelegateValue.scala:65)
at org.mule.weave.v2.model.types.Type.$anonfun$acceptsSchema$2(Type.scala:203)
at org.mule.weave.v2.model.types.Type.$anonfun$acceptsSchema$2$adapted(Type.scala:198)
Any help on this would be highly appreciated.
You need to set the reader property for the separator:
%dw 2.0
input payload application/csv separator='|'
output application/json
---
payload
See the DataWeave CSV format documentation for the full list of reader properties.
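Plugged back into the Transform Message processor from the question, the corrected script would look like this (a sketch; the doc:id attribute is omitted):
<ee:transform doc:name="Transform Message">
    <ee:message>
        <ee:set-payload><![CDATA[%dw 2.0
input payload application/csv separator='|'
output application/json
---
payload]]></ee:set-payload>
    </ee:message>
</ee:transform>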

LuaLaTex using fontspec package and luacode reading JSON file

I've been using LaTeX for years, but I'm new to embedded Lua code (with LuaLaTeX). Below you can see a simplified example:
\begin{filecontents*}{data.json}
[
{"firstName":"Max", "lastName":"Möller"},
{"firstName":"Anna", "lastName":"Smith"}
];
\end{filecontents*}
\documentclass[11pt]{article}
\usepackage{fontspec}
%\setmainfont{Carlito}
\usepackage{tikz}
\usepackage{luacode}
\begin{document}
\begin{luacode}
require("lualibs.lua")
local file = io.open('data.json','rb')
local jsonstring = file:read('*a')
file.close()
local jsondata = utilities.json.tolua(jsonstring)
tex.print('\\begin{tabular}{cc}')
for key, value in pairs(jsondata) do
    tex.print(value["firstName"] .. ' & ' .. value["lastName"] .. '\\\\')
end
tex.print('\\hline\\end{tabular}')
\end{luacode}
\end{document}
When executing LuaLaTeX, the following error occurs:
LuaTeX error [\directlua]:6: attempt to index field 'json' (a nil value) [\directlua]:6: in main chunk. \end{luacode}
When the line \usepackage{fontspec} is commented out, the output is produced. Alternatively, the error can be avoided by commenting out utilities.json.tolua(jsonstring) and all following Lua code lines.
So the question is: how can I use both the fontspec package and JSON data without generating an error message? Apart from this I have another question: how do I enable German umlauts in the output of luacode (see the first "lastName" in the example: Möller)?
Ah, I'm using TeX Live 2015/Debian on Ubuntu 16.04.
Thank you,
Jerome

R JSON UTF-8 parsing

I have an issue when trying to parse a JSON file in the Russian alphabet in R. The file looks like this:
[{"text": "Валера!", "type": "status"}, {"text": "когда выйдет", "type": "status"}, {"text": "КАК ДЕЛА?!)", "type": "status"}]
and it is saved in UTF-8 encoding. I tried the rjson, RJSONIO and jsonlite libraries to parse it, but it doesn't work:
library(jsonlite)
allFiles <- fromJSON(txt="ru_json_example_short.txt")
gives me the error
Error in feed_push_parser(buf) :
lexical error: invalid char in json text.
[{"text": "Валера!", "
(right here) ------^
When I save the file in ANSI encoding, it works OK, but then the Russian alphabet turns into question marks, so the output is unusable.
Does anyone know how to parse such a JSON file in R, please?
Edit: The above applies to a UTF-8 file saved in Windows Notepad. When I save it in PSPad and then parse it, the result looks like this:
text type
1 <U+0412><U+0430><U+043B><U+0435><U+0440><U+0430>! status
2 <U+043A><U+043E><U+0433><U+0434><U+0430> <U+0432><U+044B><U+0439><U+0434><U+0435><U+0442> status
3 <U+041A><U+0410><U+041A> <U+0414><U+0415><U+041B><U+0410>?!) status
Try the following, which reads the file line by line and wraps the lines in a JSON array string before parsing:
dat <- fromJSON(sprintf("[%s]",
                        paste(readLines("./ru_json_example_short.txt"),
                              collapse=",")))
dat
[[1]]
text type
1 Валера! status
2 когда выйдет status
3 КАК ДЕЛА?!) status
ref: Error parsing JSON file with the jsonlite package