Transform CSV to XML: file generation using XSLT programming - csv

I would like to generate an XML file from an existing CSV file using XSLT.
Can anybody tell me which command to use?
I don't know the command to convert the file.
Suppose my CSV file is named: source.csv
output template: temp.xsl
command:
msxsl source.csv temp.xsl -o result.xml
Is this the right command or not?

Here is an XSL file to convert CSV to XML: http://andrewjwelch.com/code/xslt/csv/csv-to-xml_v2.html
To run it from the command line, the instructions say to download Saxon and use:
java -cp saxon.jar net.sf.saxon.Transform -o output.xml -it main csv-to-xml.xslt pathToCSV=file:/C:/dev/test.csv
Here are the parts of that command line explained:
java: the Java executable (Saxon is written in Java)
-cp saxon.jar: saxon.jar contains the Saxon XSLT processor; -cp stands for "classpath" and tells Java where to find it
net.sf.saxon.Transform: the Saxon class that runs the transformation
-o output.xml: where the output should go (result.xml in your example)
-it main csv-to-xml.xslt: the XSLT file (csv-to-xml.xslt) and the entry point within it (the template named main)
pathToCSV=file:/C:/dev/test.csv: your input CSV file (source.csv in your example, but formatted as a URL)
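If you call Saxon from a script, it can help to assemble the command as an argument list so that paths with spaces need no extra quoting. The sketch below is only an illustration: it uses the option:value form (-o:, -it:) that the next answer reports Saxon 9 HE expects, turns the CSV path into a file: URL as described above, and assumes the stylesheet declares a pathToCSV parameter and a template named main. The function names and jar/file names are hypothetical.

```python
import subprocess
from pathlib import Path


def saxon_csv_command(saxon_jar, stylesheet, csv_path, output_xml):
    """Build a Saxon 9 command line (as a list) for a CSV-to-XML stylesheet.

    Assumes the stylesheet has a pathToCSV parameter and a template named
    "main" that is used as the initial template (-it:main).
    """
    # Absolute file: URL, e.g. file:///C:/dev/test.csv on Windows
    csv_url = Path(csv_path).resolve().as_uri()
    return [
        "java", "-cp", str(saxon_jar), "net.sf.saxon.Transform",
        f"-o:{output_xml}",
        "-it:main",
        str(stylesheet),
        f"pathToCSV={csv_url}",
    ]


def run_saxon(cmd):
    # Raises CalledProcessError if Saxon exits with a non-zero status
    subprocess.run(cmd, check=True)
```

For the question's files this would be run_saxon(saxon_csv_command("saxon9he.jar", "temp.xsl", "source.csv", "result.xml")).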

I do not have sufficient reputation to comment on Stephen's answer.
The transform as described is highly dependent on the XSL stylesheet, which defines a parameter pathToCSV and a template named "main".
The command will not work as Stephen has written it for version 9 of the Home Edition; when I run the command as written I get the response "Command line option -o requires a value". However, this form of the command works as of the day of this posting:
java -cp saxon9he.jar net.sf.saxon.Transform -o:csvfile.xml -it:main "csv2xml.xsl" pathToCSV="csvfile.csv"
The linked XSL appears to be buggy (it probably hasn't been maintained) and will not correctly transform all CSV files (e.g. the CSV example from Michael Kay's book). However, it is a good example to learn from.
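If the stylesheet keeps tripping over your data, one way to check what the XML should look like is to do the conversion with a real CSV parser. This is a different technique from the XSLT approach above: a Python sketch using the standard csv and xml.etree.ElementTree modules. The element names (root, row, and the header names used as child elements) are hypothetical choices, and header names must be valid XML names for this to work. Quoted fields with embedded commas, a common stumbling block for hand-written CSV parsing, are handled by the csv module.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_tag="root", row_tag="row"):
    """Convert CSV text (first row = header) into an XML string.

    Each record becomes a <row> element whose children are named after the
    header columns; header names must therefore be valid XML element names.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    root = ET.Element(root_tag)
    for record in reader:
        if not record:  # skip blank lines
            continue
        row_el = ET.SubElement(root, row_tag)
        for name, value in zip(header, record):
            ET.SubElement(row_el, name).text = value
    return ET.tostring(root, encoding="unicode")
```

To convert a file, read it with open("source.csv", newline="", encoding="utf-8").read() and write the returned string to result.xml.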

Related

I need to custom format a code analysis report generated by PMD

I want to edit the formatting of the HTML that is produced by this command:
C:\tmp\pmd-bin-5.1.0\pmd\bin>pmd -d c:\data\pmd\pmd\test-data\Unused1.java -f xml -R rulesets/java/unusedcode.xml
Where it says "-f xml", I replaced xml with html, and I want to edit the HTML before the page is finalized.
I don't know where the command writes the code for the page, but I want to at least change some of the parameters permanently so that it generates the desired format every time.
If you know XSLT, then you can probably use the xslt renderer. That renderer produces the report with the XML formatter, then processes it with a user-provided XSLT stylesheet. You can use this to produce an HTML page for the report in any format you like.
For example:
C:\tmp\pmd-bin-5.1.0\pmd\bin>pmd -d c:\data\pmd\pmd\test-data\Unused1.java ^
-R rulesets/java/unusedcode.xml ^
-f xslt ^
-P xsltFilename=pmd_report.xsl ^
-r report.html
where pmd_report.xsl is your stylesheet. An example of such a stylesheet can be found here; it is the default stylesheet used if you don't provide an explicit xsltFilename.
The generated HTML file will be at the location report.html.
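If you would rather not write XSLT, another option is to keep "-f xml" and post-process the XML report yourself. The Python sketch below is an alternative to the xslt renderer, not part of PMD. It assumes the report contains file elements (with a name attribute) holding violation elements (with beginline and rule attributes and the message as text); check this against the XML your PMD version actually emits. The namespace prefix is stripped because some PMD versions place the report in an XML namespace.

```python
import html
import xml.etree.ElementTree as ET


def local_name(tag):
    # "{http://...}violation" -> "violation"
    return tag.rsplit("}", 1)[-1]


def pmd_xml_to_html(xml_text):
    """Turn a PMD XML report (assumed structure) into a simple HTML table."""
    root = ET.fromstring(xml_text)
    rows = []
    for file_el in root:
        if local_name(file_el.tag) != "file":
            continue
        for v in file_el:
            if local_name(v.tag) != "violation":
                continue
            cells = [
                file_el.get("name", ""),
                v.get("beginline", ""),
                v.get("rule", ""),
                (v.text or "").strip(),
            ]
            rows.append("<tr>" + "".join(
                f"<td>{html.escape(c)}</td>" for c in cells) + "</tr>")
    return ("<html><body><table>\n"
            "<tr><th>File</th><th>Line</th><th>Rule</th><th>Message</th></tr>\n"
            + "\n".join(rows)
            + "\n</table></body></html>\n")
```

You would run PMD with -f xml -r report.xml, then write pmd_xml_to_html(open("report.xml", encoding="utf-8").read()) to report.html, adjusting the HTML strings to the format you want.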

Including unicharambigs in the [lang].traineddata file (Tesseract)

I'm facing a problem training Tesseract OCR for Kannada fonts (Lohit Kannada and Kedage) when it comes to numerals.
For example, 0 is recognized as 8 (and ನ as ವ).
I need help including the unicharambigs file (the documentation on GitHub describes only its format). My output.txt file has not changed, despite including the unicharambigs file.
Suppose [lang] corresponds to kan, will the following command include the unicharambigs file in the kan.traineddata file?
combine_tessdata kan.
In case it doesn't, I'd appreciate any help on how to proceed.
Difficult to answer not knowing which version of tesseract and kan.traineddata you're using.
You can unpack kan.traineddata to see the version of kan.unicharambigs included in it, and then recombine it after editing the file.
See https://github.com/tesseract-ocr/tesseract/blob/master/doc/combine_tessdata.1.asc for the command syntax.
Use the -u option to unpack:
-u .traineddata PATHPREFIX: Unpacks the .traineddata using the provided prefix.
Use the -o option to overwrite unicharambigs:
-o .traineddata FILE…: Overwrites the specified components of the .traineddata file with those provided on the command line.
Please note that https://github.com/tesseract-ocr/langdata/blob/master/kan/kan.unicharambigs seems to be a copy of eng.unicharambigs.
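The unpack, edit, recombine cycle above could be scripted roughly as follows. This is a sketch, not an official Tesseract tool: it uses only the -u and -o options quoted above, assumes combine_tessdata is on your PATH, and assumes that unpacking with the prefix "kan." writes the components (including kan.unicharambigs) into the current directory. The function names are hypothetical, and a backup copy is made before overwriting.

```python
import shutil
import subprocess


def unpack_cmd(traineddata, prefix):
    # combine_tessdata -u <lang>.traineddata PATHPREFIX
    return ["combine_tessdata", "-u", traineddata, prefix]


def overwrite_cmd(traineddata, *component_files):
    # combine_tessdata -o <lang>.traineddata FILE...
    return ["combine_tessdata", "-o", traineddata, *component_files]


def unpack(lang="kan"):
    """Step 1: extract the components, e.g. kan.unicharambigs."""
    subprocess.run(unpack_cmd(f"{lang}.traineddata", f"{lang}."), check=True)


def repack_ambigs(lang="kan"):
    """Step 2, after editing <lang>.unicharambigs: write it back into
    <lang>.traineddata, keeping a .bak copy of the original first."""
    traineddata = f"{lang}.traineddata"
    shutil.copy2(traineddata, traineddata + ".bak")
    subprocess.run(overwrite_cmd(traineddata, f"{lang}.unicharambigs"), check=True)
```

Between unpack() and repack_ambigs() you edit kan.unicharambigs by hand.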

Best way to format large JSON file? (~30 mb)

I need to format a large JSON file for readability, but every resource I've found (mostly online) doesn't deal with data say, above 1-2 MB. I need to format about 30 MB. Is there any way to do this, or any way to code something to do this?
With Python >= 2.6 you can do the following:
For Mac/Linux users:
cat ugly.json | python -mjson.tool > pretty.json
For Windows users (thanks to the comment from dnk.nitro):
type ugly.json | python -mjson.tool > pretty.json
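The same thing can be done from inside a Python script. This sketch does roughly what python -m json.tool does (indent of 4); ensure_ascii=False is a choice made here so non-ASCII text is written as-is. Like json.tool, it loads the whole document into memory, so expect memory use well above the file size for a ~30 MB file. The function name is hypothetical.

```python
import json


def pretty_print_file(src, dst, indent=4):
    """Read JSON from file src and write it pretty-printed to file dst."""
    with open(src, encoding="utf-8") as f:
        data = json.load(f)  # whole document is held in memory
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=indent, ensure_ascii=False)
        f.write("\n")
```

Usage: pretty_print_file("ugly.json", "pretty.json").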
jq can format or beautify a ~100MB JSON file in a few seconds:
jq '.' myLargeUnformattedFile.json > myLargeBeautifiedFile.json
The command above will beautify a single-line ~120MB file in ~10 seconds, and jq gives you a lot of json manipulation capabilities beyond simple formatting, see their tutorials.
jsonpps is the only one that worked for me (https://github.com/bazaarvoice/jsonpps).
Unlike jq, jsonpp and the others that I tried, it doesn't load everything into RAM.
Some useful tips regarding installation and usage:
Download url: https://repo1.maven.org/maven2/com/bazaarvoice/jsonpps/jsonpps/1.1/jsonpps-1.1.jar
Shortcut (for Windows):
Create file jsonpps.cmd in the same directory with the following content:
@echo off
java -Xms64m -Xmx64m -jar %~dp0\jsonpps-1.1.jar %*
Shortcut usage examples:
Format stdin to stdout:
echo { "x": 1 } | jsonpps
Format stdin to file:
echo { "x": 1 } | jsonpps -o output.json
Format file to file:
jsonpps input.json -o output.json
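The reason a streaming formatter can handle files that crash other tools is that re-indenting JSON only requires remembering the current nesting depth and whether you are inside a string. The Python sketch below illustrates that idea; it is not how jsonpps is implemented. It assumes the input is already valid JSON (it does not validate anything), drops whitespace outside strings, and reads the input in fixed-size chunks so memory use stays small regardless of file size. The function names are hypothetical.

```python
def stream_pretty(src, dst, indent="  ", chunk_size=1 << 16):
    """Re-indent valid JSON from text file-like src into dst, chunk by chunk."""
    depth = 0
    in_string = False
    escape = False
    pending_open = False  # just wrote '{' or '[', newline not yet emitted

    def newline():
        dst.write("\n" + indent * depth)

    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        for ch in chunk:
            if in_string:
                dst.write(ch)
                if escape:
                    escape = False
                elif ch == "\\":
                    escape = True
                elif ch == '"':
                    in_string = False
                continue
            if ch in " \t\r\n":
                continue  # original whitespace between tokens is discarded
            if pending_open:
                pending_open = False
                if ch in "}]":  # empty container stays as {} or []
                    depth -= 1
                    dst.write(ch)
                    continue
                newline()
            if ch in "{[":
                dst.write(ch)
                depth += 1
                pending_open = True
            elif ch in "}]":
                depth -= 1
                newline()
                dst.write(ch)
            elif ch == ",":
                dst.write(ch)
                newline()
            elif ch == ":":
                dst.write(": ")
            else:
                if ch == '"':
                    in_string = True
                dst.write(ch)
    dst.write("\n")


def pretty_file(src_path, dst_path, indent="  "):
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        stream_pretty(src, dst, indent)
```

Usage: pretty_file("input.json", "output.json").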
Background: I was trying to format a huge JSON file (~89 MB) in VS Code using the format command (Alt+Shift+F), but, as usual, it crashed. I used jq to format my file and store the result in another file.
A Windows 11 use case is shown below.
Step 1: Download jq for your OS from the official site: https://stedolan.github.io/jq/
Step 2: Create a folder named jq on the C drive and paste the downloaded executable into it. Rename the file to jq (Error 1: the file is already an .exe file, so do not name it 'jq.exe', name it only 'jq'; if Windows hides file extensions, typing 'jq.exe' can leave you with 'jq.exe.exe').
Step 3: Add the folder containing the executable (C:\jq) to your PATH environment variable.
Step 4: Open cmd in the directory where the JSON file is stored and run: jq . currentfilename.json > targetfilename.json
Replace currentfilename with the name of the file you want to format.
Replace targetfilename with the name of the file you want the formatted data written to.
Within seconds you should see the target file, formatted, in the same directory; it can now be opened in VS Code or any other editor. An error saying that jq is not recognized as a command can most likely be traced back to Error 1.
You can use Notepad++ (https://notepad-plus-plus.org/downloads/) for formatting large JSON files (tested in Windows).
Install Notepad++
Go to Plugins -> Plugins Admin -> Install the 'Json Viewer' plugin. The plugin source code is present in https://github.com/kapilratnani/JSON-Viewer
After plugin installation, go to Plugins -> JSON Viewer -> Format JSON.
This will format your JSON file.

convert HTML to XHTML using TagSoup in bash

I was under the impression that you can convert HTML to XHTML using TagSoup. I have the TagSoup jar file saved as tagsoup.jar. I used the following command:
wget -O usa_stock.html "http://markets.usatoday.com/custom/usatoday-com/new/html-mktscreener.asp#" | java -jar tagsoup.jar usa_stock.html
When I use this command, it generates both the HTML and the XHTML file, but when I open the XHTML in Firefox it's empty. I suspect that with the pipeline it just doesn't know which file I was trying to convert.
Can someone help me out with this one?
Thanks.
The pipeline (|) in your command is definitely wrong; replacing it with && could possibly solve your problem.
Because wget -O writes the retrieved page to the file instead of stdout, you piped nothing into TagSoup.
Although you also gave TagSoup the input file, both sides of a pipeline start at the same time, so when java -jar starts executing, wget is still running and the input file you specified for TagSoup isn't ready yet.
So wget needs to exit with status 0 before TagSoup starts, and && serves exactly this purpose:
wget -O usa_stock.html "http://markets.usatoday.com/custom/usatoday-com/new/html-mktscreener.asp#" && java -jar tagsoup.jar usa_stock.html
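The same "run the next step only if the previous one succeeded" rule can be written out explicitly, which can help if the download and conversion later become part of a larger script. A Python sketch of what && does (the function and variable names are hypothetical; the commands are the ones from the question):

```python
import subprocess


def run_in_sequence(commands):
    """Run each command only after the previous one exited with status 0,
    like joining them with && in the shell. Returns the exit status of the
    last command that actually ran."""
    status = 0
    for cmd in commands:
        status = subprocess.run(cmd).returncode
        if status != 0:
            break  # later commands are never started
    return status


URL = "http://markets.usatoday.com/custom/usatoday-com/new/html-mktscreener.asp#"
TAGSOUP_STEPS = [
    ["wget", "-O", "usa_stock.html", URL],
    ["java", "-jar", "tagsoup.jar", "usa_stock.html"],
]
```

Calling run_in_sequence(TAGSOUP_STEPS) starts TagSoup only after wget has finished writing usa_stock.html successfully.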

Generic way to apply an XSL to all files in a directory?

I have an XSL that transforms an XML file into an HTML file. Works great. But I would like to apply it to a directory of files. Ideally a new HTML file for each XML file would be plunked down in the same directory.
I'm using Windows XP. I've got Cygwin, and am good enough with shell scripting. I've now got Saxon, but haven't been able to accomplish much with it so far. Right now I'm doing something like
java -jar settings.saxon_path -t -s:sourceFilepathNormal -xsl:normalizePath(myXSLT) -o:newXMLFilepathNormal
in a for loop on each file in the directory, but this seems hella clunky to me. Actually, it doesn't just seem that way; I know it's clunky. What is the most elegant way you would accomplish this task with the tools at hand?
You can do this using the collection() function as suggested; but there's also a facility on the Saxon command line to process a whole directory. Just give a directory name as the value of the -s argument and another directory as the value of the -o argument.
If you prefer a GUI approach, KernowForSaxon also has the capability to apply the same transformation to every file in a folder.
You can do this easily in XSLT 2.0 using the standard XPath 2.0 function collection() and the XSLT 2.0 instruction <xsl:result-document>.
As the collection() function is only superficially defined in the W3C Spec, read the more Saxon-specific bits here:
And see for example my answer to this question.
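Try the following, replacing saxon.jar and myXSLT.xsl with the paths to your Saxon jar and stylesheet (quote '*.xml' so the shell doesn't expand it before find sees it):
find . -name '*.xml' -exec java -jar saxon.jar -t -s:{} -xsl:myXSLT.xsl -o:{}.html \;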
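A scripted variant of the per-file loop, sketched in Python: it writes each result next to its source as name.html (rather than name.xml.html) and builds one Saxon command per file using the -s:, -xsl:, -o: options shown above. The jar name saxon9he.jar and the function names are assumptions. Note that this still starts one JVM per file; the directory mode described in the first answer (a directory as the value of -s and -o) avoids that overhead.

```python
import subprocess
from pathlib import Path


def saxon_commands(directory, stylesheet, saxon_jar="saxon9he.jar"):
    """One Saxon command per *.xml file in directory; each output is
    written next to its source with the .xml suffix replaced by .html."""
    cmds = []
    for xml_file in sorted(Path(directory).glob("*.xml")):
        html_file = xml_file.with_suffix(".html")
        cmds.append([
            "java", "-jar", str(saxon_jar),
            f"-s:{xml_file}", f"-xsl:{stylesheet}", f"-o:{html_file}",
        ])
    return cmds


def transform_all(directory, stylesheet, saxon_jar="saxon9he.jar"):
    for cmd in saxon_commands(directory, stylesheet, saxon_jar):
        subprocess.run(cmd, check=True)  # stop at the first failing file
```

Usage: transform_all(".", "myXSLT.xsl").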