Bash shell file import XML to MySQL [closed] - mysql

I have created a shell script for importing XML to MySQL.
This is the code:
echo "Start Download... please wait"
read -r -d '' query <<EOF
TRUNCATE TABLE 1000farmacie_scontrini;
LOAD XML LOCAL INFILE 'scontrini.xml'
INTO TABLE 1000farmacie_scontrini
ROWS IDENTIFIED BY '<order>';
EOF
mysql --user=$DB_USER --password=$DB_PASS --database=$DB_SCHE --execute="$query"
duration=$SECONDS
echo "Update xxxx completed in $(($duration / 60)) minutes and $(($duration % 60)) seconds."
But it does not import all rows.
This is the XML structure:
<order>
<number>F142425268</number>
<pharmacy_number>6309</pharmacy_number>
<state>consegnato</state>
<total>39.41</total>
<total_paid>39.41</total_paid>
<shipping_cost>0</shipping_cost>
<contrassegno>0.0</contrassegno>
<total_without_discount>42.38</total_without_discount>
<discount>2.97</discount>
<date>2022-12-10 22:30:09 +0100</date>
<customer>
<name>xxx</name>
<indirizzo>xxx</indirizzo>
<cap>00199</cap>
<email_hash>xxx</email_hash>
</customer>
<invoice>
<invoice_type>riepilogo</invoice_type>
<invoice_type_description>Riepilogo non detraibile</invoice_type_description>
<invoice_fiscal_code/>
<invoice_customer_name/>
<invoice_email/>
<invoice_sdi/>
<invoice_full_address/>
<invoice_street/>
<invoice_house_number/>
<invoice_city/>
<invoice_zip/>
<invoice_district/>
<invoice_country>Italia</invoice_country>
</invoice>
<products>
<product>
<minsan>938810963</minsan>
<quantity>1</quantity>
<price>12.43</price>
<list_price>13.37</list_price>
</product>
<product>
<minsan>937362414</minsan>
<quantity>1</quantity>
<price>6.47</price>
<list_price>6.96</list_price>
</product>
<product>
<minsan>976024935</minsan>
<quantity>1</quantity>
<price>12.82</price>
<list_price>13.78</list_price>
</product>
<product>
<minsan>981212640</minsan>
<quantity>1</quantity>
<price>7.69</price>
<list_price>8.27</list_price>
</product>
</products>
<accept_link>xxx</accept_link>
<ldv_file>xxx</ldv_file>
</order>

Related

How can I list out the items of a JSON array embedded in a XML response? [duplicate]

This question already has answers here:
Extracting data from xml file using xmllint
(2 answers)
Extract all fields from array with jq
(1 answer)
Closed 9 months ago.
I executed the curl command below on Linux. How can I list the contents of get_ccws_supportlist_t_json_T2Result one by one?
Command:
curl --request POST --header "Content-Type: text/xml;charset=UTF-8" -d @request.xml http://server1:5353/CCWS_IOC_interface.asmx | xmllint --format -
Response:
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<get_ccws_supportlist_t_json_Response xmlns="http://tempuri.org/">
<get_ccws_supportlist_t_json_T2Result>[{"Sup_Team":"Team1","Sup_Project":"ID","Sup_Host":"host1","InstCluster":"loc1","SupSeq_id":"16186","SupSeq_Mode":"1","SupSeq_Seq":"1","SupSeq_Remark1":"Please call the support number directly instead of sending SMS.","SupSeq_Remark2":"","SupSeq_Remark3":"","Host_Location":"LOC1","Sup_ID":"1449","User_Team":"Team1","User_DomainID":"support1","User_Name":"Primary Production Support","User_Title":"T","Sup_InterfaceWith":"","Sup_Group":"2","SupGroup_Name":"Support"},{"Sup_Team":"T2","Sup_Project":"Team2","Sup_Host":"host2","InstCluster":"loc2","SupSeq_id":"813514","SupSeq_Mode":"0","SupSeq_Seq":"2","SupSeq_Remark1":"","SupSeq_Remark2":"","SupSeq_Remark3":"", "Host_Location":"LOC2","Sup_ID":"2272","User_Team":"Team2","User_DomainID":"support2","User_Name":"Secondary","User_Title":"""Sup_InterfaceWith":"","Sup_Group":"2","SupGroup_Name":"Support"}]</get_ccws_supportlist_t_json_T2Result>
</get_ccws_supportlist_t_json_Response>
</soap:Body>
</soap:Envelope>
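One possible way to list the entries, sketched in Python rather than with xmllint or jq: parse the SOAP envelope, take the text of the get_ccws_supportlist_t_json_T2Result element, and decode it as JSON. The sample below is a shortened, hypothetical stand-in for the response above (only two fields per entry, no XML declaration), and the function name list_support_entries is made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Shortened, hypothetical stand-in for the SOAP response shown above.
SAMPLE_RESPONSE = """<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Body>
    <get_ccws_supportlist_t_json_Response xmlns="http://tempuri.org/">
      <get_ccws_supportlist_t_json_T2Result>[{"Sup_Team":"Team1","Sup_Host":"host1"},{"Sup_Team":"Team2","Sup_Host":"host2"}]</get_ccws_supportlist_t_json_T2Result>
    </get_ccws_supportlist_t_json_Response>
  </soap:Body>
</soap:Envelope>"""


def list_support_entries(xml_text):
    """Return the JSON array embedded in the T2Result element as a list of dicts."""
    root = ET.fromstring(xml_text)
    # The result element lives in the default namespace declared on its parent.
    ns = {"t": "http://tempuri.org/"}
    result = root.find(".//t:get_ccws_supportlist_t_json_T2Result", ns)
    if result is None or not result.text:
        return []
    return json.loads(result.text)


entries = list_support_entries(SAMPLE_RESPONSE)
for index, entry in enumerate(entries, start=1):
    print(index, entry["Sup_Team"], entry["Sup_Host"])
```

With real data, the curl output would be passed to list_support_entries instead of the sample string.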

Bash script: Download html file, compare with previous version and display the difference

This website lists a few IP addresses which might change over the time.
https://support.symantec.com/en_US/article.TECH244698.html
Now I would like to write a bash script (it must be bash; Python or PHP can't be used) which downloads the above-mentioned HTML file and, each time it is run, compares the new version against the previous one. If there is a difference, it should be displayed and logged to a file and, in a later step, an email notification should be sent.
Now, this seems to be an easy task in theory, but I am unable to produce any results. I would appreciate if I can get some ideas on how to achieve this.
So far I have tried the following approaches:
#!/bin/bash
#check website for changes
URL="https://support.symantec.com/en_US/article.TECH244698.html"
mv new.html old.html 2> /dev/null
curl -v --silent $URL --stderr - > new.html
diff -y --suppress-common-lines new.html old.html
And
URL="https://support.symantec.com/en_US/article.TECH244698.html"
for (( ; ; )); do
mv new.html old.html 2> /dev/null
curl $URL -L --compressed -s > new.html
DIFF_OUTPUT="$(diff new.html old.html)"
if [ "0" != "${#DIFF_OUTPUT}" ]; then
... ...
You can use the following bash script:
#!/bin/bash
#check website for changes
URL="https://support.symantec.com/en_US/article.TECH244698.html"
if [ -f new_ips.log ]; then
mv new_ips.log old_ips.log 2> /dev/null
fi
curl --silent "$URL" | \
grep -oP '\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\/\d\d?)?\b' > new_ips.log
if [ -f new_ips.log ] && [ -f old_ips.log ]; then
diff -y --suppress-common-lines new_ips.log old_ips.log
exit 0;
fi
exit 1;
The first time you need to run it twice, since initially there is nothing to compare to.
Explanations:
You were really close to a working solution. Adding the grep -oP filter (the long IPv4 pattern in the script above) keeps only the IP addresses from the HTML file, so you can focus on what you are actually interested in and avoid spurious differences caused by page design, timestamps, etc.
I have tested it by modifying the new_ips.log before rerunning it and you have the following differences:
./check_ips.sh
> 142.64.0.0/21
148.64.0.0/21 | 142.64.0.1
148.64.0.1 <
To answer your second question, in which you want to add the location names to the IP addresses (for example Auckland, NZ followed by its IP addresses, then Sydney, AU followed by its IP addresses):
I propose to change the extraction/filtering command in the following way:
1) we will need to parse the HTML document using a parser (xslt technologies are enough for what we want to achieve)
more generate_ips.xslt
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:for-each select="/html/body/table/tbody/tr/td[1]/strong|/html/body/table/tbody/tr[2]/td">
<xsl:value-of select="."/><xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
This stylesheet will print the country name and the ips with a line feed character after them by accessing them via their xpath.
2) The curl command is changed in the following way:
curl --silent "$URL" |\
awk 'BEGIN{print "<html><body>"}/<table/{a=1;}/<\/table>/{print;a=0}{if(a)print;}END{print "</body></html>"}' |\
xsltproc -html generate_ips.xslt - | sed '/^Egress/{d};s/^ *//'
awk extracts all the tables and creates a simplified HTML file containing only the IP tables we are interested in. The XSLT processor then produces the text output, and finally sed removes the lines that are not required and cleans up the display.
OUTPUT:
IP address range**
148.64.0.0/21**
148.64.0.0/21
148.64.0.1
148.64.7.254
Auckland, New Zealand
124.157.113.128/27 124.157.113.160/27 124.157.113.192/27
124.157.113.129 124.157.113.161 124.157.113.193
124.157.113.158 124.157.113.190 124.154.113.222
Chennai, India
180.179.40.0/26 180.179.46.64/27 148.64.6.0/23
180.179.40.1 180.179.46.65 148.64.6.1
180.179.40.62 180.179.46.94 148.64.7.254
Hong Kong
103.246.38.0/24 148.64.0.0/24
103.246.38.1 148.64.0.1
103.246.38.254 148.64.0.254
Mumbai, India
180.179.142.0/24 148.64.4.0/23
180.179.142.1 148.64.4.1
180.179.142.254 148.64.5.254
Seoul, South Korea
203.246.168.0/24
203.246.168.1
203.246.168.254
Shanghai, China
211.147.76.0/27 211.147.76.32/27
211.147.76.1 211.147.76.33
211.147.76.30 211.147.76.62
Singapore
103.246.37.0/24 148.64.3.0/24
103.246.37.1 148.64.3.1
103.246.37.254 148.64.3.254
Sydney, Australia
103.246.36.0/24
103.246.36.1
103.246.36.254
Taipei, Taiwan
61.58.46.0/24
61.58.46.1
61.58.46.254
Tokyo, Japan
103.9.99.0/24 103.246.39.0/24 148.64.1.0/24
103.9.99.1 103.246.39.1 148.64.1.1
103.9.99.254 103.246.39.254 148.64.1.254
The rest of the script should not be changed and works well as it is.

xsltproc html documents

I'm trying to clean some htmls. I have converted them to xhtml with tidy
$ tidy -asxml -i -w 150 -o o.xml index.html
The resulting xhtml ends up having named entities.
When trying xsltproc on those xhtmls, I keep getting errors.
$ xsltproc --novalid -o out.htm t.xsl o.xml
o.xml:873: parser error : Entity 'mdash' not defined
resources to storing data and using permissions — as needed.</
^
o.xml:914: parser error : Entity 'uarr' not defined
</div>↑ Go to top
^
o.xml:924: parser error : Entity 'nbsp' not defined
Android 3.2 r1 - 27 Jul 2011 12:18
If I add --html to the xsltproc it complains on a tag that has name and id attributes with same name (which is valid)
$ xsltproc --novalid --html -o out.htm t.xsl o.xml
o.xml:845: element a: validity error : ID top already defined
<a name="top" id="top"></a>
^
The xslt is simple:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" indent="yes" omit-xml-declaration="yes"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="//*[@id='side-nav']"/>
</xsl:stylesheet>
Why doesn't --html work? Why is it complaining? Or should I forget it and fix the entities?
I went the other way: I made tidy produce numeric entities rather than named ones with the -n option.
$ tidy -asxml -i -n -w 150 -o o.xml index.xml
Now I can remove the --html option and it works.
I can also remove that name attribute, but I still wonder why it is reported as an error although it is valid.
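To illustrate why numeric entities help, here is a minimal Python sketch (not what tidy does internally, just the same idea): named HTML entities such as &mdash; are undefined for a plain XML parser, while numeric character references need no DTD. The helper name named_to_numeric and the sample string are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities every XML parser knows; they must be left alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def named_to_numeric(text):
    """Replace named HTML entities with numeric character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep predefined or unknown entities as-is
        return "&#%d;" % name2codepoint[name]

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)


sample = "<p>using permissions &mdash; as needed.&nbsp;&uarr; Go to top &amp; more</p>"

try:
    ET.fromstring(sample)
    parsed_without_fix = True
except ET.ParseError:
    # Same class of failure as "Entity 'mdash' not defined" above.
    parsed_without_fix = False

converted = named_to_numeric(sample)
root = ET.fromstring(converted)  # parses once the entities are numeric
print(converted)
```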
I am assuming that the unclearly stated question is this: I know how to avoid "Entity 'XXX' not defined" errors when running xsltproc (add --html). But how do I get rid of "ID YYY already defined"?
Recent builds of Tidy have an anchor-as-name option. You can set it to "no" to remove unwanted name attributes:
This option controls the deletion or addition of the name attribute in elements where it can serve as anchor. If set to "yes", a name attribute, if not already existing, is added along an existing id attribute if the DTD allows it. If set to "no", any existing name attribute is removed if an id attribute exists or has been added.

How to get CTest results in Hudson / Jenkins

I'm using CTest (part of CMake) for my automated tests.
How do I get CTest results in the Jenkins dashboard ? Or, phrased differently, how do I get CTest to output in JUnit-like XML ?
In Jenkins, after the CMake part (probably made through the CMake plugin), add the following batch script, or adapt for builds on Linux :
del build_32\JUnitTestResults.xml
pushd build_32\Tests
"C:\Program Files\CMake 2.8\bin\ctest.exe" -T Test -C RelWithDebInfo --output-on-failure
popd
verify >nul
C:\Python27\python.exe external/tool/CTest2JUnit.py build_32/Tests external/tool/CTest2JUnit.xsl > build_32/JUnitTestResults.xml
build_32 is the Build Directory in the CMake plugin
Tests is the subdirectory where all my tests live
-T Test makes CTest output in XML (?!)
verify >nul resets errorlevel to 0, because CTest returns >0 if any test fails, which Jenkins interprets as "the whole build failed", which we don't want
The last line converts CTest's XML into a minimal JUnit xml. The Python script and the xslt live in the source directory, you may want to change that.
The python script looks like this (hacked together in 10 min, beware) :
from lxml import etree
import StringIO
import sys
TAGfile = open(sys.argv[1]+"/Testing/TAG", 'r')
dirname = TAGfile.readline().strip()
xmlfile = open(sys.argv[1]+"/Testing/"+dirname+"/Test.xml", 'r')
xslfile = open(sys.argv[2], 'r')
xmlcontent = xmlfile.read()
xslcontent = xslfile.read()
xmldoc = etree.parse(StringIO.StringIO(xmlcontent))
xslt_root = etree.XML(xslcontent)
transform = etree.XSLT(xslt_root)
result_tree = transform(xmldoc)
print(result_tree)
It needs lxml.
It takes two arguments: the directory in which the tests live (in the build directory), and an xsl file
It simply reads the latest xml test results, transforms them with the xsl, and writes the result to stdout
The directory holding the latest xml test results is named in the first line of the Testing/TAG file, hence the additional open()
The xsl looks like this. It's pretty minimal but gets the job done : [EDIT] see MOnsDaR 's improved version : http://pastebin.com/3mQ2ZQfa
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/Site/Testing">
<testsuite>
<xsl:apply-templates select="Test"/>
</testsuite>
</xsl:template>
<xsl:template match="Test">
<xsl:variable name="testcasename"><xsl:value-of select= "Name"/></xsl:variable>
<xsl:variable name="testcaseclassname"><xsl:value-of select= "FullName"/></xsl:variable>
<testcase name="{$testcasename}" classname="{$testcaseclassname}">
<xsl:if test="@Status = 'passed'">
</xsl:if>
<xsl:if test="@Status = 'failed'">
<error type="error"><xsl:value-of select="Results/Measurement/Value/text()" /></error>
</xsl:if>
<xsl:if test="@Status = 'notrun'">
<skipped><xsl:value-of select="Results/Measurement/Value/text()" /></skipped>
</xsl:if>
</testcase>
</xsl:template>
</xsl:stylesheet>
Finally, check "Publish JUnit tests results" (or similar, my version is in French) and set the xml path to build_32/JUnitTestResults.xml
Well, that was ugly. But still, hope this helps someone. And improvements are welcome ( running ctest from python maybe ? Using the path of the Python plugin instead of C:... ? )
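As one possible take on the improvements asked for, the same mapping as the stylesheet can be sketched with only the Python standard library, so neither lxml nor the xsl file is needed. This is an untested-in-Jenkins sketch under assumptions: the sample XML is a hypothetical, heavily trimmed Test.xml, the function name ctest_to_junit is made up, and only the first Measurement value of each test is copied.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for a CTest Test.xml file.
SAMPLE_CTEST_XML = """<Site>
  <Testing>
    <Test Status="passed">
      <Name>test_ok</Name>
      <FullName>./Tests/test_ok</FullName>
    </Test>
    <Test Status="failed">
      <Name>test_bad</Name>
      <FullName>./Tests/test_bad</FullName>
      <Results><Measurement><Value>assertion failed</Value></Measurement></Results>
    </Test>
    <Test Status="notrun">
      <Name>test_skipped</Name>
      <FullName>./Tests/test_skipped</FullName>
    </Test>
  </Testing>
</Site>"""


def ctest_to_junit(ctest_xml):
    """Build a minimal JUnit-style <testsuite> element from CTest XML text."""
    site = ET.fromstring(ctest_xml)
    suite = ET.Element("testsuite")
    for test in site.findall("./Testing/Test"):
        case = ET.SubElement(
            suite,
            "testcase",
            name=test.findtext("Name", default=""),
            classname=test.findtext("FullName", default=""),
        )
        # Only the first measurement value is used in this sketch.
        message = test.findtext("Results/Measurement/Value", default="")
        status = test.get("Status")
        if status == "failed":
            ET.SubElement(case, "error", type="error").text = message
        elif status == "notrun":
            ET.SubElement(case, "skipped").text = message
        # Passed tests get an empty <testcase>, as in the stylesheet.
    return suite


junit = ctest_to_junit(SAMPLE_CTEST_XML)
print(ET.tostring(junit, encoding="unicode"))
```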
This seems to be integrated in jenkins-ci nowadays:
https://github.com/jenkinsci/xunit-plugin/commits/master/src/main/resources/org/jenkinsci/plugins/xunit/types/ctest-to-junit.xsl

XML output from MySQL query

Is there any chance of getting the output from a MySQL query directly as XML?
I'm referring to something like what MSSQL has with the SQL-XML plugin, for example:
SELECT * FROM table WHERE 1 FOR XML AUTO
returns text (or xml data type in MSSQL to be precise) which contains an XML markup structure generated
according to the columns in the table.
With SQL-XML there is also an option of explicitly defining the output XML structure like this:
SELECT
1 AS tag,
NULL AS parent,
emp_id AS [employee!1!emp_id],
cust_id AS [customer!2!cust_id],
region AS [customer!2!region]
FROM table
FOR XML EXPLICIT
which generates an XML code as follows:
<employee emp_id='129'>
<customer cust_id='107' region='Eastern'/>
</employee>
Do you have any clues how to achieve this in MySQL?
Thanks in advance for your answers.
The mysql command can output XML directly, using the --xml option, which is available at least as far back as MySQL 4.1.
However, this doesn't allow you to customize the structure of the XML output. It will output something like this:
<?xml version="1.0"?>
<resultset statement="SELECT * FROM orders" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<row>
<field name="emp_id">129</field>
<field name="cust_id">107</field>
<field name="region">Eastern</field>
</row>
</resultset>
And you want:
<?xml version="1.0"?>
<orders>
<employee emp_id="129">
<customer cust_id="107" region="Eastern"/>
</employee>
</orders>
The transformation can be done with XSLT using a script like this:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="resultset">
<orders>
<xsl:apply-templates/>
</orders>
</xsl:template>
<xsl:template match="row">
<employee emp_id="{field[@name='emp_id']}">
<customer
cust_id="{field[@name='cust_id']}"
region="{field[@name='region']}"/>
</employee>
</xsl:template>
</xsl:stylesheet>
This is obviously way more verbose than the concise MSSQL syntax, but on the other hand it is a lot more powerful and can do all sorts of things that wouldn't be possible in MSSQL.
If you use a command-line XSLT processor such as xsltproc or saxon, you can pipe the output of mysql directly into the XSLT program. For example:
mysql -e 'select * from table' -X database | xsltproc script.xsl -
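If no XSLT processor is at hand, the same restructuring can be sketched in Python with the standard library. This is only an illustration under assumptions: the sample mirrors the --xml output shown above (without the XML declaration and the xsi namespace), and the function name resultset_to_orders is hypothetical.

```python
import xml.etree.ElementTree as ET

# Stand-in for the output of `mysql --xml`, trimmed from the example above.
SAMPLE_RESULTSET = """<resultset statement="SELECT * FROM orders">
  <row>
    <field name="emp_id">129</field>
    <field name="cust_id">107</field>
    <field name="region">Eastern</field>
  </row>
</resultset>"""


def resultset_to_orders(resultset_xml):
    """Turn <resultset>/<row>/<field> into <orders>/<employee>/<customer>."""
    resultset = ET.fromstring(resultset_xml)
    orders = ET.Element("orders")
    for row in resultset.findall("row"):
        # Map each field's name attribute to its text content.
        fields = {f.get("name"): (f.text or "") for f in row.findall("field")}
        employee = ET.SubElement(orders, "employee", emp_id=fields["emp_id"])
        ET.SubElement(
            employee,
            "customer",
            cust_id=fields["cust_id"],
            region=fields["region"],
        )
    return orders


orders = resultset_to_orders(SAMPLE_RESULTSET)
print(ET.tostring(orders, encoding="unicode"))
```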
Using XML with MySQL seems to be a good place to start with various different ways to get from MySQL query to XML.
From the article:
use strict;
use DBI;
use XML::Generator::DBI;
use XML::Handler::YAWriter;
my $dbh = DBI->connect ("DBI:mysql:test",
"testuser", "testpass",
{ RaiseError => 1, PrintError => 0});
my $out = XML::Handler::YAWriter->new (AsFile => "-");
my $gen = XML::Generator::DBI->new (
Handler => $out,
dbh => $dbh
);
$gen->execute ("SELECT name, category FROM animal");
$dbh->disconnect ();
Do you have any clue how to achieve this in MySQL?
Yes, do it by hand and build the XML yourself with CONCAT strings. Try
SELECT concat('<orders><employee emp_id="', emp_id, '"><customer cust_id="', cust_id, '" region="', region, '"/></employee></orders>') FROM table
I took this from a 2009 answer How to convert a MySQL DB to XML? and it still seems to work. Not very handy, and if you have large trees per item, they will all be in one concatenated value of the root item, but it works, see this test with dummies:
SELECT concat('<orders><employee emp_id="', 1, '"><customer cust_id="', 2, '" region="', 3, '"/></employee></orders>') FROM DUAL
gives
<orders><employee emp_id="1"><customer cust_id="2" region="3"/></employee></orders>
With "manual coding" you can get to this structure.
<?xml version="1.0"?>
<orders>
<employee emp_id="1">
<customer cust_id="2" region="3" />
</employee>
</orders>
I checked this with a larger tree per root item and it worked, but I had to run additional Python code on the result to remove the surplus opening and closing tags that are generated when an XML path has intermediate-level nodes. I did this with lists that look backward at previous entries, together with a temporary set, although an object-oriented approach would be cleaner. In essence, the code drops the last x items from the list as soon as a new head item is found, plus some other tricks for nested branches.
I puzzled out a Regex that found each text between tags:
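import re
# assumption: escape is the standard-library XML escaping helper
from xml.sax.saxutils import escape

string = " <some tag><another tag>test string<another tag></some tag>"
pattern = r'(?:^\s*)?(?:(?:<[^\/]*?)>)?(.*?)?(?:(?:<\/[^>]*)>)?'
p = re.compile(pattern)
# join the text fragments captured between the tags
val = r''.join(p.findall(string))
val_escaped = escape(val)
if val_escaped != val:
    # str.replace returns a new string, so the result must be assigned
    string = string.replace(val, val_escaped)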
This regex gives you access to the text between the tags. If you are allowed to use CDATA, it is easiest to use it everywhere. Just wrap the content in CDATA (character data) already in MySQL:
<Title><![CDATA[', t.title, ']]></Title>
You will then have no further issues except for very unusual characters such as U+001A, which you should replace already in MySQL. You no longer need to escape or replace the remaining special characters at all. This worked for me on an XML file of 1 million lines with heavy use of special characters.
Yet: you should validate the file against the needed xml schema file using Python's module xmlschema. It will alert you when you are not allowed to use that CDATA trick.
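The difference between the CDATA trick and classic escaping can be seen in a small Python sketch. It is only an illustration: the helper name as_cdata and the sample title are made up, and the handling of "]]>" inside the content (which a CDATA section cannot contain) is an addition not mentioned above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def as_cdata(text):
    """Wrap text in a CDATA section, splitting any embedded ']]>' terminator."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Hypothetical title containing characters that are special in XML.
title = 'Fish & Chips <special> ]]> "offer"'

via_cdata = "<Title>" + as_cdata(title) + "</Title>"
via_escape = "<Title>" + escape(title) + "</Title>"

# Both documents parse back to exactly the same text.
text_from_cdata = ET.fromstring(via_cdata).text
text_from_escape = ET.fromstring(via_escape).text
print(via_cdata)
print(via_escape)
```

Either form carries the same content; which one is acceptable depends on the schema the file has to validate against.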
If you need fully UTF-8 formatted content without CDATA, which is often the requirement, you can achieve that even for a file of 1 million lines by validating the XML output step by step against the target XML schema (xsd) file. It is fiddly work, but it can be done with some patience.
Replacements are possible with:
MySQL using replace()
Python using string.replace()
Python using Regex replace (though I did not need it in the end, it would look like: re.sub(re.escape(val), 'xyz', i))
string.encode(encoding = 'UTF-8', errors = 'strict')
Mind that encoding as UTF-8 is the most powerful step; it could even make the three other replacement methods above unnecessary. Mind also that it turns the text into bytes: you then need to treat it as binary b'...', and you can write it to a file only in binary mode, using wb.
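A minimal sketch of that last point, assuming a short made-up XML string and a temporary file: after encode() the value is bytes, so the file has to be opened with wb.

```python
import os
import tempfile

# Hypothetical XML text with non-ASCII characters.
xml_text = '<?xml version="1.0" encoding="UTF-8"?>\n<Title>Caf\u00e9 \u2014 men\u00fc</Title>\n'

# errors="strict" raises UnicodeEncodeError instead of silently dropping characters.
xml_bytes = xml_text.encode(encoding="UTF-8", errors="strict")

path = os.path.join(tempfile.mkdtemp(), "out.xml")
with open(path, "wb") as handle:  # binary mode, because xml_bytes is bytes
    handle.write(xml_bytes)

# Reading the file back and decoding restores the original text.
with open(path, "rb") as handle:
    round_trip = handle.read().decode("UTF-8")
```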
Finally, you may open the XML output in a normal browser such as Firefox for a last check and watch the XML at work, or check it in vscode/codium with an XML extension. These checks are not strictly needed; in my case the xmlschema module showed everything very well. Mind also that vscode/codium tolerates XML problems quite easily and can still show a tree when Firefox cannot, so you will need a validator or a browser to see all XML errors.
Quite a huge project could be done using this xml-building-with-mysql, at the end there was a triple nested xml tree with many repeating tags inside parent nodes, all made from a two-dimensional MySQL output.