How can I use Osmosis to extract multipolygon data?

To update my database I need to extract some data from the OSM database. I have downloaded the .osm.pbf file for my region from the Geofabrik website.
From this PBF file I want to extract all the polygons and multipolygons that carry certain key/value pairs.
My problem: whatever command line I use with Osmosis, I seem to miss the objects of type multipolygon.
To illustrate the problem, here are the commands I used for objects tagged amenity=townhall.
I have tried the following:
osmosis --read-xml nord-pas-de-calais-latest.osm --way-key-value keyValueList="amenity.townhall" --bounding-box top=50.794448377057 left=2.78824307841151 bottom=50.4995835747462 right=3.2719212468744 --used-node --write-xml target_file.osm
OR
osmosis --read-pbf nord-pas-de-calais-latest.osm.pbf --log-progress --tf accept-ways amenity=townhall --bounding-box top=50.794448377057 left=2.78824307841151 bottom=50.4995835747462 right=3.2719212468744 --used-node --write-xml extraction_test_townhall.osm
as well as the following command lines:
# read all nodes with amenity=townhall or townhall=yes, ignore ways and relations
osmosis --read-xml nord-pas-de-calais-latest.osm --tf accept-nodes amenity=townhall --tf reject-ways --tf reject-relations --write-xml amenity_townhall_nodes.osm
osmosis --read-xml nord-pas-de-calais-latest.osm --tf accept-nodes townhall=yes --tf reject-ways --tf reject-relations --write-xml townhall_yes_nodes.osm
# read all ways with amenity=townhall or townhall=yes, keep only related nodes, ignore relations
osmosis --read-xml nord-pas-de-calais-latest.osm --tf accept-ways amenity=townhall --used-node --tf reject-relations --write-xml amenity_townhall_ways.osm
osmosis --read-xml nord-pas-de-calais-latest.osm --tf accept-ways townhall=yes --used-node --tf reject-relations --write-xml townhall_yes_ways.osm
# read all relations with amenity=townhall or townhall=yes, keep only related ways and nodes
osmosis --read-xml nord-pas-de-calais-latest.osm --tf accept-relations amenity=townhall --used-way --tf accept-relations townhall=yes --used-node --write-xml amenity_townhall_relations.osm
osmosis --read-xml nord-pas-de-calais-latest.osm --tf accept-relations townhall=yes --used-way --tf accept-relations townhall=yes --used-node --write-xml townhall_yes_relations.osm
# merge all files together
osmosis --rx amenity_townhall_nodes.osm --rx townhall_yes_nodes.osm --rx amenity_townhall_ways.osm --rx C:\Users\rjault\Documents\02_DEMANDE\13_OSM_ORACLE\OSMOSIS\townhall_yes_ways.osm --rx amenity_townhall_relations.osm --rx townhall_yes_relations.osm --merge --merge --merge --merge --merge --wx townhall.osm
No matter which command I use, the result is always the same. In the attached picture, the original dataset is shown in green and the extracted townhalls in purple; the object circled in red, for example, is missing from my extraction.
I have also tried the solution suggested by AKX, but even then I still miss the biggest townhall in my area. OSM seems to be a really nice database and I would like to use it more.
So thank you for your help.

This helped, but it keeps a selected set of points that contains both road and building data.
osmosis --rx file="input file" --tf accept-ways building=* geometry-type=multipolygon --wx "output file"
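In OSM, a multipolygon is a relation, and its tags (e.g. amenity=townhall) usually live on the relation itself rather than on the member ways, so a way-only filter never sees those objects. As an untested sketch, reusing the same tasks that appear in the commands above, you could extract tagged ways and tagged relations separately and merge the results:
# ways tagged amenity=townhall, with their nodes
osmosis --read-pbf nord-pas-de-calais-latest.osm.pbf --tf accept-ways amenity=townhall --used-node --write-xml townhall_ways.osm
# multipolygon relations tagged amenity=townhall, with their member ways and nodes
osmosis --read-pbf nord-pas-de-calais-latest.osm.pbf --tf accept-relations amenity=townhall --used-way --used-node --write-xml townhall_relations.osm
# merge the two extracts
osmosis --rx townhall_ways.osm --rx townhall_relations.osm --merge --wx townhall.osm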

Related

Import XML file into database but escape some of the XML tags

I'm able to successfully import XML files into a table in my database with the following query:
LOAD XML
INFILE "myFileName.xml"
INTO TABLE t_orig2
ROWS IDENTIFIED BY '<verse>';
That works when my XML is structured like this:
<verse id="40001001"><b>40</b><c>1</c><v>1</v><t>Text content here</verse>
The result is that the data nicely inserts into my table which has these columns:
id, b, c, v, t
The problem is that now I need to insert some actual XML tags into the "t" column in the database, and there the import fails. My question is this: how can I indicate that some parts of my XML should not go into their own columns, but should be interpreted as plain text instead?
Here's an example of the problematic XML:
<verse id="40001001"><b>40</b><c>1</c><v>1</v><t>
<w pos="N-" morph="----NSF-" lemma="βίβλος" strongs="00976">Βίβλος</w> <w pos="N-" morph="----GSF-" lemma="γένεσις" strongs="01078">γενέσεως</w> <w pos="N-" morph="----GSM-" lemma="Ἰησοῦς" strongs="02424">Ἰησοῦ</w> <w pos="N-" morph="----GSM-" lemma="Χριστός" strongs="05547">χριστοῦ</w> <w pos="N-" morph="----GSM-" lemma="υἱός" strongs="05207">υἱοῦ</w> <w pos="N-" morph="----GSM-" lemma="Δαυίδ">Δαυὶδ</w> <w pos="N-" morph="----GSM-" lemma="υἱός" strongs="05207">υἱοῦ</w> <w pos="N-" morph="----GSM-" lemma="Ἀβραάμ" strongs="00011">Ἀβραάμ</w>.
</t></verse>
I'm after the end result where "id","b","c" and "v" go into their own columns in the database (which works nicely) but then everything that is inside the "t" tag should be put into the "t" column in the database as one long string.
How should I escape the XML inside the <t></t> tags so that the importer will insert it into the "t" column as one long string?
I'm not familiar with loading XML directly into a database as XML rather than as text/blob. However, converting those lines would not have to be so difficult. Here is a Perl script that would change all of the <t> and </t> tags to &lt;t&gt; and &lt;/t&gt; respectively--and you could adjust the script to change other aspects as well.
#!/usr/bin/perl
use strict;
use warnings;
#IF THESE NEXT TWO LINES RAISE ERRORS, YOU COULD TRY
#JUST COMMENTING THEM OUT (PUT A '#' IN FRONT OF THEM)
#THEY MAY NOT BE NEEDED, DEPENDING ON YOUR PERL SETUP
use feature 'unicode_strings';
use open ':encoding(utf8)'; # deal with all files in a UTF8 way
#https://perldoc.perl.org/perlunifaq.html
binmode STDOUT, ':utf8';
my @data = ();
my $sourcefile = 'source_file.txt'; #FILE TO BE READ
my $targetfile = 'target_file.txt'; #IF EXISTS, THIS FILE WILL BE OVERWRITTEN!
my $longline = '';
#READ THE SOURCE FILE INTO MEMORY
open SOURCE, "<$sourcefile" or die "Cannot open $sourcefile $!\n";
@data = <SOURCE>;
close SOURCE;
print "There are ".scalar @data." lines to process in the file.\n";
#PROCESS THE SOURCE FILE ONE LINE AT A TIME
foreach my $line (@data) {
$longline .= $line;
}
#REPLACE CARRIAGE RETURNS WITH SPACES
$longline =~ s/\n|\r/ /g;
$longline =~ s~
<verse\sid="([^"]+)">
<b>([^<]+)</b>
<c>([^<]+)</c>
<v>([^<]+)</v>
<t>(.*?)</t>
</verse>
~<verse id="$1"><b>$2</b><c>$3</c><v>$4</v>&lt;t&gt;$5&lt;/t&gt;</verse>~xg;
#THE /x FLAG IS REQUIRED TO IGNORE MOST WHITESPACE,
#MAKING THE ABOVE MORE READABLE. THE /g MAKES THE
#REPLACEMENT "GLOBAL", AND IF THE FILE IS ALL ON ONE
#LINE, IT MAY STILL DO ALL OF THE REQUIRED SUBSTITUTIONS.
#ADD CARRIAGE RETURNS BACK IN FOLLOWING </verse> TAGS
#IF ON WINDOWS, YOU MAY NEED \r\n INSTEAD OF \n.
$longline =~ s~(</verse>)\s*~$1\n~g;
open TARGET, ">$targetfile" or die "Cannot open $targetfile. $!\n";
print TARGET $longline;
close TARGET;
print "Script completed.\n";
print "You should now have one verse per line in $targetfile.\n\n";
exit;
To run the script on a computer with perl installed, just save it to a filename, e.g. "xml_fix_script.pl", and run it like this:
perl xml_fix_script.pl
Make sure your source file is in the same directory as the script--and it's always smart to keep a backup first, just in case.
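If the <w> tags inside <t>...</t> also have to be neutralized so that the importer sees the verse text as one plain string, one possible adaptation of the script above (an untested sketch) is to escape every angle bracket in the captured text. The /e flag makes Perl evaluate the replacement as code:
#SKETCH: ESCAPE ALL MARKUP INSIDE <t>...</t> SO THAT LOAD XML
#TREATS THE WHOLE VERSE TEXT AS ONE STRING
$longline =~ s{<t>(.*?)</t>}{
    my $t = $1;
    $t =~ s/</&lt;/g;   # escape opening angle brackets
    $t =~ s/>/&gt;/g;   # escape closing angle brackets
    "&lt;t&gt;$t&lt;/t&gt;";
}eg;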

Delete specific words from line in mysql log

I have a log file like this
15-31-57.175 [4359] [TRACE] ThreadUpdateDb::insertUpdate() id[1791145] sqlquery[INSERT INTO DATAMART_QUEUE_DETAILS(EVENT_TIME,QUEUE,EVENT,TRACKNUM,QUEUE_TYPE,ASSOCIATED_ROUTING_SCRIPT,ANI,DNIS,CHANNEL_TYPE) VALUES ('2021.01.01 15:31:57','Electrolux', 'abandoned', '1609507904.86839', 'ROUTING_SCRIPT', 'Electrolux', '01008367900', '886123', 'CALL')]
15-31-59.104 [4361] [TRACE] ThreadUpdateDb::insertUpdate() id[1791170] sqlquery[INSERT INTO DATAMART_QUEUE_DETAILS(EVENT_TIME,QUEUE,EVENT,TRACKNUM,QUEUE_TYPE,ASSOCIATED_ROUTING_SCRIPT,ANI,DNIS,CHANNEL_TYPE) VALUES ('2021.01.01 15:31:59','Electrolux-Inst', 'queued', '1609507878.86832', 'VIRTUAL', 'Electrolux', '01552050703', '886123', 'CALL')]
I need a Linux command that deletes everything up to and including the word 'sqlquery', so the lines look like this:
[INSERT INTO DATAMART_QUEUE_DETAILS(EVENT_TIME,QUEUE,EVENT,TRACKNUM,QUEUE_TYPE,ASSOCIATED_ROUTING_SCRIPT,ANI,DNIS,CHANNEL_TYPE) VALUES ('2021.01.01 15:31:57','Electrolux', 'abandoned', '1609507904.86839', 'ROUTING_SCRIPT', 'Electrolux', '01008367900', '886123', 'CALL')]
[INSERT INTO DATAMART_QUEUE_DETAILS(EVENT_TIME,QUEUE,EVENT,TRACKNUM,QUEUE_TYPE,ASSOCIATED_ROUTING_SCRIPT,ANI,DNIS,CHANNEL_TYPE) VALUES ('2021.01.01 15:31:59','Electrolux-Inst', 'queued', '1609507878.86832', 'VIRTUAL', 'Electrolux', '01552050703', '886123', 'CALL')]
Use this Perl one-liner:
perl -pe 's{.*sqlquery}{}' in_file > out_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
Or use GNU grep:
grep -Po 'sqlquery\K.*' in_file > out_file
Here, grep uses the following options:
-P : Use Perl regexes.
-o : Print the matches only (1 match per line), not the entire lines.
\K : Cause the regex engine to "keep" everything it had matched prior to the \K and not include it in the match. Specifically, ignore the preceding part of the regex when printing the match.
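If neither Perl nor GNU grep is at hand, plain sed does the same job (the greedy .* deletes everything up to and including the last occurrence of sqlquery on each line):
sed 's/.*sqlquery//' in_file > out_file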
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
grep manual
perlre - Perl regular expressions

How to infer a schema from a "reference file" and apply it as a reference to files to be read in?

I have tons of CSV files with 100+ columns to read into Spark (Databricks). I do not want to specify the schema manually, so I thought of the following approach: read in one "reference" CSV file, get the schema from it, and apply that schema as the reference for all the other files I need to read. The code looks as follows (but I cannot get it to work).
# File location and type
file_location = "/FileStore/tables/reference_file_with_ok_schema.csv"
file_type = "csv"
# CSV options
infer_schema = "True"
first_row_is_header = "True"
delimiter = ";"
df = spark.read.format(file_type) \
    .option("inferSchema", infer_schema) \
    .option("header", first_row_is_header) \
    .option("sep", delimiter) \
    .load(file_location)
mySchema = df.schema  # this is probably where I go wrong
display(df)
Next I would apply mySchema as the reference schema for new CSVs, as in the following example:
# File location and type
file_location = "/FileStore/tables/all_other_files.csv"
file_type = "csv"
# CSV options
first_row_is_header = "True"
delimiter = ";"
# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
    .schema(mySchema) \
    .option("header", first_row_is_header) \
    .option("sep", delimiter) \
    .load(file_location)
display(df)
This only produces nulls.
Thanks in advance for your help and best regards,
Alex
You have the right approach.
You can check the two options mode and columnNameOfCorruptRecord. By default, mode=PERMISSIVE, which creates NULL records when a line does not match the schema.
That is probably why you have NULL records in your dataframe: it means that mySchema and the actual schema of all_other_files are different.
The first thing to check is to infer the schema of all_other_files and compare it with mySchema. To do that easily, schema objects have a json method which outputs them as a JSON string. It is easier for a human to compare two JSON strings than two schema objects.
mySchema.json()
Unfortunately, if there is even a single difference, the whole line will be set to NULL.
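To see which rows fail instead of silently getting NULLs, you can keep the raw text of bad lines in a dedicated column. This is an untested sketch: it reuses mySchema and the file path from the question, and the column name _corrupt_record (Spark's default for columnNameOfCorruptRecord) has to be declared in the schema you pass in:
from pyspark.sql.types import StructField, StringType, StructType

# Extend the reference schema with a column that receives the raw
# text of any line that does not match the schema.
schema_with_corrupt = StructType(
    mySchema.fields + [StructField("_corrupt_record", StringType(), True)])

df = spark.read.format("csv") \
    .schema(schema_with_corrupt) \
    .option("header", True) \
    .option("sep", ";") \
    .option("mode", "PERMISSIVE") \
    .option("columnNameOfCorruptRecord", "_corrupt_record") \
    .load("/FileStore/tables/all_other_files.csv")

# Rows that failed to parse keep their raw line here.
df.filter("_corrupt_record IS NOT NULL").show(truncate=False)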

tDOM not parsing the XML

I have this sample XML template from a command-line output:
<config xmlns="http://tail-f.com/ns/config/1.0">
<random xmlns="http://random.com/ns/random/config">
<junk-id>1</junk-id>
<junk-ip-address>1.2.2.3</junk-ip-address>
<junk-state>success</junk-state>
<junk-rcvd>158558</junk-rcvd>
<junk-sent>158520</junk-sent>
<foobar>
<id1>1</id1>
<id2>1</id2>
</foobar>
</random>
</config>
I need to extract the value of junk-state from this XML.
I wrote a .tcl script that stores this XML in a variable, using single quotes just for testing purposes. I tried looping over the nodes, but with no success. Here are the contents of my script:
set XML "<config xmlns='http://tail-f.com/ns/config/1.0'>
<random xmlns='http://random.com/ns/random/config'>
<junk-id>1</junk-id>
<junk-ip-address>1.2.2.3</junk-ip-address>
<junk-state>success</junk-state>
<junk-rcvd>158558</junk-rcvd>
<junk-sent>158520</junk-sent>
<foobar>
<id1>1</id1>
<id2>1</id2>
</foobar>
</random>
</config>"
set doc [dom parse $XML]
set root [$doc documentElement]
set mynode [$root selectNodes "/config/random" ]
foreach node $mynode {
set temp1 [$node text]
echo "temp1 - $temp1"
}
The above script produces no output.
I also tried a direct XPath expression, as below, printing the text:
set node [$root selectNodes /config/random/junk-state/text()]
puts [$node nodeValue]
puts [$node data]
and this produces an error:
invalid command name ""
while executing
"$node nodeValue"
invoked from within
"puts [$node nodeValue]"
(file "temp.tcl" line 41)
What am I doing wrong here? I would like to know how to use/modify my XPath expression, since I find that approach neater.
$ tclsh
% puts $tcl_version
8.5
% package require tdom
0.8.3
The problems are due to the XML namespaces (the xmlns attributes on the config and random elements). You must use the -namespace option of the selectNodes method:
package require tdom
set XML {<config xmlns="http://tail-f.com/ns/config/1.0">
<random xmlns="http://random.com/ns/random/config">
<junk-id>1</junk-id>
<junk-ip-address>1.2.2.3</junk-ip-address>
<junk-state>success</junk-state>
<junk-rcvd>158558</junk-rcvd>
<junk-sent>158520</junk-sent>
<foobar>
<id1>1</id1>
<id2>1</id2>
</foobar>
</random>
</config>}
set doc [dom parse $XML]
set root [$doc documentElement]
set node [$root selectNodes -namespace {x http://random.com/ns/random/config} x:random/x:junk-state ]
puts [$node text]
EDIT: If you want the namespace of the <random> element to be retrieved from the XML automatically you can do it as follows (assuming that <random> is the only child of the root element):
set doc [dom parse $XML]
set root [$doc documentElement]
set random [$root firstChild]
set ns [$random namespace]
set node [$random selectNodes -namespace [list x $ns] x:junk-state]
puts [$node text]
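As an aside, a common way to sidestep namespace bindings entirely is the XPath local-name() function; a quick sketch against the same $root as above:
set nodes [$root selectNodes {//*[local-name()='junk-state']}]
puts [[lindex $nodes 0] text]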

How do I encode a simple array into JSON in Perl?

All the examples that I've seen of encoding objects to JSON strings in Perl have involved hashes. How do I encode a simple array to a JSON string?
use strict;
use warnings;
use JSON;
my @arr = ("this", "is", "my", "array");
my $json_str = encode_json(@arr); # This doesn't work, produces "arrayref expected"
# $json_str should be ["this", "is", "my", "array"]
If you run that code, you should get the following error:
hash- or arrayref expected (not a simple scalar, use allow_nonref to allow this)
You simply need to pass a reference to your array, \@arr:
use strict;
use warnings;
use JSON;
my @arr = ("this", "is", "my", "array");
my $json_str = encode_json(\@arr); # This will work now
print "$json_str";
Outputs
["this","is","my","array"]