I am trying to run the following code in a command window. The code executes, but it gives me no values in the .SHP files. The table has GeographyCollections and Polygons stored in a Field of type Geography. I have tried many variations for the Geography type in the sql statement - Binary, Text etc. but no luck. The output .DBF file has data, so the connection to the database works, but the shape .Shp file and .shx file has no data and is of size 17K and 11 K, respectively.
Any suggestions?
ogr2ogr -f "ESRI Shapefile" -overwrite c:\temp -nln Zip_States -sql "SELECT [ID2],[STATEFP10],[ZCTA5CE10],GEOMETRY::STGeomFromWKB([Geography].STAsBinary(),4326).STAsText() AS [Geography] FROM [GeoSpatial].[dbo].[us_State_Illinois_2010]" ODBC:dbo/GeoSpatial#PPDULCL708504
ESRI Shapefiles can contain only a single type of geometry - Point, LineString, Polygon etc.
Your description suggests that your query returns multiple types of geometry, so restrict that first (using STGeometryType() == 'POLYGON', for example).
Secondly, you're currently returning the spatial field as a text string using STAsText(), but you're not telling OGR that it's a spatial field so it's probably just treating the WKT as a regular text column and adding it as an attribute to the dbf file.
To tell OGR which column contains your spatial information you can add the "Tables" parameter to the connection string. However, there's no reason to do all the casting from WKT/WKB if you're using SQL Server 2008 - OGR2OGR will load SQL Server's native binary format fine.
Are you actually using SQL Server 2008, or Denali? Because the serialisation format changed, and OGR2OGR can't read the new format. So, in that case it's safer (but slower) to convert to WKB first.
The following works for me to dump a table of polygons from SQL Server to Shapefile:
ogr2ogr -f "ESRI Shapefile" -overwrite c:\temp -nln Zip_States -sql "SELECT ID, geom26986.STAsBinary() FROM [Spatial].[dbo].[OUTLINE25K_POLY]" "MSSQL:server=.\DENALICTP3;database=Spatial;trusted_connection=yes;Tables=dbo.OUTLINE25K_POLY(geom26986)"
Try the following command
ogr2ogr shapeFileName.shp -overwrite -sql "select top 10 * from schema.table" "MSSQL:Server=serverIP;Database=dbname;Uid=userid;trusted_connection=no;Pwd=password" -s_srs EPSG:4326 -t_srs EPSG:4326
Related
I use Stata 12.
I want to add some country code identifiers from file df_all_cities.csv onto my working data.
However, this line of code:
merge 1:1 city country using "df_all_cities.csv", nogen keep(1 3)
Gives me the error:
. run "/var/folders/jg/k6r503pd64bf15kcf394w5mr0000gn/T//SD44694.000000"
file df_all_cities.csv not Stata format
r(610);
This is an attempted solution to my previous problem of the file being a dta file not working on this version of Stata, so I used R to convert it to .csv, but that also doesn't work. I assume it's because the command itself "using" doesn't work with csv files, but how would I write it instead?
Your intuition is right. The command merge cannot read a .csv file directly. (using is technically not a command here, it is a common syntax tag indicating a file path follows.)
You need to read the .csv file with the command insheet. You can use it like this.
* Preserve saves a snapshot of your data which is brought back at "restore"
preserve
* Read the csv file. clear can safely be used as data is preserved
insheet using "df_all_cities.csv", clear
* Create a tempfile where the data can be saved in .dta format
tempfile country_codes
save `country_codes'
* Bring back into working memory the snapshot saved at "preserve"
restore
* Merge your country codes from the tempfile to the data now back in working memory
merge 1:1 city country using `country_codes', nogen keep(1 3)
See how insheet is also using using and this command accepts .csv files.
I have serval postgis tables that are converted from MIF/MID files, and I made some data processing on them.
I used ogr2org to convert MIF/MID to postgis tables,
ogr2ogr -f PostgreSQL PG:"<dbconn>" "xxx.mif"
but how can I convert the tables to MIF/MID?
according to https://www.gdal.org/drv_mitab.html
ogr2ogr -f "MID" foo.mid PG:"dbconnectionstring" -sql "select * from table"
I've been using ogr2ogr to do most of what I need with shapefiles (including dissolving them). However, I find that for big ones, it takes a REALLY long time.
Here's an example of what I'm doing:
ogr2ogr new.shp old.shp -dialect sqlite -sql "SELECT ST_Union(geometry) FROM old"
In certain instances, one might want to dissolve common neighboring shapes (which is what I think is going on here in the above command). However, in my case I simply want to flatten the entire file and every shape in it regardless of the values (I've already isolated the shapes I need).
Is there a faster way to do this when you don't need to care about the values and just want a shape that outlines the array of shapes in the file?
If you have isolated the shapes, and they don't have any shared boundaries, they can be easily collected into a single MULTIPOLYGON using ST_Collect. This should be really fast and simple to do:
ogr2ogr gcol.shp old.shp -dialect sqlite -sql "SELECT ST_Collect(geometry) FROM old"
If the geometries overlap and the boundaries need to be "dissolved", then ST_Union must be used. Faster spatial unions are done with a cascaded union technique, described here for PostGIS. It is supported by OGR, but it doesn't seem to be done elegantly.
Here is a two step SQL query. First make a MULTIPOLYGON of everything with ST_Collect (this is fast), then do a self-union which should trigger a UnionCascaded() call.
ogr2ogr new.shp old.shp -dialect sqlite -sql "SELECT ST_Union(gcol, gcol) FROM (SELECT ST_Collect(geometry) AS gcol FROM old) AS f"
Or to better view the actual SQL statement:
SELECT ST_Union(gcol, gcol)
FROM (
SELECT ST_Collect(geometry) AS gcol
FROM old
) AS f
I've had better success (i.e. faster) by converting it to raster then back to vector. For example:
# convert the vector file old.shp to a raster file new.tif using a pixel size of XRES/YRES
gdal_rasterize -tr XRES YRES -burn 255 -ot Byte -co COMPRESS=DEFLATE old.shp new.tif
# convert the raster file new.tif to a vector file new.shp, using the same raster as a -mask speeds up the processing
gdal_polygonize.py -f 'ESRI Shapefile' -mask new.tif new.tif new.shp
# removes the DN attribute created by gdal_polygonize.py
ogrinfo new.shp -sql "ALTER TABLE new DROP COLUMN DN"
What is the best way to load the following geojson file in Google Big Query?
http://storage.googleapis.com/velibs/stations/test.json
I have a lot of json files like this (much bigger) on Google Storage, and I cannot download/modify/upload them all (it would take forever). Note that the file is not newline-delimited, so I guess it needs to be modified online.
Thanks all.
Step by step 2019:
If you get the error "Error while reading data, error message: JSON parsing error in row starting at position 0: Nested arrays not allowed.", you might have a GeoJSON file.
Transform GeoJSON into new-line delimited JSON with jq, load as CSV into BigQuery:
jq -c .features[] \
san_francisco_censustracts.json > sf_censustracts_201905.json
bq load --source_format=CSV \
--quote='' --field_delimiter='|' \
fh-bigquery:deleting.sf_censustracts_201905 \
sf_censustracts_201905.json row
Parse the loaded file in BigQuery:
CREATE OR REPLACE TABLE `fh-bigquery.uber_201905.sf_censustracts`
AS
SELECT FORMAT('%f,%f', ST_Y(centroid), ST_X(centroid)) lat_lon, *
FROM (
SELECT *, ST_CENTROID(geometry) centroid
FROM (
SELECT
CAST(JSON_EXTRACT_SCALAR(row, '$.properties.MOVEMENT_ID') AS INT64) movement_id
, JSON_EXTRACT_SCALAR(row, '$.properties.DISPLAY_NAME') display_name
, ST_GeogFromGeoJson(JSON_EXTRACT(row, '$.geometry')) geometry
FROM `fh-bigquery.deleting.sf_censustracts_201905`
)
)
Alternative approaches:
With ogr2ogr:
https://medium.com/google-cloud/how-to-load-geographic-data-like-zipcode-boundaries-into-bigquery-25e4be4391c8
https://medium.com/#mentin/loading-large-spatial-features-to-bigquery-geography-2f6ceb6796df
With Node.js:
https://github.com/mentin/geoscripts/blob/master/geojson2bq/geojson2bqjson.js
The bucket in the question no longer exists.... However five years later there is a new answer.
In July 2018, Google announced an alpha (now beta) of BigQuery GIS.
The docs highlight a limitation that
BigQuery GIS supports only individual geometry objects in GeoJSON.
BigQuery GIS does not currently support GeoJSON feature objects,
feature collections, or the GeoJSON file format.
This means that any Feature of Feature Collection properties would need to be added to separate columns, with a geography column to hold the geojson geography.
In this tutorial by a Google trainer, polygons in a shape file are converted into geojson strings inside rows of a CSV file using gdal.
ogr2ogr -f csv -dialect sqlite -sql "select AsGeoJSON(geometry) AS geom, * from LAYER_NAME" output.csv inputfilename.shp
You want to end up with one column with the geometry content like this
{"type":"Polygon","coordinates":[[....]]}
Other columns may contain feature properties.
The CSV can then be imported to BQ. Then a query on the table can be viewed in BigQuery Geo Viz. You need to tell it which column contains the geometry.
Is it possible to load data from a json file (not just csv) using the Big Query command line tool? I am able to load a simple json file using the GUI, however, the command line is assuming a csv, and I don't see any documentation on how to specify json.
Here's the simple json file I'm using
{"col":"value"}
With schema
col:STRING
As of version 2.0.12, bq does allow uploading newline-delimited JSON files. This is an example command that does the job:
bq load --source_format NEWLINE_DELIMITED_JSON datasetName.tableName data.json schema.json
As mentioned above, "bq help load" will give you all of the details.
1) Yes you can
2) The documentation is here . Go to step 3: Upload the table in documentation.
3) You have to use --source_format flag to tell the bq that you are uploading a JSON file and not a csv.
4) The complete commmand structure is
bq load [--source_format=NEWLINE_DELIMITED_JSON] [--project_id=your_project_id] destination_data_set.destination_table data_source_uri table_schema
bq load --project_id=my_project_bq dataset_name.bq_table_name gs://bucket_name/json_file_name.json path_to_schema_in_your_machine
5) You can find other bq load variants by
bq help load
It does not support JSON formatted data loading.
Here is the documentation (bq help load) for the loadcommand with the latest bq version 2.0.9:
USAGE: bq [--global_flags] <command> [--command_flags] [args]
load Perform a load operation of source into destination_table.
Usage:
load <destination_table> <source> [<schema>]
The <destination_table> is the fully-qualified table name of table to create, or append to if the table already exists.
The <source> argument can be a path to a single local file, or a comma-separated list of URIs.
The <schema> argument should be either the name of a JSON file or a text schema. This schema should be omitted if the table already has one.
In the case that the schema is provided in text form, it should be a comma-separated list of entries of the form name[:type], where type will default
to string if not specified.
In the case that <schema> is a filename, it should contain a single array object, each entry of which should be an object with properties 'name',
'type', and (optionally) 'mode'. See the online documentation for more detail:
https://code.google.com/apis/bigquery/docs/uploading.html#createtable
Note: the case of a single-entry schema with no type specified is
ambiguous; one can use name:string to force interpretation as a
text schema.
Examples:
bq load ds.new_tbl ./info.csv ./info_schema.json
bq load ds.new_tbl gs://mybucket/info.csv ./info_schema.json
bq load ds.small gs://mybucket/small.csv name:integer,value:string
bq load ds.small gs://mybucket/small.csv field1,field2,field3
Arguments:
destination_table: Destination table name.
source: Name of local file to import, or a comma-separated list of
URI paths to data to import.
schema: Either a text schema or JSON file, as above.
Flags for load:
/usr/local/bin/bq:
--[no]allow_quoted_newlines: Whether to allow quoted newlines in CSV import data.
-E,--encoding: <UTF-8|ISO-8859-1>: The character encoding used by the input file. Options include:
ISO-8859-1 (also known as Latin-1)
UTF-8
-F,--field_delimiter: The character that indicates the boundary between columns in the input file. "\t" and "tab" are accepted names for tab.
--max_bad_records: Maximum number of bad records allowed before the entire job fails.
(default: '0')
(an integer)
--[no]replace: If true erase existing contents before loading new data.
(default: 'false')
--schema: Either a filename or a comma-separated list of fields in the form name[:type].
--skip_leading_rows: The number of rows at the beginning of the source file to skip.
(an integer)
gflags:
--flagfile: Insert flag definitions from the given file into the command line.
(default: '')
--undefok: comma-separated list of flag names that it is okay to specify on the command line even if the program does not define a flag with that name.
IMPORTANT: flags in this list that have arguments MUST use the --flag=value format.
(default: '')