MySQL Spatial Simplify Geometries - mysql

I'm trying to create a web map which contains large polygon data-sets. In order to increase performance, I'm hoping to decrease polygon detail when zooming out.
Is MySQL able to simplify polygons and other geometries as part of a query?

EDIT: As James has pointed out, since 5.7, MySQL does support ST_Simplify.
I am afraid there is no simplify function in MySQL spatial. I filed that as a feature request five years ago, and you can see it has received no attention since then.
You have a number of options depending on whether you want to do this as a one off, or if you have dynamic data.
1). Write a quick and dirty function using the PointN function to access your points, only taking every 5th point say, creating a WKT string representing a simplified geometry, and then recreating a geometry using the GeomFromText function.
2). Dump your polygons as WKT, using AsText(geom), out to CSV. Import into Postgres/PostGIS using the COPY command (the equivalent of LOAD DATA INFILE), use the ST_Simplify function there, then reverse the process to bring it back into MySQL.
3). Use ogr2ogr to dump to shp format, and then a tool like mapshaper to simplify it, output to shp and then import again using ogr2ogr. Mapshaper is nice, because you can get a feel for how the algorithm works, and could potentially use it to implement your own, instead of option 1.
There are other options, such as using Java Topology Suite if you are using Java server side, but I hope this gives you some idea of how to proceed.
I am sorry that the initial answer is, No. I went through this a number of years ago and eventually made a permanent switch to Postgres/Postgis, as it is much more fully featured for spatial work.

MySQL 5.7 contains ST_Simplify function for simplifying geometries.
From https://dev.mysql.com/doc/refman/5.7/en/spatial-convenience-functions.html:
ST_Simplify(g, max_distance)
Simplifies a geometry using the Douglas-Peucker algorithm and returns a simplified value of the same type, or NULL if any argument is NULL.
MySQL 5.7 includes a huge overhaul of spatial functionality compared to 5.6 and is now in General Availability status as of 5.7.9.
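For example, on 5.7+ a simplified geometry can be returned directly from a query. A minimal sketch (the regions table and boundary column are placeholder names; max_distance is in the units of the geometry's coordinates):
-- Hypothetical usage; `regions` and `boundary` are placeholder names.
SELECT id, ST_AsText(ST_Simplify(boundary, 0.01)) AS simplified_wkt
FROM regions;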

I had a use case where I wanted to use ST_Simplify, but the code had to run on MySQL 5.6 (which doesn't have it). Therefore, I developed a solution like the one suggested by John Powell in another answer.
Unfortunately, MySQL does not offer any aggregate functions with which you can create a geometry by progressively adding points to it (i.e. there is no ST_AddPoint or similar). The only way you can compose a geometry is by building it step by step as a WKT string, then finally converting the completed string to a geometry.
Here is an example of a stored function that accepts a MultiLineString, and simplifies each LineString in it by only keeping every nth point, making sure that the start and end points are always kept. This is done by looping through the LineStrings in the MultiLineString, then through the points for each (skipping as required), and accumulating the lot in a WKT string, which is finally converted to a geometry using ST_GeomCollFromText.
-- geometryCollection: MultiLineString collection to simplify
-- skip: Number of points to remove between every two points
CREATE FUNCTION `sp_CustomSimplify`(gc geometrycollection, skip INT) RETURNS geometrycollection
BEGIN
DECLARE i, j,numLineStrings, numPoints INT;
DECLARE ls LineString;
DECLARE pt, lastPt Point;
DECLARE txt VARCHAR(20000);
DECLARE digits INT;
SET digits = 4;
-- Start WKT string:
SET txt = 'MULTILINESTRING(';
-- Loop through the LineStrings in the geometry (which is a MultiLineString)
SET i = 1;
SET numLineStrings = ST_NumGeometries(gc);
loopLineStrings: LOOP
IF i > numLineStrings THEN LEAVE loopLineStrings; END IF;
SET ls = ST_GeometryN(gc, i);
-- Add first point to LineString:
SET pt = ST_StartPoint(ls);
SET txt = CONCAT(txt, '(', TRUNCATE(ST_X(pt),digits), ' ', TRUNCATE(ST_Y(pt),digits));
-- For each LineString, loop through points, skipping
-- points as we go, adding them to a running text string:
SET numPoints = ST_NumPoints(ls);
SET j = skip;
loopPoints: LOOP
IF j > numPoints THEN LEAVE loopPoints; END IF;
SET pt = ST_PointN(ls, j);
-- For each point, add it to a text string:
SET txt = CONCAT(txt, ',', TRUNCATE(ST_X(pt),digits), ' ', TRUNCATE(ST_Y(pt),digits));
SET j = j + skip;
END LOOP loopPoints;
-- Add last point to LineString:
SET lastPt = ST_EndPoint(ls);
SET txt = CONCAT(txt, ',', TRUNCATE(ST_X(lastPt),digits), ' ', TRUNCATE(ST_Y(lastPt),digits));
-- Close LineString WKT:
SET txt = CONCAT(txt, ')');
IF(i < numLineStrings) THEN
SET txt = CONCAT(txt, ',');
END IF;
SET i = i + 1;
END LOOP loopLineStrings;
-- Close MultiLineString WKT:
SET txt = CONCAT(txt, ')');
RETURN ST_GeomCollFromText(txt);
END
(This could be a lot prettier by extracting bits into separate functions.)
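For example, it could be called like this (a sketch only; the roads table and geom column are illustrative names):
-- Hypothetical usage: keep roughly every 5th point of each LineString.
-- `roads` and `geom` are illustrative names only.
SELECT id, ST_AsText(sp_CustomSimplify(geom, 5)) AS simplified_wkt
FROM roads;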

Related

Replace multiple characters using MySQL

I am using this piece of code to replace characters in one column of my database.
UPDATE items
SET items = REPLACE(items, 'ḇ','ḇ')
But now I have a list with almost 500 characters to replace.
Just writing the whole sequence of lines in one single query will not work.
UPDATE items
SET items = REPLACE(items, 'ḇ','ḇ')
SET items = REPLACE(items, '&#x1E0x;','x')
SET items = REPLACE(items, '&#x1E0y;','y')
etc.
Or at least I do not know how to write it.
Can anyone help me please?
Create a table that has the search string and the replace string as columns. Add all 500 rows of what needs to be replaced. Then write a stored procedure that will look up the replace value from the lookup table and replace with that value. The lookup table can easily be loaded into MySQL from an Excel or CSV file.
Here's the pseudocode to show the looping and lookup. I know it won't compile; I'm a bit rusty on MySQL syntax. I usually work in Oracle, so the pseudocode syntax is more Oracle-esque.
DECLARE
v_old_string varchar;
v_new_string varchar;
BEGIN
FOR v IN (SELECT * FROM items) LOOP
SELECT old_string, new_string
INTO v_old_string, v_new_string
FROM my_lookup_table
WHERE old_string = v.thestringcolumn;
UPDATE items
SET itemcolumn = REPLACE(itemcolumn, v_old_string, v_new_string)
WHERE itemcolumn = v_old_string;
END LOOP;
END;
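In MySQL itself, the same idea might look roughly like this (a sketch only; char_replacements is an assumed name for the lookup table, with columns old_string and new_string):
-- Sketch: loop over an assumed lookup table `char_replacements(old_string, new_string)`
-- and apply each replacement across the items column.
DELIMITER $$
CREATE PROCEDURE sp_apply_replacements()
BEGIN
DECLARE done INT DEFAULT 0;
DECLARE v_old, v_new VARCHAR(50);
DECLARE cur CURSOR FOR SELECT old_string, new_string FROM char_replacements;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN cur;
read_loop: LOOP
FETCH cur INTO v_old, v_new;
IF done THEN LEAVE read_loop; END IF;
-- Apply one replacement to every row that contains the search string:
UPDATE items SET items = REPLACE(items, v_old, v_new)
WHERE items LIKE CONCAT('%', v_old, '%');
END LOOP;
CLOSE cur;
END$$
DELIMITER ;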

Delphi Slowdown on importing multiple CSV files to a dataset

I am using Delphi 7, Windows 7 and Absolute Database.
Quick Background. I work for a charity shop that relies on items being donated to us. To reclaim 'Gift Aid' from our HMRC we have to submit details of all sales together with the name and address of each donator of that sale. We help people with Special Needs so accurate data input is important.
Up to now, Post Code verification was easy (based on our local area): basically the format AA00_0AA or AA0_0AA. As our name has become better known, not all Post Codes follow these rules.
I have access to the UK Royal Mail database of addresses, so I thought to compare the inputted Post Code with a real Post Code. The RM csv file is huge, so I use GSplit3 to break it down into more manageable files. This leaves me with 492 csv files, each consisting of about 62000 lines. Note that I am only interested in the Post Codes, so there is massive duplication.
To load these files into a dataset (without duplication) I first loaded the file names into a listbox and ran a loop to iterate through all the files to copy to the dataset. To avoid duplication I tried putting a unique index on the field, but even running outside of Delphi I still got an error message about duplication. I then tried capturing the text of the last record to be appended and comparing it with the next record:
procedure TForm1.importClick(Sender: TObject);
var
i,y:Integer;
lstfile:string;
begin
for i:= 0 to ListBox1.Items.Count-1 do
begin
lstfile:='';
cd.Active:=False;//cd is a csv dataset loaded with csv file
cd.FileName:='C:\paf 112018\CSV PAF\'+ListBox1.Items[i];
cd.Active:=True;
while not cd.Eof do
begin
if (lstfile=cd.Fields[0].AsString) then cd.Next
else
table1.append;
table1.fields[0].asstring:=cd.Fields[0].AsString;
lstfile:=cd.Fields[0].AsString;
cd.Next;
end;
end;
table1.Edit;
table1.Post;
end;
This seemed to work OK, although the total number of records in the dataset seemed low. I checked with my own Post Code and it wasn't there, although another Post Code was located. So obviously records had been skipped. I then tried loading the CSV file into a string list with dupIgnore, then copying the stringlist to the dataset.
unit Unit1;
interface
uses
Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
Dialogs, Grids, DBGrids, SMDBGrid, StdCtrls, DB, ABSMain, SdfData;
type
TForm1 = class(TForm)
cd: TSdfDataSet;
dscd: TDataSource;
dst: TDataSource;
ABSDatabase1: TABSDatabase;
table1: TABSTable;
table1PostCode: TStringField;
Label2: TLabel;
ListBox1: TListBox;
getfiles: TButton;
import: TButton;
procedure getfilesClick(Sender: TObject);
procedure importClick(Sender: TObject);
private
{ Private declarations }
public
{ Public declarations }
end;
var
Form1: TForm1;
num:Integer;
implementation
{$R *.dfm}
procedure ListFileDir(Path: string; FileList: TStrings);
var
SR: TSearchRec;
begin
if FindFirst(Path + '*.csv', faAnyFile, SR) = 0 then
begin
repeat
if (SR.Attr <> faDirectory) then
begin
FileList.Add(SR.Name);
end;
until FindNext(SR) <> 0;
FindClose(SR);
end;
end;
procedure TForm1.getfilesClick(Sender: TObject);
begin//Fill listbox with csv files
ListFileDir('C:\paf 112018\CSV PAF\', ListBox1.Items);
end;
//start to iterate through files to append to dataset
procedure TForm1.importClick(Sender: TObject);
var
i,y:Integer;
myl:TStringList;
begin
for i:= 0 to ListBox1.Items.Count-1 do
begin
myl:=TStringList.Create;
myl.Sorted:=True;
myl.Duplicates:=dupIgnore;
cd.Active:=False;
cd.FileName:='C:\paf 112018\CSV PAF\'+ListBox1.Items[i];
cd.Active:=True;
while not cd.Eof do
begin
if (cd.Fields[0].AsString='')then cd.Next
else
myl.Add(cd.Fields[0].AsString);
cd.Next;
end;
for y:= 0 to myl.Count-1 do
begin
table1.Append;
table1.Fields[0].AsString:=myl.Strings[y];
end;
myl.Destroy;
end;
t.Edit;
t.Post;
end;
procedure TForm1.Button1Click(Sender: TObject);
begin
t.Locate('Post Code',edit1.text,[]);
end;
procedure TForm1.Button2Click(Sender: TObject);
var
sel:string;
begin
q.Close;
q.SQL.Clear;
q.SQL.Add('Select * from postc where [Post Code] like :sel');
q.ParamByName('sel').AsString:=Edit1.Text;
q.Active:=True;
end;
end.
This starts well but soon starts to slow down, I presume because of memory leaks. I have tried myl.Free, FreeAndNil(myl) and finally Destroy, but they all slow down quickly. I am not an expert but do enjoy using Delphi and normally manage to solve problems through your pages or googling, but this time I am stumped. Can anyone suggest a better method, please?
The following shows how to add postcodes from a file containing a list of them
to a query-type DataSet such as TAdoQuery or your q dataset (your code doesn't say what type
q is, as far as I can see).
It also seems from what you say that although you treat your postcode file
as a CSV file, you shouldn't actually need to: if the records contain only one
field, there should be no need for commas, because there is nothing to separate;
the file should simply contain one postcode per line. Consequently, there
doesn't seem to be any need to incur the overhead of loading it as a CSV file, so you should be able simply to load it into a TStringList and add the postcodes from there.
I'm not going to attempt to correct your code, just show a very simple example of how I think it should be done.
So the following code opens a postcode list file, which is assumed to contain one postcode per line, checks whether each entry in it
already exists in your table of postcodes and adds it if not.
procedure TForm1.AddPostCodes(const PostCodeFileName: String);
// The following shows how to add postcodes to a table of existing ones
// from a file named PostCodeFileName which should include an explicit path
var
PostCodeList : TStringList;
PostCode : String;
i : Integer;
begin
PostCodeList := TStringList.Create;
try
PostCodeList.LoadFromFile(PostCodeFileName);
if qPostCodes.Active then
qPostCodes.Close;
qPostCodes.Sql.Text := 'select * from postcodes order by postcode';
qPostCodes.Open;
try
qPostCodes.DisableControls; // Always call DisableControls + EnableControls
// when iterating a dataset which has db-aware controls connected to it
for i := 0 to PostCodeList.Count - 1 do begin
PostCode := PostCodeList[i]; // use of PostCode local variable is to assist debuggging
if not qPostCodes.Locate('PostCode', PostCode, [loCaseInsensitive]) then
qPostCodes.InsertRecord([PostCode]); // InsertRecord does not need to be followed by a call to Post.
end;
finally
qPostCodes.EnableControls;
qPostCodes.Close;
end;
finally
PostCodeList.Free; // Don't use .Destroy!
end;
end;
Btw, regarding the comment about bracketing an iteration of a dataset
with calls to DisableControls and EnableControls, the usual reason for doing this is to avoid the overhead of updating the GUI's display of any db-aware controls connected to the dataset. However, one of the reasons I'm
not willing to speculate about what is causing your slowdown is that TAdoQuery,
which is one of the standard dataset types that comes with Delphi, benefits massively
from DisableControls/EnableControls even when there are NO db-aware controls
connected to it. This is because of a quirk in the coding of TAdoQuery. Whatever
dataset you are using may have a similar quirk.

Use CHARINDEX rather than IN clause to speed up SSRS report filters

In some cases, we are using an IN clause in the filters of our SSRS reports. A lot of them cause performance issues because hundreds of items end up inside the IN clause.
such as:
WHERE TableA.School IN (@School)
Sometimes the multi-value parameters are really tricky to handle; you might need to do =Join(Mypara.Value,",") in the RDL and write a SQL function to convert the joined string into a set of SQL data that can feed the SQL SP (especially in some older versions of SSRS).
FYI, here is a function that breaks a comma-delimited string into a record set:
CREATE function [dbo].[fnSpark_BreakUpList] (
@List VARCHAR(MAX)
)
RETURNS @csvlist TABLE (Item VARCHAR(MAX))
AS
BEGIN
DECLARE @Item VARCHAR(MAX)
-- Loop through each item in the comma delimited list
WHILE (LEN(@List) > 0)
BEGIN
IF CHARINDEX(',',@List) > 0
BEGIN
SET @Item = SUBSTRING(@List,1,(CHARINDEX(',', @List)-1))
SET @List = SUBSTRING(@List,(CHARINDEX(',', @List) + DATALENGTH(',')),DATALENGTH(@List))
END
ELSE
BEGIN
SET @Item = @List
SET @List = NULL
END
-- Insert each item into the csvlist table
INSERT into @csvlist (Item) VALUES (@Item)
END
RETURN
END
GO
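For reference, the splitter would typically be used like this (a sketch; @School holds the joined SSRS multi-value parameter):
-- Hypothetical usage: filter with IN against the split-up parameter string.
SELECT *
FROM TableA
WHERE TableA.School IN (SELECT Item FROM dbo.fnSpark_BreakUpList(@School))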
I will post an answer shortly to show how to increase the performance by using CHARINDEX (so you don't have to do anything like the above).
If you do not actually need to retrieve the items from the long delimited string, but only to filter on it, CHARINDEX is the better way to go.
Instead of using IN Clause, you could just use:
WHERE CHARINDEX(',' + TableA.School + ',', ',' + @School + ',') > 0
NOTES:
1. I pad an extra comma at the end of the target string 'TableA.School' to avoid the situation where a longer string contains the filtering item as a substring.
(Such as when we have a school called 'AB' and another called 'ABC', and we don't want 'ABC' to be picked up when we are targeting 'AB'.)
2. I pad an extra comma at each end of the parameter string '@School' to ensure that a single item, the first item and the last item (which do not have a comma on both sides) are still picked up when we are targeting them.
3. I pad an extra comma at the beginning of the target string 'TableA.School' for the same reason as note 1.
(Such as when we have a school called 'AB' and another called 'CAB', and we don't want 'CAB' to be picked up when we are targeting 'AB'.)
Example:
I am using:
WHERE
CHARINDEX(',' + CAST(DENTIST4.wStudentYear AS VARCHAR(10)) + ',', ',' + @StudentYear + ',') > 0
to replace:
WHERE
DENTIST4.wStudentYear IN (@StudentYear)
for one of the reports I was working on; rendering of 4000+ pages improved from about 10 minutes to under a minute on a large database (11 GB).
IMPORTANT NOTES:
Please make sure the report filter parameters you are passing into the dataset use the Join expression:
=Join(Parameters!MyParameter.Value,",")
Hope this helps....
IMPORTANT: This approach ONLY improves performance when the filter has a large set of items; for filters with a small set of items the IN clause will do a better job.

Filter Dataset using SQL Query

I am using Zeos and MySQL in my Delphi project.
What I would like to do is filter the dataset using a textbox.
To do that, I am using the following query in the textbox's OnChange event:
ZGrips.Active := false;
ZGrips.SQL.Clear;
ZGrips.SQL.Add('SELECT Part_Name, Description, OrderGerman, OrderEnglish FROM Part');
ZGrips.SQL.Add('WHERE Part_Name LIKE ' + '"%' + trim(txt_search.Text) + '%"');
ZGrips.Active := true;
After I run the program and type the first character in the textbox, I get an empty dataset in my DBGrid,
so the DBGrid shows nothing; then if I type a second character I get some results in the DBGrid. And even more strange behaviour: if I use an AS clause in my SQL query, like:
Part_Name AS blablabla,
Description AS blablabla,
OrderGerman AS OG,
OrderEnglish AS OE
in that case the DBGrid shows only 2 columns, Part_Name and Description; I don't understand why it is ignoring the 3rd and 4th columns.
Thanks in advance for any help.
Always use parameters
Firstly, you need to use parameters; otherwise your query will break (or worse) when the user enters the wrong characters in the search box.
See: How does the SQL injection from the "Bobby Tables" XKCD comic work?
Parameters also make your query faster, because the database engine only has to parse the query once.
If you change a parameter, the engine knows that the query itself has not changed and will not re-parse it.
Don't use clear and add
Just supply the SQL as text in one go; it's faster.
This is especially true in a loop; outside a loop you will not notice the difference.
Your code should read something like:
procedure TForm1.SetupSearch; //run this only once.
var
SQL: string;
begin
ZGrips.Active:= false;
SQL:= 'SELECT Part_Name, Description, OrderGerman, OrderEnglish FROM Part ' +
'WHERE Part_Name LIKE :searchtext'; //note: no % here.
ZGrips.SQL.Text:= SQL; //don't use clear and don't use SQL.Add.
end;
//See: http://docwiki.embarcadero.com/Libraries/XE2/en/Vcl.StdCtrls.TEdit.OnChange
procedure TForm1.Edit1Change(Sender: TObject);
begin
if Edit1.Modified then begin
Timer1.Enabled:= true;
end;
end;
procedure TForm1.Timer1Timer(Sender: TObject);
begin
Timer1.Enabled:= false;
if Edit1.Text <> ZGrips.Params[0].AsString then begin
ZGrips.Params[0].AsString:= Edit1.Text + '%';
ZGrips.Active:= true;
end;
end;
Use a timer
As per @MartinA's suggestion, use a timer and start the query only every so often.
The weird behaviour you're getting may be because you're stopping and reactivating the query before the old one has had time to finish.
The Params[index: Integer] property is a bit faster than ParamByName,
although this does not really matter outside a loop.
Allow the database to use an index!
Using only a trailing wildcard % is faster than using a leading wildcard, because the database can only use an index if the wildcard is trailing.
If you want to use a leading wildcard, then consider storing the data in reverse order and using a trailing wildcard instead, as in the sketch below.
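A minimal sketch of that idea, assuming an extra, ordinarily-indexed column Part_Name_Rev that stores REVERSE(Part_Name) (the column name is an assumption):
-- Part_Name_Rev is an assumed indexed column holding REVERSE(Part_Name).
-- A leading-wildcard search on Part_Name becomes an index-friendly trailing-wildcard search:
SELECT Part_Name, Description
FROM Part
WHERE Part_Name_Rev LIKE CONCAT(REVERSE('gasket'), '%');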
Full-text indexes are much better than like
Of course, if you use both a leading and a trailing wildcard, then you have to use a full-text index.
In MySQL you'll then use the MATCH ... AGAINST syntax,
see: Differences between INDEX, PRIMARY, UNIQUE, FULLTEXT in MySQL?
and: Which SQL query is better, MATCH AGAINST or LIKE?
The latest versions of MySQL support full-text indexes in InnoDB.
Remember never to use MyISAM; it's unreliable.
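A minimal sketch of the full-text approach, assuming MySQL 5.6+ with InnoDB and using the column names from the question (the index name and search term are illustrative):
-- Create the full-text index once:
ALTER TABLE Part ADD FULLTEXT INDEX ft_part_name (Part_Name);
-- Then search with MATCH ... AGAINST instead of LIKE:
SELECT Part_Name, Description, OrderGerman, OrderEnglish
FROM Part
WHERE MATCH(Part_Name) AGAINST ('gasket*' IN BOOLEAN MODE);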

Using SQL to determine word count stats of a text field

I've recently been working on some database search functionality and wanted to get some information like the average words per document (e.g. text field in the database). The only thing I have found so far (without processing in language of choice outside the DB) is:
SELECT AVG(LENGTH(content) - LENGTH(REPLACE(content, ' ', '')) + 1)
FROM documents
This seems to work* but do you have other suggestions? I'm currently using MySQL 4 (hope to move to version 5 for this app soon), but am also interested in general solutions.
Thanks!
* I can imagine that this is a pretty rough way to determine this as it does not account for HTML in the content and the like as well. That's OK for this particular project but again are there better ways?
Update: To define what I mean by "better": either more accurate, performs more efficiently, or is more "correct" (easy to maintain, good practice, etc). For the content I have available, the query above is fast enough and is accurate for this project, but I may need something similar in the future (so I asked).
The text handling capabilities of MySQL aren't good enough for what you want. A stored function is an option, but will probably be slow. Your best bet to process the data within MySQL is to add a user defined function. If you're going to build a newer version of MySQL anyway, you could also add a native function.
The "correct" way is to process the data outside the DB since DBs are for storage, not processing, and any heavy processing might put too much of a load on the DBMS. Additionally, calculating the word count outside of MySQL makes it easier to change the definition of what counts as a word. How about storing the word count in the DB and updating it when a document is changed?
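If you do keep the count in the database, a sketch of that approach might look like this (the word_count column is an assumption; wordcount() is the stored function shown below):
-- Sketch: keep an assumed word_count column up to date on every change.
-- wordcount() is the stored function defined below.
CREATE TRIGGER documents_wc_ins BEFORE INSERT ON documents
FOR EACH ROW SET NEW.word_count = wordcount(NEW.content);
CREATE TRIGGER documents_wc_upd BEFORE UPDATE ON documents
FOR EACH ROW SET NEW.word_count = wordcount(NEW.content);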
Example stored function:
DELIMITER $$
CREATE FUNCTION wordcount(str LONGTEXT)
RETURNS INT
DETERMINISTIC
SQL SECURITY INVOKER
NO SQL
BEGIN
DECLARE wordCnt, idx, maxIdx INT DEFAULT 0;
DECLARE currChar, prevChar BOOL DEFAULT 0;
SET maxIdx=char_length(str);
SET idx = 1;
WHILE idx <= maxIdx DO
SET currChar=SUBSTRING(str, idx, 1) RLIKE '[[:alnum:]]';
IF NOT prevChar AND currChar THEN
SET wordCnt=wordCnt+1;
END IF;
SET prevChar=currChar;
SET idx=idx+1;
END WHILE;
RETURN wordCnt;
END
$$
DELIMITER ;
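Once created, it can be used directly in the original kind of query, for example:
SELECT AVG(wordcount(content)) FROM documents;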
This is quite a bit faster, though just slightly less accurate. I found it 4% light on the count, which is OK for "estimate" scenarios.
SELECT
ROUND (
(
CHAR_LENGTH(content) - CHAR_LENGTH(REPLACE (content, " ", ""))
)
/ CHAR_LENGTH(" ")
) AS count
FROM documents
Simple solution for some similar cases (MySQL):
SELECT *,
(CHAR_LENGTH(student)-CHAR_LENGTH(REPLACE(student,' ','')))+1 as 'count'
FROM documents;
You can use the word_count() UDF from https://github.com/spachev/mysql_udf_bundle. I ported the logic from the accepted answer, with the difference that my code only supports the latin1 charset. The logic would need to be reworked to support other charsets. Also, both implementations always consider a non-alphanumeric character to be a delimiter, which may not always be desirable - for example "teacher's book" is considered to be three words by both implementations.
The UDF version is, of course, significantly faster. For a quick test I tried both on a dataset from Project Gutenberg consisting of 9751 records totaling about 3 GB. The UDF did all of them in 18 seconds, while the stored function took 63 seconds to process just 30 records (which the UDF does in 0.05 seconds). So the UDF is roughly 1000 times faster in this case.
UDF will beat any other method in speed that does not involve modifying MySQL source code. This is because it has access to the string bytes in memory and can operate directly on bytes without them having to be moved around. It is also compiled into machine code and runs directly on the CPU.
Well, I tried to use the function defined above and it was great, except for one scenario.
In English there is heavy use of ' as part of a word. The function above, at least for me, counted "haven't" as 2 words.
So here is my little correction:
DELIMITER $$
CREATE FUNCTION wordcount(str TEXT)
RETURNS INT
DETERMINISTIC
SQL SECURITY INVOKER
NO SQL
BEGIN
DECLARE wordCnt, idx, maxIdx INT DEFAULT 0;
DECLARE currChar, prevChar BOOL DEFAULT 0;
SET maxIdx=CHAR_LENGTH(str);
SET idx=1;
WHILE idx <= maxIdx DO
SET currChar=SUBSTRING(str, idx, 1) RLIKE '[[:alnum:]]' OR SUBSTRING(str, idx, 1) RLIKE "'";
IF NOT prevChar AND currChar THEN
SET wordCnt=wordCnt+1;
END IF;
SET prevChar=currChar;
SET idx=idx+1;
END WHILE;
RETURN wordCnt;
END
$$
DELIMITER ;