This project is read-only.

Performance on distance queries (Geocoding)

Topics: User Forum
Jan 25, 2007 at 8:29 AM

Hi there.
I have been using PostGIS since now for my distance queries and I was very glad knowing that spatial extensions for SQL 2005 exists too.
I have tried the same (i think) distance queries in both MsSqlSpatial and PostGIS and noticed a very big difference in performance. In 100 queries I have noticed for MsSqlSpatial an average query time of 247 msec and for PostGIS 15 msec.
Is there anything I can do to improve performance?
Here's the PostGIS query I use:
SELECT R.name, R.localizedname, R.numberingleftstart, R.numberingleftstop, R.numberingrightstart, R.numberingrightstop, R.area, R.zip,
distance(shape, GeomFromText('POINT( 23.374042 42.62357 )', 4326)) as distance, C.localizedname as countryname
FROM geodb.countryroads R
INNER JOIN geodb.countries C ON R.countryid = C.entityid
WHERE shape && Expand(GeomFromText('POINT( 23.374042 42.62357 )', 4326), 0.003)
AND distance(shape, GeomFromText('POINT( 23.374042 42.62357 )', 4326)) < 0.003
ORDER BY distance(shape, GeomFromText('POINT( 23.374042 42.62357 )', 4326)) LIMIT 1

Here's the MsSqpSpatial query:
SELECT TOP 1 R.Name, R.LocalizedName, R.NumberingLeftStart, R.NumberingLeftStop, R.NumberingRightStart, R.NumberingRightStop, R.Area, R.Zip
,ST.Distance(R.Shape, ST.GeomFromText('POINT( 22.956617 40.616852 )', -1)) as Distance
, C.localizedname as countryname
FROM ST.IsWithinDistanceQuery('CountryRoads', 'Shape', ST.GeomFromText('POINT(22.956617 40.616852 )', 4326), 0.0003) AS Q
INNER JOIN CountryRoads AS R ON Q.oid = R.oid
INNER JOIN Countries C on C.EntityID = R.CountryID
ORDER BY Distance

Thank you all in advance
Jan 25, 2007 at 9:02 AM
I have also tried the following queries that use FilterQuery and RelateQuery but still could drop the time lower than 200 msec.

SELECT top 1 R.Name, R.LocalizedName, R.NumberingLeftStart, R.NumberingLeftStop, R.NumberingRightStart, R.NumberingRightStop, R.Area, R.Zip
,ST.Distance(R.Shape, ST.GeomFromText('POINT( 22.956617 40.616852 )', -1)) as Distance
,C.localizedname as CountryName
FROM ST.FilterQuery('CountryRoads', 'Shape', ST.Expand(ST.GeomFromText('POINT( 22.956617 40.616852 )', 4326), 0.0003)) AS q
INNER JOIN CountryRoads AS R ON q.oid = R.oid
INNER JOIN Countries C on R.CountryID = C.EntityID
WHERE ST.Distance(Shape, ST.GeomFromText('POINT( 22.956617 40.616852 )', -1)) < 0.0003
ORDER BY Distance



SELECT top 1 R.Name, R.LocalizedName, R.NumberingLeftStart, R.NumberingLeftStop, R.NumberingRightStart, R.NumberingRightStop, R.Area, R.Zip
,ST.Distance(R.Shape, ST.GeomFromText('POINT( 22.956617 40.616852 )', -1)) as Distance
,C.localizedname as CountryName
FROM ST.RelateQuery('CountryRoads', 'Shape', ST.Expand(ST.GeomFromText('POINT( 22.956617 40.616852 )', 4326), 0.0003), 'Intersects') AS q
INNER JOIN CountryRoads AS R ON q.oid = R.oid
INNER JOIN Countries C on R.CountryID = C.EntityID
WHERE ST.Distance(Shape, ST.GeomFromText('POINT( 22.956617 40.616852 )', -1)) < 0.0003
ORDER BY Distance
Jan 25, 2007 at 9:03 AM
My mistake,
Could NOT drop the time to less than 200 msec using the above query modifications.
Jan 25, 2007 at 4:05 PM
I suggest you stick to the query that uses IsWithinDistanceQuery. It has a little better performance than using FilterQuery + Distance. Now, I would like to benchmark both query versions (PostGIS vs. MsSqlSpatial) using your data and see whether there's room for optimization. Please, find a way to send me a zipped shapefile (as attachment to rstuven@gmail.com, use a service such as yousendit.com or publish it somewhere in the net). If there are license issues, all I need is the shapefile with no feature fields, just geometries.

Best regards,
Ricardo Stuven.
Jan 25, 2007 at 9:20 PM
Please, download the latest changset, build it as RELEASE, deploy it and tell me if you see any speed improvement running this query (note this is slightly different from yours):

SELECT TOP 1 R.Name, R.LocalizedName, R.NumberingLeftStart, R.NumberingLeftStop, R.NumberingRightStart, R.NumberingRightStop, R.Area, R.Zip, Q.distance, C.localizedname as countryname
FROM ST.IsWithinDistanceQuery('CountryRoads', 'Shape', ST.GeomFromText('POINT(22.956617 40.616852 )', 4326), 0.0003) AS Q
INNER JOIN CountryRoads AS R ON Q.oid = R.oid
INNER JOIN Countries C on C.EntityID = R.CountryID
ORDER BY Q.distance

If not, I still we'll need a sample of your data to figure out what could I optimize...

Best regards,
Ricardo Stuven.
Jan 26, 2007 at 8:38 AM
I deployed the new DLL you gave me into a new database. Running the query I got the following exception:

Msg 6522, Level 16, State 1, Line 2
A .NET Framework error occurred during execution of user defined routine or aggregate 'IsWithinDistanceQuery':
System.Data.SqlClient.SqlException: Invalid use of side-effecting or time-dependent operator in 'ALTER TABLE' within a function.
System.Data.SqlClient.SqlException:
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
at System.Data.SqlClient.SqlInternalConnectionSmi.EventSink.ProcessMessagesAndThrow(Boolean ignoreNonFatalMessages)
at Microsoft.SqlServer.Server.SmiEventSink_Default.ProcessMessagesAndThrow(Boolean ignoreNonFatalMessages)
at System.Data.SqlClient.SqlCommand.RunExecuteNonQuerySmi(Boolean sendToPipe)
at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(DbAsyncResult result, String methodName, Boolean sendToPipe)
at System.Data.SqlClient.SqlCommand.ExecuteNonQuery()
at MsSqlSpatial.GeometryTable.GetPrimaryKeyColumn(SqlCommand command, String schemaName, String tableName)
at TableValuedFunctions.SpatialQuery.Execute(SqlString tablename, SqlString columnname, SqlBytes ewkbMBR, SqlBytes ewkb, SqlString condition, Arguments args, ReaderHandler handler)
at TableValuedFunctions.SpatialQuery.Execute(SqlString tablename, SqlString columnname, SqlBytes ewkbMBR, SqlBytes ewkb, Arguments args, ReaderHandler handler)
at TableValuedFunctions.IsWithinDistanceQuery(SqlString tablename, SqlString columnname, SqlBytes ewkb, SqlDouble distance)
.
Jan 26, 2007 at 8:45 AM

As for the data, I will send you a backup of my Postgres database. I will email you a link of where you can download it from.

Isn't there a way to extend SQL Server 2005 with a new type of index? I guess not, otherwise you would propably have done it...
Jan 26, 2007 at 1:22 PM
> I deployed the new DLL you gave me into a new database.
> Running the query I got the following exception:
> Msg 6522, Level 16, State 1, Line 2
> A .NET Framework error occurred during execution of user defined
> routine or aggregate 'IsWithinDistanceQuery':
> System.Data.SqlClient.SqlException: Invalid use of side-effecting > or time-dependent operator in 'ALTER TABLE' within a function.

That exception happened because 'CountryRoads' have no primary key defined, which is weird because the import process (I assume in this case you imported the data) creates the table with "oid INTEGER IDENTITY PRIMARY KEY". Make sure 'CountryRoads' has defined "oid" as primary key.

> As for the data, I will send you a backup of my Postgres database.
> I will email you a link of where you can download it from.

Thank you.

> Isn't there a way to extend SQL Server 2005 with a new type
> of index? I guess not, otherwise you would propably have done it...

As far as I know, no. However, you can "mount" a different method over the native indexing method. I choose, by the moment, the simpler one (indexing on the envelope corners) but other methods could be used (eg. see http://skyserver.org/htm/)

Best regards,
Ricardo Stuven.
Jan 29, 2007 at 11:35 AM

You were right, there was no primary key on oid. I created one manually it worked.
But still the average query time is 203 msec. This is better than the other queries (270 msec), but still much logner than PostGIS (15 msec).

Hope the shape file will help you.
Jan 29, 2007 at 8:48 PM
The link you sent to my email is broken. Please, could you send it to me again?

Best regards,
Ricardo Stuven.
Jan 29, 2007 at 9:19 PM
Forget the preceding message. I see you sent me a splitted RAR file also. Thanks.

Best regards,
Ricardo Stuven.
Jan 29, 2007 at 9:52 PM
The table without primary key was very strange... Could you please run the following queries and post back the results?

SELECT name FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[CountryRoads]') AND name LIKE N'IDX_the_geom_%';
 
SELECT name, definition FROM sys.check_constraints WHERE parent_object_id = OBJECT_ID(N'[dbo].[CountryRoads]');
 
SELECT name FROM sys.triggers WHERE parent_id = OBJECT_ID(N'[dbo].[CountryRoads]');
 
SELECT * FROM sys.stats WHERE object_id = OBJECT_ID(N'[dbo].[CountryRoads]');
 
DBCC SHOW_STATISTICS (CountryRoads, IDX_the_geom_Envelope_MaxX);

Best regards,
Ricardo Stuven.
Jan 30, 2007 at 9:24 AM

The 1st, 2nd and 3d queries returned nothing.

The 4th query returned

1669580986 PKCountryRoadsOID 1 0 0 0
1669580986 WASys000000016383C8BA 2 1 0 0
1669580986 WASys0000000C6383C8BA 3 1 0 0
1669580986 WASys000000036383C8BA 4 1 0 0
1669580986 WASys000000046383C8BA 5 1 0 0
1669580986 WASys000000056383C8BA 6 1 0 0
1669580986 WASys000000066383C8BA 7 1 0 0

and DBCC returned an error:

Msg 2767, Level 16, State 1, Line 1
Could not locate statistics 'IDXthegeomEnvelopeMaxX' in the system catalogs.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.


Jan 30, 2007 at 6:05 PM
That explains everything!

Either you created the geometry column without using the provided methods (importing with msscmd.exe, importing with ST.ImportFromShapefile or calling ST.AddGeometryColumn), or you made an incomplete backup & restore procedure... or another "accidents"

I simply suggest you drop the table and import the data again. This should properly create the supporting bits are missing in this moment and your query should perform better (well, with no indexes, 200ms is not so bad... :)

In the meantime, I will think in better ways of check & repair MsSqlSpatial integrity.

Best regards,
Ricardo Stuven.
Jan 31, 2007 at 4:22 AM
> drop the table and import the data again.

Just to be sure, additionally download/compile/deploy the latest changes.

I've tried with the data you sent me. Don't consider the following tests as proper benchmarks, but just a reference to run your own tests.

In "SQL Server Management Studio" I enabled "Include Client Statistics" and read the value "Client Statistics -> Wait time on server replies -> Average" after 10 tries. In pgAdminIII, the execution time has the format "16+20ms", where the second value (eg. "20") is the query processing time. I calculated the average after 10 tries. Running the queries in my laptop I'm getting an average processing query time of 20ms using MsSqlSpatial. I'm getting the same results using PostGIS.

The queries:

  • PostGIS
SELECT R.EDGE_ID, distance(the_geom, GeomFromText('POINT( 22.956617 40.616852 )', 4326)) as distance
FROM gr21_roads_wgs84 R 
WHERE the_geom && Expand(GeomFromText('POINT( 22.956617 40.616852 )', 4326), 0.003) 
AND distance(the_geom, GeomFromText('POINT( 22.956617 40.616852 )', 4326)) < 0.003 
ORDER BY distance
LIMIT 1

  • MsSqlSpatial
SELECT TOP 1 R.EDGE_ID, Q.distance
FROM ST.IsWithinDistanceQuery('Gr21_roads_wgs84', 'the_geom', ST.GeomFromText('POINT(22.956617 40.616852 )', 4326), 0.0003) AS Q
INNER JOIN Gr21_roads_wgs84 AS R ON Q.oid = R.oid
ORDER BY Q.distance

Both queries return the same result:
EDGE_ID              distance
-------------------- ----------------------
544971               8.25321453809645E-05


Best regards,
Ricardo Stuven.
Feb 1, 2007 at 9:08 AM
Ok, I will create a new DB, deploy and import the data again.
The newest update from the above link gave me this run-time error while importing the data from Postgre.
I did not have the same problem with the previous version (0.1.2582.17939) so, I'll go with that one.
Thought you'd like to know.


d:\program files\mssqlspatial\17435>msscmd -import=pgsql -server=pelasgian -db=GeoDB1
-table=CountryRoads -column=shape -srid=4326 -pgsqlserver=pelasgian -pgsqldb=GeoDB
-pgsqllogin=postgres -pgsqlpassword=secret -pgsqlschema=geodb -pgsqltable=countryroads
-pgsqlgeomcolumn=shape -pgsqloidcolumn=entityid


Unhandled Exception: System.IndexOutOfRangeException: An SqlParameter with Param
eterName '@shapeEnvelopeMinX' is not contained by this SqlParameterCollection.

at System.Data.SqlClient.SqlParameterCollection.GetParameter(String parameter
Name)
at System.Data.SqlClient.SqlParameterCollection.get_Item(String parameterName
)
at MsSqlSpatialConsole.ImportPgSql.Process(SqlCommand command, String columnN
ame, DataColumnCollection columns)
at StoredProcedures.ImportFromDatasource(IProvider datasource, String schemaN
ame, String tableName, String columnName, Int32 srid, String type, String connec
tionString, ImportFromDatasourceProcess process, Int32& importedRows, TimeSpan&
elapsedTime)
at MsSqlSpatialConsole.ImportPgSql..ctor(ArgParser arg, String schema, String
table, String column, Int32 srid, String type, String conn)
at MsSqlSpatialConsole.Program.Main(String[] args)
Feb 1, 2007 at 1:05 PM
This discussion has been copied to Work Item 7885. You may wish to continue further discussion there.
Feb 2, 2007 at 8:58 AM
Hi there again Ricardo.

I did the whole thing from the start. Created a new DB, imported the data, used the latest changes.
The queries for the indecies and the statistics that you gave before did not return any results because they were
looking for indecies in the column named "thegeom". Mine is "shape". So, by changing from IDXthe_geom
to IDXshape all queries return results, DBCC SHOWSTATISTICS also.
So tell me if you need anything from this information.

In my case the results were still the same. MsSqlSpatial did an average of 200msec while PostGIS did 15msec or less.
This is the code I use for calculating query duration for both SQL and Postgree:

// All variable are declared elsewhere
start = DateTime.Now;
result = command.ExecuteReader();
finish = DateTime.Now;
span = finish - start;
rowAverage += span.TotalMilliseconds;

and finally outside the loop:
average = rowAverage / totalQueries;

I guess it would be better for Microsoft to give us a way of expanding SQL Server with new types of indecies.
R-Trees or Grids would make great difference.