Basser Seminar Series

Place-Based Information Systems: Textual Location

Speaker: Professor Hanan Samet

Time: Thursday 26 April 2012, 1:00-2:00pm
Location: The University of Sydney, School of IT Building, Lecture Theatre (Room 123), Level 1

Abstract

The popularity of web-based mapping services such as Google Earth/Maps and Microsoft Virtual Earth (Bing) has led to an increasing awareness of the importance of location data and its incorporation into both web-based search applications and the databases that support them. In the past, attention to location data had been primarily limited to geographic information systems (GIS), where locations correspond to spatial objects and are usually specified geometrically. However, in web-based applications, the location data often corresponds to place names and is usually specified textually.

An advantage of such a specification is that the same specification can be used regardless of whether the place name is to be interpreted as a point or a region. Thus the place name acts as a polymorphic data type in the parlance of programming languages. Its drawback, however, is that it is ambiguous. In particular, a given specification may have several interpretations, not all of which are names of places. For example, "Jordan" may refer to a person as well as a place. Moreover, there is additional ambiguity even when the specification has a place name interpretation. For example, "Jordan" can refer to a river or a country, while there are a number of cities named "London".

In this talk we examine the extension of GIS concepts to textually specified location data and review search engines that we have developed to retrieve documents where the similarity criterion is based not solely on an exact match of elements of the query string but also on spatial proximity. Thus we want to take advantage of spatial synonyms so that, for example, a query seeking a rock concert in Beverly Hills would be satisfied by a result finding a rock concert in Hollywood or Santa Monica.

We have applied this idea to develop the STEWARD (Spatio-Textual Extraction on the Web Aiding Retrieval of Documents) system for finding documents on the website of the Department of Housing and Urban Development. This system relies on the presence of a document tagger that automatically identifies spatial references in text, PDF, Word, and other unstructured documents. The thesaurus for the document tagger is a collection of publicly available data sets forming a gazetteer containing the names of places in the world. Search results are ranked according to the extent to which they satisfy the query, which is determined in part by the prevalent spatial entities that are present in the document. We have also adapted the same ideas to collections of news articles and Twitter tweets, resulting in the NewsStand and TwitterStand systems, respectively. These will be demonstrated along with the STEWARD system, in conjunction with a discussion of some of the underlying issues that arose and the techniques used in their implementation. Future work involves applying these ideas to spreadsheet data.

Speaker's biography

Hanan Samet (http://www.cs.umd.edu/~hjs/) received the B.S. degree in engineering from UCLA, and the M.S. degree in operations research and the M.S. and Ph.D. degrees in computer science from Stanford University.

In 1975 he joined the Computer Science Department at the University of Maryland, College Park, where he is a Professor and a member of the Computer Vision Laboratory. His research interests include data structures, computer graphics, geographic information systems, computer vision, robotics, database management systems, and programming languages, and he is the author of over 300 publications on these topics. He received a best paper award at the 2008 ACM SIGMOD and SIGSPATIAL conferences. He is the author of the recent book titled "Foundations of Multidimensional and Metric Data Structures".

Prof Samet is a Fellow of the IEEE, ACM, AAAS, and IAPR (International Association for Pattern Recognition), and was elected to the ACM Council for 1989-1991, where he served as the Capital Region Representative.

He is on the editorial boards of GeoInformatica, Journal of Visual Languages and Computing, and Image Understanding, and he is the founding chair of the ACM SIG on Spatial Information.