Web Data Management
Aim
To conduct internationally recognised research in Web and Data Engineering, in particular to investigate effective methods and efficient algorithms for managing huge, diverse and distributed XML data in advanced Web information systems and applications.
Background
As the capabilities of computer processing rapidly evolve, so too does the range and complexity of the data that can be modelled. With the growth of Internet usage in commercial processing, the need to better organise the Web has led to the emergence of XML as a new standard for data representation and exchange on the Internet.
XML, in combination with other standards, makes it possible to define the content of a document separately from its formatting, making it easy to reuse that content in other applications or for other presentation environments. Most importantly, XML provides a basic syntax that can be used to share information between different kinds of computers, different applications, and different organisations without needing to pass through many layers of conversion.
In addition to the promise of greatly facilitating information integration on the Internet, an exciting promise of XML is that it "turns the Web into a database." Migrating Web information to XML is a significant first step in enabling efficient execution of ad-hoc, expressive queries over large amounts of Web data, which is a core feature of traditional database management systems. Consider the current state of query processing over information on the Web:
- Data embedded within HTML pages needs to be pre-processed by special-purpose, page-specific parsers before meaningful queries can be posed, a limited technology at best. Otherwise, ad-hoc queries are limited to simple keyword-based searches (as provided by search engines, for example) that understand documents as streams of words and little more.
- Data stored within traditional database management systems generally is accessed on the Web only through simple and rigid forms-based interfaces.
Encoding information in XML is a first step to enabling expressive, database-like queries over the information, but many query processing issues still need to be addressed. Furthermore, the tendency to mix traditional data elements with free text in XML, the ability to encode data ranging from fully structured to highly unstructured, and the inherent dichotomy between documents and databases pose new challenges in combining techniques from database systems and information retrieval.
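The contrast between keyword search and a database-like query can be illustrated with a minimal sketch. The catalogue, its element names and both helper functions are invented for illustration, and the code uses only Python's standard `xml.etree.ElementTree` module rather than any system developed by the group.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue; element names and values are invented for illustration.
CATALOGUE = """
<catalogue>
  <book year="2003"><title>XML Databases</title><price>45.00</price></book>
  <book year="1998"><title>Cheap Travel</title><price>12.50</price></book>
  <book year="2005"><title>Web Data</title><price>30.00</price><note>not cheap</note></book>
</catalogue>
"""


def keyword_search(xml_text, keyword):
    """Treat each book as a stream of words and match the keyword anywhere in it."""
    root = ET.fromstring(xml_text)
    hits = []
    for book in root.iter("book"):
        words = " ".join(book.itertext()).lower()
        if keyword.lower() in words:
            hits.append(book.findtext("title"))
    return hits


def books_cheaper_than(xml_text, limit):
    """Use the markup: compare the numeric content of <price> against a limit."""
    root = ET.fromstring(xml_text)
    return [
        book.findtext("title")
        for book in root.iter("book")
        if float(book.findtext("price")) < limit
    ]


if __name__ == "__main__":
    print(keyword_search(CATALOGUE, "cheap"))
    print(books_cheaper_than(CATALOGUE, 20))
```

The keyword search matches any book that mentions "cheap", including one whose note says "not cheap", whereas the structured query can express the intended condition on the price element directly.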
Thus, for XML to become the all-pervasive tool that it seems destined to become, computer scientists still need to solve many problems. This is particularly so in the XML-based management of complex systems and services.
Research Scope
The XML-related research focuses on a number of specific properties that XML needs to provide in order to become a better tool in the development and application of advanced database technologies for complex systems and services. These include:
- Quality XML document design
- Mapping between XML and relational databases
- Efficient XML query processing
- XML view support
- Keyword search over XML data
- Uncertain XML data management
- RDF databases and the Semantic Web
Research Projects
Constraints in XML Schema Integration (Australian Research Council Discovery Grant)
XML has recently emerged as a standard for data representation and interchange on the Internet. While providing syntactic flexibility, XML provides little support for defining integrity constraints. Integrity constraints are used to ensure the accuracy and consistency of data in relational databases, the most commonly used database model today, which is based on a highly efficient mathematical description of data. Funded by the ARC, this project looks into how integrity constraints can be explored and deployed to design high-quality XML documents. The oldest and most well-studied integrity constraints in relational databases will be revisited and applied to XML, and a new normalisation theory for XML will be developed. Constraint-preserving transformation from relational databases to well-structured XML documents will also be investigated.
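As an illustration of what an integrity constraint on XML can look like, the sketch below checks a simplified key: an attribute value must be unique among the target elements inside each context element. This is an assumed, much-reduced notion of a key and not the constraint theory developed in the project; the function name, the document and its element names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical document: employee ids should be unique within each department.
COMPANY = """
<company>
  <department name="research">
    <employee id="e1"/><employee id="e2"/><employee id="e1"/>
  </department>
  <department name="sales">
    <employee id="e1"/>
  </department>
</company>
"""


def key_violations(root, context_path, target_tag, key_attr):
    """Return (context index, duplicated value) pairs.

    A violation occurs when two target_tag children of the same context
    element share the same key_attr value. Equal values in different
    context elements are allowed, because the key is relative to the context.
    """
    violations = []
    for index, context in enumerate(root.findall(context_path)):
        counts = Counter(t.get(key_attr) for t in context.findall(target_tag))
        violations.extend((index, value) for value, n in counts.items() if n > 1)
    return violations


if __name__ == "__main__":
    print(key_violations(ET.fromstring(COMPANY), "department", "employee", "id"))
```

Here the id "e1" is reported once for the first department, while its reuse in the second department is not a violation.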
XML Views of Relational Databases: Semantics and Update Problems (Australian Research Council Discovery Grant)
XML is the standard for representing, publishing and exchanging data over the Internet, and relational databases are the dominant technology for data management. Updating XML views over relational data is fundamental to bringing these two technologies together to serve Internet-based applications. While significant effort has been put into creating and querying XML views over relational data, little work has been done on updating such XML views. Funded by the ARC, this project aims to develop a theory on the semantics of XML views and a set of techniques for checking XML view updatability, detecting and handling data redundancy and unintended updates, translating view updates to relational updates, and processing recursive view updates.
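The idea of an XML view over relational data, and of translating a view update into a relational update, can be sketched in a few lines. The schema, the data and the functions `build_view` and `rename_employee` are assumptions made for illustration, using Python's standard `sqlite3` and `xml.etree.ElementTree` modules; the sketch covers only the simplest case, where one view element corresponds to exactly one row, and none of the harder problems the project addresses.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_view(conn):
    """Publish the hypothetical dept/emp tables as a nested XML view."""
    root = ET.Element("departments")
    depts = conn.execute("SELECT id, name FROM dept ORDER BY id").fetchall()
    for dept_id, dept_name in depts:
        dept = ET.SubElement(root, "department", id=str(dept_id), name=dept_name)
        emps = conn.execute(
            "SELECT id, name FROM emp WHERE dept_id = ? ORDER BY id", (dept_id,)
        )
        for emp_id, emp_name in emps:
            emp = ET.SubElement(dept, "employee", id=str(emp_id))
            emp.text = emp_name
    return root


def rename_employee(conn, emp_id, new_name):
    """Translate the view update 'set the text of employee[@id=emp_id]'
    into a single relational UPDATE on the underlying emp table."""
    conn.execute("UPDATE emp SET name = ? WHERE id = ?", (new_name, emp_id))


# Hypothetical in-memory database used to demonstrate the view.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dept(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp(id INTEGER PRIMARY KEY, name TEXT,
                     dept_id INTEGER REFERENCES dept(id));
    INSERT INTO dept VALUES (1, 'Research'), (2, 'Sales');
    INSERT INTO emp VALUES (10, 'Ann', 1), (11, 'Bob', 1), (12, 'Cy', 2);
    """
)

if __name__ == "__main__":
    rename_employee(conn, 11, "Robert")
    print(ET.tostring(build_view(conn), encoding="unicode"))
```

This update is unambiguous because each employee element maps to one row. When the same relational value appears in several places in a view, or a view element joins several tables, deciding which relational update is intended becomes much harder.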
Effective Keyword Search for Meaningful and Relevant Entities over XML Data (Australian Research Council Discovery Grant)
Traditional keyword search techniques have proven user-friendly and effective for searching HTML documents; however, they are far from meeting users’ requirements for querying XML data. The most challenging issue is how to identify meaningful and relevant entities in XML documents. Funded by the ARC, this project will propose novel and effective methods for identifying keyword contexts and for inferring and ranking returned entities, and will develop efficient algorithms for finding meaningful entities and the top-k most relevant results. We will also look into personalised keyword query support. The project will contribute greatly to fundamental research in XML keyword search and deliver significant impact on related technology development.
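One simple way to think about "meaningful entities" is to return the smallest elements whose subtrees contain every keyword, instead of whole documents. The sketch below implements that idea naively; it is similar in spirit to the smallest lowest common ancestor semantics found in the XML keyword search literature, but it is not the project's method, it does no ranking, and the function name and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical bibliography used only for illustration.
BIB = """
<bib>
  <paper><title>XML keyword search</title><author>Lee</author></paper>
  <paper><title>Relational views</title><author>Lee</author></paper>
  <paper><title>Keyword ranking</title><author>Wang</author></paper>
</bib>
"""


def smallest_matches(root, keywords):
    """Return elements whose subtree text contains every keyword
    while no single child subtree does (naive substring matching)."""
    wanted = [k.lower() for k in keywords]
    results = []

    def contains_all(elem):
        text = " ".join(elem.itertext()).lower()
        return all(k in text for k in wanted)

    def visit(elem):
        if not contains_all(elem):
            return False
        # Visit every child; keep elem only if no child covers all keywords.
        child_hits = [visit(child) for child in elem]
        if not any(child_hits):
            results.append(elem)
        return True

    visit(root)
    return results


if __name__ == "__main__":
    root = ET.fromstring(BIB)
    for elem in smallest_matches(root, ["keyword", "Lee"]):
        print(elem.tag, elem.findtext("title"))
```

For the query "keyword Lee", only the first paper is returned: the root element also contains both words, but it is not the smallest such element, and the other two papers each contain only one of the keywords.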
