What must be represented?

8 September, 2015 - 12:22

The fundamental problems of data modeling include: deciding what entities must be represented within a chosen aspect of reality, what characteristics of those entities must be represented, and how best to balance the solutions to both of the above problems.

For any given aspect of reality, there are many real-world objects or abstract concepts that we could choose to represent. Choosing among them is part of the activity known as data modeling. The decision of what to represent is constrained, in principle, by the limited time and resources that an organization can devote to data modeling. This includes a limited capacity to design a representation scheme, called a schema, and a limited capacity to collect data to populate the data collection defined by that schema.

Practicality, beyond principle, dictates that the design of data management systems should be as simple as possible. As the complexity of a schema increases, the computational effort necessary to process it and the maintenance effort necessary to correct and modify it increase significantly [Banker1993]. Thus, it is necessary to decide on a constrained version of reality that we wish to represent.

In data modeling, we will call the basic unit of representation an entity. Entities are sometimes called objects, even when they represent non-physical things. Any entity in the world can be described in terms of one or more characteristics. In data modeling, we call these attributes.

EX. WM-5:

We last chose to represent temperature readings using the following schema:

temperature_readings(date-time, city-name, temperature)

The entities in this case are the individual temperature readings. Each entity is described in terms of the attributes: date-time, city-name, and temperature.
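As a rough illustration, the schema above could be sketched in Python as follows. The class name `TemperatureReading`, the Python types, the Celsius unit, and the sample values are all assumptions made for this sketch; the schema itself only names the three attributes.

```python
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass(frozen=True)
class TemperatureReading:
    """One entity: a single temperature reading."""
    date_time: datetime   # when the reading was taken
    city_name: str        # the city the reading is attributed to
    temperature: float    # assumed to be degrees Celsius; the schema states no unit


# A tiny, made-up data collection that conforms to the schema.
temperature_readings = [
    TemperatureReading(datetime(2015, 9, 8, 12, 0), "Fredericton", 18.5),
    TemperatureReading(datetime(2015, 9, 8, 13, 0), "Fredericton", 19.0),
]

# The attributes of the entity, in schema order.
attribute_names = [f.name for f in fields(TemperatureReading)]
```

Each instance plays the role of one entity, and each field plays the role of one attribute.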

Is it realistic to say that each city has only one temperature at any one time? The concept of a “city” establishes both geographic and conceptual boundaries within which we wish to record the physical quantity of temperature.

We know very well that different locations within a given city may have different temperature readings at any one time. So which one do we record?

Fredericton, at 131.23 km², is a relatively small town by Canadian standards, so one temperature reading may suffice. The average resident of Fredericton could easily comprehend the idea of a single temperature reading for the entire city, since it covers a small area. That one reading would suffice to tell a resident whether or not to wear warm clothes.

On the other hand, there are certain data management domains where greater precision in the data is necessary. The operations staff at the airport, for example, will not be satisfied with a reading taken from the city centre. They must have a precise reading at their location since temperature can greatly impact the operations of the aircraft and the working conditions for the ground crew.

This last issue in the weather monitoring example, that of precision, is often referred to in data modeling as granularity.
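One way to picture granularity is to compare a finer-grained version of the schema with the city-level one. The sketch below is only illustrative: the `location` attribute, the class name `LocatedReading`, the helper `city_level`, the choice of averaging, and the sample values are all assumptions and are not part of the original schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass(frozen=True)
class LocatedReading:
    """A finer-grained entity: a reading taken at a place within a city."""
    date_time: datetime
    city_name: str
    location: str        # hypothetical attribute, not in the original schema
    temperature: float   # assumed to be degrees Celsius


def city_level(readings):
    """Roll fine-grained readings up to one value per (date_time, city_name).

    Averaging is an assumption made for this sketch; other choices
    (for example, always using the city-centre reading) are equally possible.
    """
    groups = defaultdict(list)
    for r in readings:
        groups[(r.date_time, r.city_name)].append(r.temperature)
    return {key: mean(temps) for key, temps in groups.items()}


noon = datetime(2015, 9, 8, 12, 0)
readings = [
    LocatedReading(noon, "Fredericton", "city centre", 18.0),
    LocatedReading(noon, "Fredericton", "airport", 16.0),
]

# Coarse-grained view: a single temperature per city at a given time.
coarse = city_level(readings)
```

A resident might be content with the coarse view, while airport operations staff would query the fine-grained readings for their own location directly. The coarse view can be derived from the fine one, but not the other way around.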