
Identity

9 September, 2015 - 10:02

In creating a data model, it is almost always the case that multiple entities must be represented and, further, that multiple types of entities must be represented. If entities within a collection of data lack identity it will be difficult, if not impossible, for users to find data they need. It is also important to be able to distinguish between entities of different types.

Giving each entity an identity – or identifier – gives users of a data management system a way of specifying which entities are of interest to them when they ask questions of or perform tasks with a collection of data.

EX. WM-8

In the previous discussion of the weather monitoring example, we decided that it is necessary to increase the granularity of the geographical locations at which temperature readings are collected. That is, for a given date-time value it should be possible to record readings collected at multiple locations within a single geographic entity such as a city. For example, for the city of Fredericton, we wish to take readings not only at the city centre, but at the airport as well, and perhaps at other key locations.

Since multiple readings in this context represent different locations within a geographic entity – in this case a city – we must decide how to distinguish the readings from one another. Our current schema is: temperature_readings(date-time, city-name, temperature). Suppose that on December 4, 2006 at 07:31 in the morning, the temperature at Fredericton city centre is 5° C while at Fredericton airport it is 0° C. If we use the current schema to store readings taken at Fredericton’s city centre and at the airport at the same time (i.e., with the same date-time value), we might naturally decide to record the readings as follows:

<2006-12-04 07:31, Fredericton, 5° C>
<2006-12-04 07:31, Fredericton, 0° C>

The problem is that this approach does not allow us to distinguish between Fredericton city centre and Fredericton airport. We could change the schema or the data values we use in order to distinguish between the readings. Arguably the value for attribute city-name in the example above should be “Fredericton,” not “Fredericton airport” or “Fredericton city centre,” but this would not help us to uniquely identify each location within Fredericton.
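The ambiguity can be made concrete with a minimal Python sketch. It assumes the readings are held as plain tuples in the order given by the schema; the variable names (`readings`, `by_key`, `ambiguous`) are illustrative and not part of the original example.

```python
from collections import defaultdict

# Readings under the current schema:
# (date-time, city-name, temperature in degrees Celsius).
readings = [
    ("2006-12-04 07:31", "Fredericton", 5),
    ("2006-12-04 07:31", "Fredericton", 0),
]

# Group temperatures by the only identifying attributes the schema offers.
by_key = defaultdict(list)
for date_time, city_name, temperature in readings:
    by_key[(date_time, city_name)].append(temperature)

# Any key that maps to more than one temperature is ambiguous:
# nothing in the data says which location each value came from.
ambiguous = {key: temps for key, temps in by_key.items() if len(temps) > 1}
print(ambiguous)
```

Both temperatures end up under the same (date-time, city-name) key, so a user asking for "the Fredericton reading at 07:31" gets two answers and no way to tell them apart.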

One approach might be to use geographic coordinates as discriminators. Thus, we could change our schema to: temperature_readings(date-time, city-name, latitude, longitude, temperature). Representing Fredericton city centre (66° 32’ W) and Fredericton airport (66° 10’ W), we would then record the readings as:

<2006-12-04 07:31, Fredericton, 45° 52’ N, 66° 32’ W, 5° C>
<2006-12-04 07:31, Fredericton, 45° 52’ N, 66° 10’ W, 0° C>

For the same date-time value, we now have two readings, one for the city centre and the other for the airport, which, according to these coordinates, is to the east of the city centre.

Would this suffice? How would the average person know which reading is for the airport and which one is for the city centre? Not many people memorize geographic coordinates in the form of latitude and longitude. Thus, we must find another way to discriminate between locations.

We should probably retain the coordinates, however, because in aviation and other application domains they could be important. One approach is to change the role of the second field in our schema. We could make its role more general in geographic terms by calling it location-name. Values in this field will now be understood to indicate the names of any type of geographic entity – at least geographic entities that are deemed important in this application domain.

Thus, we could make use of names such as “Fredericton City Centre” and “Fredericton Airport” in storing temperature readings. This would give us the following:

<2006-12-04 07:31, Fredericton City Centre,
    45° 52’ N, 66° 32’ W, 5° C>
<2006-12-04 07:31, Fredericton Airport,
    45° 52’ N, 66° 10’ W, 0° C>
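As a rough illustration of how the revised schema supports lookups by name, the following Python sketch stores the two readings above and retrieves one of them. The `Reading` type, its field names, and the `find_readings` helper are assumptions introduced for this example only.

```python
from typing import List, NamedTuple


class Reading(NamedTuple):
    """One row of temperature_readings with a location-name field."""
    date_time: str
    location_name: str
    latitude: str
    longitude: str
    temperature_c: int


readings = [
    Reading("2006-12-04 07:31", "Fredericton City Centre",
            "45° 52’ N", "66° 32’ W", 5),
    Reading("2006-12-04 07:31", "Fredericton Airport",
            "45° 52’ N", "66° 10’ W", 0),
]


def find_readings(rows: List[Reading], date_time: str,
                  location_name: str) -> List[Reading]:
    """Return every reading matching the given date-time and location name."""
    return [r for r in rows
            if r.date_time == date_time and r.location_name == location_name]


# A user can now ask for the airport by name instead of by coordinates.
airport = find_readings(readings, "2006-12-04 07:31", "Fredericton Airport")
print(airport[0].temperature_c)
```

With these two rows, the combination of date-time and location-name selects exactly one reading, while the coordinates remain available for applications that need them.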

Would this suffice? Technically, it might work. Each pair of geographic coordinates is likely to be associated with only one geographic name. Even if “Fredericton City Centre” and “Fredericton City Hall” effectively share the same geographic coordinates (45° 52’ N, 66° 32’ W), users could still distinguish between the two readings by location name, provided that we record data under both names so that users are able to refer to either one when looking for temperature readings. A more common problem, however, is for the same name to refer to multiple entities, particularly in the case of cities and people.

What if we measure data for the city of Albany and its airport? There are at least two cities named Albany in North America: Albany, New York (42.6525, -73.75667) and Albany, Georgia (31.57833, -84.15583). Thus, we might be faced with:

<2006-12-04 07:31, Albany City Centre,
    42.6525, -73.75667, 14° C>
<2006-12-04 07:31, Albany Airport,
    42.74806, -73.80278, 14° C>
<2006-12-04 07:31, Albany City Centre,
    31.57833, -84.15583, 37° C>
<2006-12-04 07:31, Albany Airport,
    31.53528, -84.19444, 37° C>
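The name collision in these four readings can be detected mechanically. The Python sketch below is one possible check, assuming the readings are stored as tuples with numeric coordinates; the helper `coordinates_by_name` is a hypothetical name chosen for illustration.

```python
# Readings as (date-time, location-name, latitude, longitude, temperature °C).
readings = [
    ("2006-12-04 07:31", "Albany City Centre", 42.6525, -73.75667, 14),
    ("2006-12-04 07:31", "Albany Airport", 42.74806, -73.80278, 14),
    ("2006-12-04 07:31", "Albany City Centre", 31.57833, -84.15583, 37),
    ("2006-12-04 07:31", "Albany Airport", 31.53528, -84.19444, 37),
]


def coordinates_by_name(rows):
    """Map each location name to the set of distinct coordinates it labels."""
    places = {}
    for _date_time, name, latitude, longitude, _temperature in rows:
        places.setdefault(name, set()).add((latitude, longitude))
    return places


# A name attached to more than one coordinate pair does not identify
# a single place, so it cannot serve as a unique identifier on its own.
ambiguous_names = sorted(
    name for name, coords in coordinates_by_name(readings).items()
    if len(coords) > 1
)
print(ambiguous_names)
```

Both location names refer to two different places, so neither name alone tells a user whether a reading belongs to New York or to Georgia.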

We could address this problem in several ways. Perhaps we could add more attributes to specify locations, such as country, state or province, or county. The problem is that different locations would require different combinations of such attributes.

Assigning identities to entities is often necessary for data modeling, but it is often not sufficient. To be useful, an identity must also be unique.