
What are data?

9 September, 2015 - 09:57

A datum is either a mathematical quantity, a set of symbols, or some combination of the two that is used to represent a fact. Datum is the singular of data. Facts that are represented by data may be natural objects or phenomena, human-derived concepts, or some combination of the two. Several fundamental issues are involved in using a datum in data management:

  • Relationships: What relationships exist between a datum and other data?
  • Form: In what form must a datum exist?
  • Meaning: What does a datum actually mean?

First, to be useful, a datum is usually managed in relation to some other data. For example, it is usually not useful to record and manage someone’s name only. We are more likely interested in the relationships between that person – as represented by their name – and other data, such as their address, their telephone number, or anything else that is necessary for a particular set of data management tasks.

Second, the form in which a datum is presented carries a lot of meaning. In fact, most types of data must be presented in certain formats in order to be understood. Standard conventions are usually developed for the visual layout of a given data type. Consider the conventions that apply to the presentation of telephone numbers, addresses, dates, and times. One difficulty in managing data is that many conventions vary by region and country. For example, 2.5 hours past noon is represented variously as “2:30 pm”, “14:30”, or “14 h 30”.
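To illustrate how one stored value can take several presentation forms, here is a minimal Python sketch. The function name format_time and the convention labels ("12-hour", "24-hour", "24-hour-h") are hypothetical names chosen for this illustration; they are not standard locale identifiers.

```python
from datetime import time

def format_time(t: time, convention: str) -> str:
    """Render one time-of-day value in a named presentation convention.

    The convention labels are assumptions made for this sketch.
    """
    if convention == "12-hour":
        hour12 = t.hour % 12 or 12          # 0 and 12 are both shown as 12
        suffix = "am" if t.hour < 12 else "pm"
        return f"{hour12}:{t.minute:02d} {suffix}"
    if convention == "24-hour":
        return f"{t.hour:02d}:{t.minute:02d}"
    if convention == "24-hour-h":
        return f"{t.hour} h {t.minute:02d}"
    raise ValueError(f"unknown convention: {convention}")

# The same datum, 2.5 hours past noon, in three presentation formats.
half_past_two = time(14, 30)
for name in ("12-hour", "24-hour", "24-hour-h"):
    print(format_time(half_past_two, name))
# prints:
# 2:30 pm
# 14:30
# 14 h 30
```

The value held in half_past_two never changes; only its visual layout does.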

Third, what a datum might mean is the topic of the next section.

EX. WM-1 :

Current measurements determine that the quantity -10 in the Celsius scale represents the temperature outside the author’s window as he writes this. To make sure that this quantity is understood to be a temperature value and not something else, we may wish to add a symbol indicating the unit of measure to be degrees. The ‘°’ symbol has become customary for this. Thus, we might have -10°. Further, in a world with different temperature scales, it is important to specify on which scale our quantity is to be interpreted. We would customarily add “C,” to indicate the Celsius scale. Thus, we have “-10° C.”

The datum -10° C does not make sense to us unless we are given a context such as the time and location of the reading. Thus, we will likely want to keep temperature in relation to a number of other data quantities or symbols. One representation that we might record is the following grouping of data <2006-12-04 07:31, Fredericton, -10° C>, where date and time, city name, and the temperature are all represented together. In data management, this type of grouping is called a tuple.

The characters “° C” when presented together with a numeric value help us understand that the datum -10 is a temperature reading. Likewise, the other characters present in the tuple help us recognize the domain to which each of the other data belongs: date/time, city, and geographic coordinates.
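The temperature tuple in this example could be sketched in Python roughly as follows. The class name TemperatureReading, its field names, and the present function are hypothetical choices for this illustration, and only the three data shown in the tuple (date and time, city name, temperature) are modelled. Each field's type plays the role of its domain.

```python
from datetime import datetime
from typing import NamedTuple

class TemperatureReading(NamedTuple):
    """One tuple of related data; field names are assumptions of this sketch."""
    observed_at: datetime   # domain: date/time
    city: str               # domain: city names
    temperature_c: float    # domain: temperatures on the Celsius scale

def present(reading: TemperatureReading) -> str:
    """Build a presentation form, adding the unit symbols for the reader."""
    stamp = reading.observed_at.strftime("%Y-%m-%d %H:%M")
    return f"<{stamp}, {reading.city}, {reading.temperature_c:g}° C>"

reading = TemperatureReading(datetime(2006, 12, 4, 7, 31), "Fredericton", -10.0)
print(present(reading))
# prints: <2006-12-04 07:31, Fredericton, -10° C>
```

Note that the stored temperature is the bare quantity -10.0; the “° C” symbols are added only when the tuple is presented.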

Question for the reader:

Why do we need to represent the city in the form of its name when we have its precise geographic coordinates?

EX. MVR-1 :

Governments have a need to identify motor vehicles on their roads. This is normally done by assigning to each vehicle a unique identifier, a unique set of symbols. A motor vehicle identifier is affixed to vehicles on license plates – its presentation format – and stored within some data management systems – its storage format – along with other information, such as the owner, make, and model of the vehicle.

A common format is %c%c%c %d%d%d, where %c represents a single alphabetical character and %d represents a single digit. So an example license plate code would be:

ABC 123

A clear pattern of characters, with the letters separated from the digits by a space, is present in the presentation format of our license plate schema.
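A pattern like this can be checked mechanically. The following Python sketch tests whether a string matches the presentation format using a regular expression. The names PLATE_PATTERN and is_valid_plate are hypothetical, and restricting %c to uppercase letters A to Z is an assumption of this sketch rather than something the format itself states.

```python
import re

# Three letters, one space, three digits.
# Assumption: %c means an uppercase letter A-Z.
PLATE_PATTERN = re.compile(r"[A-Z]{3} [0-9]{3}")

def is_valid_plate(text: str) -> bool:
    """Return True if text matches the presentation format exactly."""
    return PLATE_PATTERN.fullmatch(text) is not None

print(is_valid_plate("ABC 123"))  # True
print(is_valid_plate("ABC123"))   # False: the space is missing
print(is_valid_plate("AB1 234"))  # False: a digit where a letter belongs
```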

Questions for the reader:

  1. Should the presentation format of a datum be the same as its storage format in an information system?
  2. What value does the space have in the presentation format?