Data is widely available nowadays. Whether we speak of scientific data or commercial data, both are now available in large quantities from multiple sources. To make this data usable, one needs to integrate it. Data integration is the process by which data is combined: each source provides part of the whole, and integration puts the parts together.
This process is not as easy as it may sound. To put everything together, two main conditions have to be fulfilled. Firstly, there must be no redundant data. This means that no two entries in the combined database may duplicate each other, and the same holds even if the information is not stored in a database. The second condition is that specific entries are easily retrievable. If I am looking for an employee with a certain name, it shouldn't take more than a few milliseconds to find him or her. In practice, this means the data needs to be very well organized.
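To make the two conditions concrete, here is a minimal sketch in Python. It assumes each source is a list of records with a "name" field and that two records describe the same employee when their names match after ignoring case and extra spaces. The function names, the field names and the sample records are all made up for illustration; real integration usually needs more careful matching than this.

```python
def normalize(name):
    """Ignore case and extra whitespace so near-identical names compare equal."""
    return " ".join(name.lower().split())


def integrate(*sources):
    """Combine records from several sources, keeping one entry per employee."""
    merged = {}
    for source in sources:
        for record in source:
            key = normalize(record["name"])
            # Condition 1: one entry per key, so no redundant entries.
            entry = merged.setdefault(key, {})
            for field, value in record.items():
                # The first value seen for a field wins; later sources fill gaps.
                entry.setdefault(field, value)
    return merged


# Hypothetical sources, each holding only part of the whole.
hr_source = [{"name": "Jane Doe", "department": "Research"}]
payroll_source = [
    {"name": "jane  doe", "salary": 5200},
    {"name": "John Smith", "salary": 5100},
]

employees = integrate(hr_source, payroll_source)

# Condition 2: lookup by name goes straight to the entry, without a scan.
print(employees[normalize("Jane Doe")])
```

Because the combined data is kept in a Python dict keyed by the normalized name, finding one employee does not require walking through all the others.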
To provide the qualities that data integration requires, a very often used tool is the data dictionary. The data dictionary is essentially a way of organizing all the entries. To understand it, just think of the way you look up a word in a dictionary. There are thousands of words in there, yet you are able to find the one you want without reading all of them. How are you able to do this? You simply look at the first letter of the word and open the dictionary where that letter lies. Then you move on to the second letter, and so on. The data dictionary works the same way.
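The letter-by-letter lookup described above can be sketched with a small prefix tree (often called a trie). This is only an illustration of the idea under my own assumptions: the class name LetterIndex, its methods and the sample entries are hypothetical, and an actual data dictionary may be organized quite differently.

```python
class LetterIndex:
    """A node in a prefix tree: one branch per next letter."""

    def __init__(self):
        self.children = {}  # next letter -> child node
        self.entry = None   # value stored for the word ending at this node

    def insert(self, word, entry):
        node = self
        for letter in word:
            # Follow the branch for this letter, creating it if needed.
            node = node.children.setdefault(letter, LetterIndex())
        node.entry = entry

    def lookup(self, word):
        node = self
        for letter in word:
            # First letter, then the second, and so on.
            node = node.children.get(letter)
            if node is None:
                return None  # no stored word continues with this letter
        return node.entry


index = LetterIndex()
index.insert("data", "facts collected from one or more sources")
index.insert("database", "an organized store of data")
index.insert("dictionary", "a structure for looking entries up quickly")

print(index.lookup("database"))
print(index.lookup("datum"))
```

The number of steps in a lookup depends on the length of the word, not on how many words are stored, which is why you can find one word among thousands without reading them all.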