Describing Data
Data definition
Data can be defined as facts, figures or statistics used for reference or analysis. Data can be numbers, characters, symbols, images, etc., and must be interpreted, by a human or a machine, to derive meaning. Data is the raw material for data processing; it refers to unprocessed information.
Information is data that has been processed in such a way as to be meaningful to the person who receives it; it is what is communicated to that person. Information refers to interpreted data.
Data and Information
Most people use the terms data and information interchangeably, but data and information are not the same:
- Data are facts and figures, while information is interpreted facts.
- Data come from records and observations, while information comes from research and calculation.
- Data lack meaning on their own, while information is meaningful.
- Raw data have little value by themselves, while information is useful and valuable.
The following examples illustrate the difference between data and information:
- The government collects data about the population to obtain information about it and to determine various economic and national policies.
- The government collects census data, which can be used to determine the literacy rate in the country. The government can use this information to make important decisions aimed at improving the literacy rate.
- Input data keyed into the computer is the raw material on which the computer performs a given task by processing it. Information is the outcome of the processed data.
- When we fill out a registration form for admission, the entries are raw, unprocessed facts (data); they are then used to maintain a record and create information.
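The census example can be sketched in Python: a list of raw records is the data, and the computed literacy rate is the information derived from it. The records, the tuple layout and the function name `literacy_rate` are hypothetical, and the minimum age of 7 is an assumption made only for this sketch.

```python
# Hypothetical raw census records (data): (person_id, age, can_read_and_write).
# The values are invented purely for illustration.
census_records = [
    (1, 34, True),
    (2, 12, True),
    (3, 67, False),
    (4, 25, True),
    (5, 41, False),
]


def literacy_rate(records, min_age=7):
    """Process raw records into information: the percentage of literate people.

    Only people aged `min_age` or older are counted; the cut-off of 7 is an
    assumption for this sketch, not an official definition.
    """
    eligible = [r for r in records if r[1] >= min_age]
    if not eligible:
        return 0.0
    literate = sum(1 for r in eligible if r[2])
    return 100.0 * literate / len(eligible)


rate = literacy_rate(census_records)
print(f"Literacy rate: {rate:.1f}%")  # Literacy rate: 60.0%
```

The five records on their own say little; the single figure computed from them is what a decision maker would actually use.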
Types of data
Data come in various types; the major ones are:
(1) Character type
(2) Numeric type
(3) Alphanumeric type
Character data consists only of letters of the alphabet, i.e., A to Z. Example: ‘A’, ‘B’, ‘x’. (A digit written in quotes, such as ‘1’, is also treated as a character rather than as a number.)
Numeric data consists only of numbers, i.e., combinations of the digits 0 to 9.
Example: 254, 67, 2, 7, etc.
Alphanumeric data is constructed from combinations of character and numeric data.
Example: A79, DD56, 4G6, etc.
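A minimal Python sketch of how a program might tell the three types apart, using the built-in string methods `str.isdigit`, `str.isalpha` and `str.isalnum`. The function name `classify` is hypothetical, and the labels simply follow the definitions above.

```python
def classify(value: str) -> str:
    """Classify a string as character, numeric or alphanumeric data.

    Note: isdigit() and isalpha() also accept non-ASCII digits and letters.
    """
    if value.isdigit():
        return "numeric"        # digits only, e.g. "254"
    if value.isalpha():
        return "character"      # letters only, e.g. "ABC"
    if value.isalnum():
        return "alphanumeric"   # a mix of letters and digits, e.g. "A79"
    return "other"              # spaces, punctuation or an empty string


for sample in ["254", "67", "ABC", "A79", "DD56", "4G6"]:
    print(sample, "->", classify(sample))
```

Restricting the check strictly to A to Z and 0 to 9 would need an extra test such as `str.isascii()`.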
Data Dictionary
A data dictionary defines each term encountered during the analysis and design of a new system. It is the place where we keep the details of the contents of data flows, data stores and processes. A data dictionary is an analysis tool that primarily records the information content of data. Without a data dictionary, the development of large systems becomes difficult, because analysts, designers and users have no single agreed definition for each data item; the data dictionary is an effective way to manage this complexity. Its main purpose is to provide a source of reference in which the analyst, the user and the designer can look up any term and find out its content and any other relevant information.
Example of a data dictionary entry (here "=" means "is composed of" and "+" means "and"):
Student Record = Enrolment Number + Name + Address + Sex + Date of Birth + Subject
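One way to picture such an entry in code is a lookup table keyed by term. This Python sketch is only an illustration: the name `DATA_DICTIONARY`, the element types and the descriptions are invented, and `describe` and `undefined_terms` are hypothetical helper functions.

```python
# Hypothetical data dictionary: each key is a term, each value describes it.
# "structure" entries list their components; "element" entries are atomic.
DATA_DICTIONARY = {
    "Student Record": {
        "kind": "structure",
        "composed_of": ["Enrolment Number", "Name", "Address",
                        "Sex", "Date of Birth", "Subject"],
    },
    "Enrolment Number": {"kind": "element", "type": "alphanumeric",
                         "description": "Unique number given at admission"},
    "Name": {"kind": "element", "type": "character",
             "description": "Full name of the student"},
    "Address": {"kind": "element", "type": "alphanumeric",
                "description": "Postal address of the student"},
    "Sex": {"kind": "element", "type": "character",
            "description": "Sex of the student"},
    "Date of Birth": {"kind": "element", "type": "date",
                      "description": "Student's date of birth"},
    "Subject": {"kind": "element", "type": "character",
                "description": "Subject the student is enrolled in"},
}


def describe(term: str) -> str:
    """Return a term's definition, using '=' and '+' for structures."""
    entry = DATA_DICTIONARY[term]
    if entry["kind"] == "structure":
        return f"{term} = " + " + ".join(entry["composed_of"])
    return f"{term}: {entry['description']} ({entry['type']})"


def undefined_terms(dictionary: dict) -> list:
    """List components that are used in a structure but never defined."""
    return [part
            for entry in dictionary.values() if entry["kind"] == "structure"
            for part in entry["composed_of"] if part not in dictionary]


print(describe("Student Record"))
print(describe("Name"))
print(undefined_terms(DATA_DICTIONARY))  # []
```

A check like `undefined_terms` reflects the reference role of a data dictionary: every term used in a structure should itself be defined somewhere.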
Levels of Data Dictionary:
We can define the data dictionary at three different levels:
- Data Elements
- Data Structure
- Data Flows and Data Stores
(a) Data Elements
Data elements are pieces of data that need not be broken down further. A data element is an atomic unit of data that has a precise meaning. Data elements can describe files, data flows or processes. Often a data element is self-defining, such as student name or enrolment number.
(b) Data Structure
A data structure is made up of data elements; it is defined as a collection of data elements (and, possibly, smaller data structures). For example, consider the following “Student Information” record:
Student Information
    Enrolment Number
    Student Name
        First Name
        Second Name
        Last Name
    Student Address
        Address 1
        Address 2
        Address 3
        Pin Code
    Subject Details
        Subject 1
        Subject 2
Here “Student Information” is a data structure made up of the data element Enrolment Number and the data structures Student Name, Student Address and Subject Details. “Subject Details” is itself a data structure made up of two data elements, Subject 1 and Subject 2.
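The same hierarchy can be sketched with nested Python dataclasses, where each class is a data structure and each string field is a data element. The class and field names mirror the record above; the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StudentName:          # data structure made of three data elements
    first_name: str
    second_name: str
    last_name: str


@dataclass
class StudentAddress:       # data structure made of four data elements
    address_1: str
    address_2: str
    address_3: str
    pin_code: str


@dataclass
class SubjectDetails:       # data structure made of two data elements
    subject_1: str
    subject_2: str


@dataclass
class StudentInformation:   # top-level data structure
    enrolment_number: str   # a data element
    student_name: StudentName
    student_address: StudentAddress
    subject_details: SubjectDetails


student = StudentInformation(
    enrolment_number="A1023",
    student_name=StudentName("Ravi", "Kumar", "Sharma"),
    student_address=StudentAddress("12 Park Street", "Sector 5",
                                   "New Delhi", "110001"),
    subject_details=SubjectDetails("Mathematics", "Physics"),
)
print(student.subject_details.subject_2)  # Physics
```

Storing the pin code as a string rather than an integer is a design choice that keeps any leading zeros intact.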
(c) Data Flows and Data Stores
Data flows are paths along which data travels, and data stores are places where data is kept until needed. So we can say that data flows are data structures in motion, and data stores are data structures at rest.
A data dictionary, as we have seen, contains data elements, data structures, data flows and data stores, along with the processes, external entities and a glossary of user terms. Processes are defined with a number of tools, such as flowcharts, data flow diagrams, decision trees and decision tables. In many applications the users have their own vocabulary, which can confuse analysts and programmers. In that case the data dictionary is a convenient place to keep these glossary items for reference.
Database: A database is an organized collection of data, structured so that it can easily be accessed, managed and updated. A database contains objects such as tables (made up of rows and columns, also called fields) and views.
Common terms used in a relational database are:
Relation/Entity: a table.
Attribute: a column (field) in a table.
Domain: the set of permissible values for an attribute.
Tuple: a record, or row, in a table.
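To make these terms concrete, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The table `student`, its columns and its rows are hypothetical; a `CHECK` constraint is used here as one way to restrict an attribute's domain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# The relation (table) "student"; each column is an attribute.
# The CHECK constraint limits the domain of "marks" to 0..100.
conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        marks   INTEGER CHECK (marks BETWEEN 0 AND 100)
    )
""")

# Each inserted row is a tuple.
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Asha", 88), (2, "Vikram", 74)])

# PRAGMA table_info returns one row per column; index 1 is the column name.
attributes = [row[1] for row in conn.execute("PRAGMA table_info(student)")]
tuples = conn.execute("SELECT * FROM student ORDER BY roll_no").fetchall()
print(attributes)  # ['roll_no', 'name', 'marks']
print(tuples)      # [(1, 'Asha', 88), (2, 'Vikram', 74)]

# A value outside the domain is rejected by the database.
try:
    conn.execute("INSERT INTO student VALUES (3, 'Meera', 150)")
except sqlite3.IntegrityError as exc:
    print("Rejected:", exc)
```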
Primary Key: The primary key (a column, or a combination of columns) of a table uniquely identifies each record in the table; e.g., the roll number of a student in a class identifies each student. Primary keys must contain unique values, and a primary key column cannot contain NULL values. Each table should have a primary key, and each table can have only one primary key.
Foreign Key: A foreign key (FK) is a column or combination of columns that is used to establish and enforce a link between the data in two tables. The foreign key values in a row match the primary key (or another unique key) of a row in the same or another table, so each foreign key value identifies the related row.
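The following sketch, again using Python's `sqlite3` module, shows a database rejecting a duplicate primary key and a foreign key that points to a non-existent row. The tables borrow employee and department values from this section's normalization example, but the schema and the helper `try_insert` are assumptions for illustration. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("""
    CREATE TABLE department (
        dept_no   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE employee (
        emp_no  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_no INTEGER REFERENCES department (dept_no)  -- foreign key
    )
""")
conn.execute("INSERT INTO department VALUES (201, 'R&D')")
conn.execute("INSERT INTO employee VALUES (1, 'Aryaman V.', 201)")


def try_insert(sql):
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected ({exc})"


# emp_no 1 already exists: violates primary key uniqueness.
print(try_insert("INSERT INTO employee VALUES (1, 'P. Rohit', 201)"))
# Department 999 does not exist: violates the foreign key.
print(try_insert("INSERT INTO employee VALUES (2, 'Bhavana M.', 999)"))
# A valid row is accepted.
print(try_insert("INSERT INTO employee VALUES (3, 'P. Rohit', 201)"))
```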
Normalization
Data normalization is a formal technique for converting preliminary data structures into efficient, easy-to-maintain data structures. Normalization is a process whereby the tables in a database are optimized to remove the potential for redundancy and for update, insertion and deletion anomalies. It is the process of organizing the fields and tables of a database to minimize redundancy and undesirable dependencies. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them.
Example of normalization:
Employee table (unnormalized form)
emp_no | name | dept_no | dept_name | skills |
1 | Aryaman V. | 201 | R&D | C, Perl, Java |
2 | Bhavana M. | 224 | IT | Linux, Mac |
3 | P. Rohit | 201 | R&D | DB2, Oracle, Java |
(a) First Normal Form
In relational terms, a table is in first normal form (1NF) if it contains no repeating columns or repeating groups. We should ensure that a table has no duplication of data in a given row, and that every column stores a single (atomic) value of an entity. In the unnormalized table, for instance, the skills column holds several values in one cell, so each skill must be given its own row. Thus, to bring a table into 1NF:
- Eliminate duplicative columns from the same table.
- Create separate tables for each group of related data and identify each row with a unique column (the primary key).
- Identify each set of related data with a primary key.
First Normal Form (1NF)
emp_no | name | dept_no | dept_name | skills |
1 | Aryaman V. | 201 | R&D | C |
1 | Aryaman V. | 201 | R&D | Perl |
1 | Aryaman V. | 201 | R&D | Java |
2 | Bhavana M. | 224 | IT | Linux |
2 | Bhavana M. | 224 | IT | Mac |
3 | P. Rohit | 201 | R&D | DB2 |
3 | P. Rohit | 201 | R&D | Oracle |
3 | P. Rohit | 201 | R&D | Java |
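The step from the unnormalized table to 1NF can be sketched in Python, assuming the skills are stored as a single comma-separated string (an assumption about how the repeating group is represented). The function name `to_first_normal_form` is hypothetical.

```python
# Unnormalized employee rows from the example: "skills" holds a repeating group.
unnormalized = [
    (1, "Aryaman V.", 201, "R&D", "C, Perl, Java"),
    (2, "Bhavana M.", 224, "IT", "Linux, Mac"),
    (3, "P. Rohit", 201, "R&D", "DB2, Oracle, Java"),
]


def to_first_normal_form(rows):
    """Split the comma-separated skills so every column holds one atomic value."""
    result = []
    for emp_no, name, dept_no, dept_name, skills in rows:
        for skill in skills.split(","):
            # Repeat the other columns once per skill, trimming spaces.
            result.append((emp_no, name, dept_no, dept_name, skill.strip()))
    return result


for row in to_first_normal_form(unnormalized):
    print(row)
```

The output has the eight rows of the 1NF table; in that table the combination (emp_no, skills) identifies each row.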
(b) Second Normal Form
An entity is in second normal form (2NF) if all of its attributes depend on the whole (primary) key. In relational terms, every non-key column in a table must be functionally dependent on the whole primary key of that table, not on just part of it. In the 1NF employee table, the key is the combination (emp_no, skills), but name, dept_no and dept_name depend on emp_no alone, so they belong in a separate table.
A table will be in 2NF when we have designed it by:
- Meeting all the requirements of the first normal form.
- Removing subsets of data that apply to multiple rows of a table and placing them in separate tables.
- Creating relationships between the new tables through the use of foreign keys.
Second Normal Form (2NF)
emp_no | name | dept_no | dept_name |
1 | Aryaman V. | 201 | R&D |
2 | Bhavana M. | 224 | IT |
3 | P. Rohit | 201 | R&D |
emp_no | skills |
1 | C |
1 | Perl |
1 | Java |
2 | Linux |
2 | Mac |
3 | DB2 |
3 | Oracle |
3 | Java |
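A Python sketch of the 2NF decomposition, starting from the 1NF rows: columns that depend only on emp_no go into one table, and the (emp_no, skills) pairs go into another. The function name `to_second_normal_form` is hypothetical.

```python
# The 1NF rows from the example; the key is (emp_no, skill).
first_nf = [
    (1, "Aryaman V.", 201, "R&D", "C"),
    (1, "Aryaman V.", 201, "R&D", "Perl"),
    (1, "Aryaman V.", 201, "R&D", "Java"),
    (2, "Bhavana M.", 224, "IT", "Linux"),
    (2, "Bhavana M.", 224, "IT", "Mac"),
    (3, "P. Rohit", 201, "R&D", "DB2"),
    (3, "P. Rohit", 201, "R&D", "Oracle"),
    (3, "P. Rohit", 201, "R&D", "Java"),
]


def to_second_normal_form(rows):
    """Move columns that depend only on emp_no into their own table."""
    employees = {}  # emp_no -> (name, dept_no, dept_name); repeats collapse
    skills = []     # (emp_no, skill) pairs; emp_no links back to employees
    for emp_no, name, dept_no, dept_name, skill in rows:
        employees[emp_no] = (name, dept_no, dept_name)
        skills.append((emp_no, skill))
    employee_table = [(emp_no, *rest) for emp_no, rest in sorted(employees.items())]
    return employee_table, skills


employee_table, skill_table = to_second_normal_form(first_nf)
print(employee_table)
print(skill_table)
```

Each employee's name and department now appear once instead of once per skill, which is the redundancy 2NF removes.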
(c) Third Normal Form
To achieve third normal form (3NF), no non-key attribute may depend on another non-key attribute. This means that every informational attribute must depend directly on the primary key and not on another column. In the 2NF employee table, dept_name depends on dept_no rather than on emp_no, so department details are moved into their own table. For 3NF:
- Ensure all the requirements of the second normal form are met, and
- Remove columns that are not directly dependent upon the primary key.
Third Normal Form (3NF)
emp_no | name | dept_no |
1 | Aryaman V. | 201 |
2 | Bhavana M. | 224 |
3 | P. Rohit | 201 |
dept_no | dept_name |
201 | R&D |
224 | IT |
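A matching Python sketch for 3NF moves dept_name, which depends on dept_no rather than on the key emp_no, into a separate department table. This sketch keeps dept_no in the employee table as a foreign key so that each employee can still be linked to a department; the function name `to_third_normal_form` is hypothetical.

```python
# The 2NF employee table from the example: dept_name depends on dept_no,
# which is not the key, so it is a transitive dependency.
second_nf_employees = [
    (1, "Aryaman V.", 201, "R&D"),
    (2, "Bhavana M.", 224, "IT"),
    (3, "P. Rohit", 201, "R&D"),
]


def to_third_normal_form(rows):
    """Split department details into their own table keyed by dept_no."""
    employees = []
    departments = {}
    for emp_no, name, dept_no, dept_name in rows:
        employees.append((emp_no, name, dept_no))  # dept_no kept as a foreign key
        departments[dept_no] = dept_name           # stored once per department
    return employees, sorted(departments.items())


employees, departments = to_third_normal_form(second_nf_employees)
print(employees)    # [(1, 'Aryaman V.', 201), (2, 'Bhavana M.', 224), (3, 'P. Rohit', 201)]
print(departments)  # [(201, 'R&D'), (224, 'IT')]
```

Renaming a department now means changing one row instead of every employee row that mentions it.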
Commonly Used Databases: Some of the commonly used database management systems are Oracle, MS SQL Server, Informix, Sybase, DB2 and MySQL. Depending on the programming language used and the skills of the software developer, any of these databases may be used as a backend, and SQL queries may be used to retrieve information and to make changes to the stored data. Writing SQL queries is the most important tool in the hands of a database administrator, who uses them to retrieve and manipulate data as and when required. SQL statements are grouped into DDL (Data Definition Language), DML (Data Manipulation Language) and DCL (Data Control Language).
Data Definition Language (DDL) statements are used to define the database structure or schema. Some examples:
Create - creates objects (tables, views, etc.) in the database
Alter - alters the structure of existing database objects
Drop - deletes objects from the database
Truncate - removes all records from a table permanently
Rename - renames an object
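The DDL commands can be tried out with Python's built-in `sqlite3` module. This is a sketch with hypothetical table names, and SQLite's dialect differs from other systems: it renames a table with `ALTER TABLE ... RENAME TO`, and it has no `TRUNCATE` statement, so an unqualified `DELETE` (strictly a DML statement) stands in for it here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE: define a new table.
conn.execute("CREATE TABLE staff (emp_no INTEGER PRIMARY KEY, name TEXT)")

# ALTER: change the structure of an existing table by adding a column.
conn.execute("ALTER TABLE staff ADD COLUMN dept_no INTEGER")

# RENAME: SQLite spells this as ALTER TABLE ... RENAME TO.
conn.execute("ALTER TABLE staff RENAME TO employee")

# TRUNCATE substitute: DELETE without a WHERE clause removes every row.
conn.execute("INSERT INTO employee VALUES (1, 'Aryaman V.', 201)")
conn.execute("DELETE FROM employee")

columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]
print(columns)  # ['emp_no', 'name', 'dept_no']

# DROP: remove the table entirely.
conn.execute("DROP TABLE employee")
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)   # []
```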
Data Manipulation Language (DML) statements are used for managing data within objects. Some examples:
Select - retrieves data from a database
Insert - inserts data into a table
Update - updates existing data within a table
Delete - deletes records from a table (all records if no WHERE clause is given); the space allocated to the records remains
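The DML commands can be sketched the same way with `sqlite3`. The department rows, including the hypothetical "HR" department, are illustrative only; the `?` placeholders pass values separately from the SQL text, which is the usual way to avoid SQL injection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_name TEXT)")

# INSERT: add rows.
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(201, "R&D"), (224, "IT"), (230, "HR")])

# UPDATE: change existing data in place.
conn.execute("UPDATE department SET dept_name = ? WHERE dept_no = ?",
             ("Research", 201))

# DELETE with a WHERE clause removes only the matching rows.
conn.execute("DELETE FROM department WHERE dept_no = ?", (230,))

# SELECT: retrieve data.
rows = conn.execute(
    "SELECT dept_no, dept_name FROM department ORDER BY dept_no").fetchall()
print(rows)  # [(201, 'Research'), (224, 'IT')]
```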
Data Control Language (DCL) statements control access to the data in a database. Some examples:
- GRANT - gives access privileges to a user
- REVOKE - withdraws access privileges from a user