Data conversion is the conversion of computer data from one format to another. Throughout a computer environment, data is encoded in a variety of ways. For example, computer hardware is built on the basis of certain standards, which requires that data contains, for example, parity bit checks.

Data validation means checking the accuracy and quality of source data before using, importing or otherwise processing data. Different types of validation can be performed depending on destination constraints or objectives. Data validation is a form of data cleansing. 

Why perform data validation?

When moving and merging data it’s important to make sure data from different sources and repositories will conform to business rules and not become corrupted due to inconsistencies in type or context. The goal is to create data that is consistent, accurate and complete so to prevent data loss and errors during a move. 

When is data validation performed?

In data warehousing, data validation is often performed prior to the ETL (Extraction Translation Load) process. A data validation test is performed so that analyst can get insight into the scope or nature of data conflicts. Data validation is a general term and can be performed on any type of data, however, including data within a single application (such as Microsoft Excel) or when merging simple data within a single data store.

01

Data Type Check

A data type check confirms that the data entered has the correct data type. For example, a field might only accept numeric data. If this is the case, then any data containing other characters such as letters or special symbols should be rejected by the system.
02

Code Check

A code check ensures that a field is selected from a valid list of values or follows certain formatting rules. For example, it is easier to verify that a postal code is valid by checking it against a list of valid codes. The same concept can be applied to other items such as country codes
03

Range Check

A range check will verify whether input data falls within a predefined range. For example, latitude and longitude are commonly used in geographic data. A latitude value should be between -90 and 90, while a longitude value must be between -180 and 180. Any values out of this range are invalid.
04

Format Check

Many data types follow a certain predefined format. A common use case is date columns that are stored in a fixed format like “YYYY-MM-DD” or “DD-MM-YYYY.” A data validation procedure that ensures dates are in the proper format helps maintain consistency across data and through time.
05

Consistency Check

A consistency check is a type of logical check that confirms the data’s been entered in a logically consistent way. An example is checking if the delivery date is after the shipping date for a parcel.
06

Uniqueness Check

Some data like IDs or e-mail addresses are unique by nature. A database should likely have unique entries on these fields. A uniqueness check ensures that an item is not entered multiple times into a database.

Practical Example

Consider the example of a retailer that collects data on its stores but fails to create a proper check on the postal code

The oversight could make it difficult to leverage the data for information and business intelligence. Several problems can occur if the postal code is not entered or entered improperly.

It can be difficult to define the location of the store in some mapping software. A store postal code will also help generate insights about the neighborhood where the store is located. Without a data check on the postal code, it is more likely to lose the value of data. It will result in further costs if the data needs to be recollected or the postal code needs to be manually entered.

A simple solution to the problem would be to put a check in place that ensures a valid postal code is entered. The solution could be a dropdown menu or an auto-complete form that allows the user to choose the postal code from a list of valid codes. Such a type of data validation is called a code validation or code check.

.

Data Validation in Excel

The following example is an introduction to data validation in Excel. The data validation button under the data tab provides the user with different types of data validation checks based on the data type in the cell. It also allows the user to define custom validation checks using Excel formulas. The data validation can be found in the Data Tools section of the Data tab in the ribbon of Excel:

Additional Resources

Thank you for reading CFI’s guide to Data Validation. In order to help you become a world-class analyst and advance your career to your fullest potential, these additional resources will be very helpful: