Data and business have become inseparable. Without data-driven decision-making, an organization cannot fully capitalize on its data, and sensitive business and consumer data can easily be exposed to security threats if proper data management strategies are not in place. The primary way to address these challenges is a robust data governance framework. This framework provides a set of policies, procedures, and tools to manage and protect an organization’s data assets. It can help ensure that data is accurate, complete, and consistent, while also protecting it from unauthorized access or breaches. Among the components that make up a data governance program, two play a particularly important role: the data dictionary and the data catalog.
Approximately 32% of UK businesses reported cyber security breaches or attacks in 2019, with an average cost of £4,180 in stolen or lost data and assets. The percentages are much higher for bigger targets like medium businesses (60%), large businesses (61%) and high-income charities (52%).
In this blog post, we will explore how a data governance framework, with data dictionary and data catalog as its two important building blocks, can help organizations improve data security and data quality. We will also discuss the benefits of implementing these building blocks and the steps organizations can take to ensure they are effectively implementing data catalog and data dictionary in their data stack.
Data Dictionary: Definition, Importance & Benefits
A Data Dictionary, also known as a Data Definition Matrix, defines corporate data items, their meanings, and permissible values. A data dictionary describes each attribute of a business concept in detail, whereas a conceptual or logical Entity Relationship Diagram concentrates on high-level business concepts.
It helps you communicate business stakeholder requirements to your technical team so that they can more readily design a relational database or data structure to meet those requirements. It also helps avoid implementation errors, such as requiring information that business stakeholders cannot supply or assuming that data will be accurate by the time it reaches the analytics stage.
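To make the idea of data items, meanings, and permissible values concrete, here is a minimal Python sketch of a data dictionary used to check incoming records. The field names (customer_id, account_status, referral_code), the DictionaryEntry class, and the validate function are hypothetical and chosen only for illustration; a real data dictionary would normally live in a governance tool or database rather than in code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DictionaryEntry:
    """One data item: its business meaning, type, and permissible values."""
    name: str
    meaning: str
    data_type: type
    allowed_values: Optional[frozenset] = None  # None means any value of data_type
    required: bool = True


# Hypothetical entries for a customer record (illustrative only).
DATA_DICTIONARY = {
    entry.name: entry
    for entry in (
        DictionaryEntry("customer_id", "Unique identifier assigned at onboarding", int),
        DictionaryEntry(
            "account_status",
            "Current lifecycle state of the account",
            str,
            allowed_values=frozenset({"active", "dormant", "closed"}),
        ),
        DictionaryEntry("referral_code", "Optional code supplied at sign-up", str, required=False),
    )
}


def validate(record: dict) -> list:
    """Return a list of violations; an empty list means the record conforms."""
    problems = []
    for name, entry in DATA_DICTIONARY.items():
        if name not in record or record[name] is None:
            if entry.required:
                problems.append(f"{name}: missing required value")
            continue
        value = record[name]
        if not isinstance(value, entry.data_type):
            problems.append(
                f"{name}: expected {entry.data_type.__name__}, got {type(value).__name__}"
            )
        elif entry.allowed_values is not None and value not in entry.allowed_values:
            problems.append(f"{name}: {value!r} is not a permissible value")
    # Fields that the dictionary does not define are flagged as well.
    for name in record:
        if name not in DATA_DICTIONARY:
            problems.append(f"{name}: not defined in the data dictionary")
    return problems


good = {"customer_id": 1001, "account_status": "active"}
bad = {"customer_id": "1001", "account_status": "frozen", "segment": "retail"}

print(validate(good))
for problem in validate(bad):
    print(problem)
```

The point of the sketch is that once definitions and permissible values are written down in one place, both the business and the technical team can check data against the same rules instead of relying on assumptions.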
Take the example of a financial company that had customer profile information and financial data stored in several places throughout the bank’s infrastructure. To ensure data consistency and accuracy, the organisation used a data dictionary that described the properties of its master data elements and how each element related to the others. Using the data dictionary, staff could find relevant information about the data, its history, and the appropriate rules for accessing it.
By adopting data governance policies and implementing the data dictionary, the bank was able to achieve:
- Easier data discovery and accessibility for its analysts and traders
- Better data-driven decisions and improved financial performance
- Improved data security and privacy
- Stronger compliance with regulatory requirements
- Greater control over reputational damage
According to a survey conducted by the bank, the use of a data dictionary increased data quality by 80% and reduced data discovery time by 60%. It also helped increase compliance with regulatory requirements by 90%.
Overall, the bank’s use of a data dictionary was a key factor in its ability to effectively leverage its data assets to support business objectives and maintain a high level of data governance.
The case study demonstrates why it is essential for businesses to have access to a thorough data dictionary as part of their data governance. It allows the company to keep all of the information it has about its data assets in one place, which improves its efficiency and effectiveness when using data to achieve its goals and remain in compliance with applicable regulations.
Data Catalog: Definition, Importance & Benefits
The value of a data catalog is often described as the ability for analysts to find the data they need quickly and efficiently. Data cataloging accelerates analysis by minimizing the time and effort that analysts spend finding and preparing data. Anecdotally, it is said that without a data catalog 80% of effort is spent getting data ready for analysis, and that using a data catalog can cut that share from 80% to less than 20%. Although there is a high degree of truth in this anecdotal view, it is not sufficient on its own to build a business case for technology investment.
In the context of data governance, a data catalog helps organizations manage, organize, and understand their data assets. It provides a centralized repository for storing and managing metadata, which includes information such as data source location, schema, data quality, and business definitions.
Data catalogs can be used to:
- Discover and understand the organization’s data assets
- Improve data accessibility and usage by different departments and teams
- Ensure data quality and governance by providing a single source of truth for data definitions and lineage
- Increase data-driven decision making by providing a central location for data discovery and self-service access
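The discovery use case can be sketched in a few lines of Python. The example below assumes a simple in-memory catalog whose entries carry the kinds of metadata described above (source location, schema, business definition) plus an owner and free-form tags. The CatalogEntry and DataCatalog classes, the dataset names, and the locations are all hypothetical; commercial and open-source catalogs expose far richer search than this keyword match.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata describing one data asset (all values below are illustrative)."""
    name: str
    source_location: str
    schema: dict  # column name -> type name
    business_definition: str
    owner: str
    tags: set = field(default_factory=set)


class DataCatalog:
    """A central, searchable registry of data assets."""

    def __init__(self):
        self._entries = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> list:
        """Return names of assets whose name, definition, tags, or columns mention the keyword."""
        needle = keyword.lower()
        hits = []
        for entry in self._entries.values():
            searchable = " ".join(
                [entry.name, entry.business_definition, *entry.tags, *entry.schema]
            ).lower()
            if needle in searchable:
                hits.append(entry.name)
        return sorted(hits)


catalog = DataCatalog()
catalog.register(CatalogEntry(
    "customers", "warehouse.crm.customers",
    {"customer_id": "int", "email": "str"},
    "One row per customer with contact details", "crm-team", {"pii"},
))
catalog.register(CatalogEntry(
    "orders", "warehouse.sales.orders",
    {"order_id": "int", "customer_id": "int", "total": "decimal"},
    "One row per confirmed order", "sales-team", {"finance"},
))
catalog.register(CatalogEntry(
    "web_sessions", "lake/raw/web_sessions/",
    {"session_id": "str", "started_at": "timestamp"},
    "Raw clickstream sessions", "web-team",
))

print(catalog.search("customer"))
print(catalog.search("pii"))
```

Searching for "customer" finds both the customers asset and the orders asset, because orders has a customer_id column. This is the kind of cross-team discovery that is hard to achieve when metadata is scattered across silos.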
A study conducted by Gartner found that organizations without a data catalog spend an average of 30% more effort on data-related projects than those with a catalog. This is because data silos can make it difficult for organizations to access and utilize their data effectively, leading to additional costs for data-related projects.
For example, a company without a data catalog may have multiple teams working on similar projects, but each team is using different data sets and has different definitions for key data elements. This can lead to delays and increased costs as teams spend time and resources trying to reconcile their data and ensure it is accurate and consistent.
In contrast, a company with a data catalog can more easily access and share data across different teams and departments, resulting in less duplication of efforts and a more efficient use of resources. This can lead to significant cost savings on data-related projects, as well as improved data quality and a more complete view of the company’s data.
A data catalog also provides an inventory of the organization’s data assets and their lineage. This makes the data easier to understand and reduces the effort needed to find and access the relevant data for a project, which can ultimately lower project cost.
INTERCONNECTION OF DATA CATALOG AND DATA DICTIONARY
Data catalogs are being used more and more, but they are often confused with data dictionaries. So how exactly do the two relate?
In the context of data governance, a data catalog and a data dictionary can be interconnected in several ways to aid in the management and understanding of data assets within an organization.
As an example of how the two are interconnected in practice, consider a retail organization that uses a data catalog to store metadata about its customer data. This metadata includes information such as the data’s source, format, and lineage. The data catalog also contains links to the data dictionary, which provides detailed information about the structure and meaning of the customer data, such as field names, data types, and constraints. The data governance team can use this information to ensure that the data is being used correctly and that data quality is maintained.
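The retail example can be sketched as two small Python structures: a catalog record holding asset-level metadata (source, format, lineage) and links to field-level definitions held in the dictionary. Everything here is an assumption made for illustration, including the asset name retail.customers, the field names, and the describe helper. The sketch also reports fields that the catalog lists but the dictionary does not yet define, which is one way a governance team might spot documentation gaps.

```python
# Hypothetical data dictionary: field-level structure and meaning.
dictionary = {
    "customer_id": {
        "type": "integer",
        "constraint": "unique, not null",
        "meaning": "Identifier assigned at registration",
    },
    "email": {
        "type": "string",
        "constraint": "valid email format",
        "meaning": "Primary contact address",
    },
}

# Hypothetical catalog record: asset-level metadata plus links into the dictionary.
catalog = {
    "retail.customers": {
        "source": "point-of-sale system",
        "format": "parquet",
        "lineage": ["pos.raw_customers", "staging.customers_clean"],
        "fields": ["customer_id", "email", "loyalty_tier"],
    }
}


def describe(asset_name: str) -> dict:
    """Combine catalog metadata with the dictionary definitions it links to."""
    asset = catalog[asset_name]
    defined = {f: dictionary[f] for f in asset["fields"] if f in dictionary}
    undefined = [f for f in asset["fields"] if f not in dictionary]
    return {
        "asset": asset_name,
        "source": asset["source"],
        "format": asset["format"],
        "lineage": asset["lineage"],
        "fields": defined,
        "undefined_fields": undefined,  # links that the dictionary cannot resolve yet
    }


report = describe("retail.customers")
print(report["fields"]["email"]["constraint"])
print(report["undefined_fields"])
```

In this sketch the catalog answers "what is this asset and where did it come from", while the dictionary answers "what does each field mean and what values are valid".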
In summary, both the data catalog and the data dictionary play an important role in data governance. The data catalog provides metadata about data assets, and the data dictionary provides detailed information about data structure and meaning. Together, they can be used to ensure that data is accurate, complete, and used correctly within the organization.
INSTANCES OF UNDEFINED DATA DICTIONARY AND DATA CATALOG
When data catalogs and data dictionaries are not defined, organizations can face a number of challenges. Take the case of an insurance company that has not yet implemented a data catalog and data dictionary. Its policyholder data, claims data, financial records, and so on would likely be stored in various databases and file systems across the organisation, with little to no metadata or documentation about the data’s structure and meaning. As a result, users would have no central place to find and access the data, and it would be hard to tell where and how it was used.
The company would also struggle to maintain data quality without a data dictionary, as there would be no documentation of data constraints and rules. This could cause data errors and inconsistencies, which could hurt the corporation. Without a data catalog and data dictionary, the company would also struggle to comply with data governance and management regulations such as GDPR and HIPAA.
In this case, a data catalog and data dictionary would have helped the insurance company manage and organise its data assets, improve data quality, and comply with regulations.
DEFINING DATA DICTIONARY AND DATA CATALOG: THE CHALLENGE
Defining a data catalog and data dictionary can be a major challenge for companies because it requires a significant investment of time and resources. Maintaining them is an ongoing process that requires regular updates and reviews to ensure that the information remains accurate and relevant. It also requires a proper governance organization structure and dedicated resources.
The process of creating a data catalog and data dictionary involves identifying all relevant data sources, reviewing and cleaning the data, and defining clear and consistent business and technical terms for the data. Defining all data in a data dictionary or data catalog using both business and technical terms is crucial for several reasons:
- Clarity: Defining data in both business and technical terms provides clear, understandable definitions for stakeholders from both business and technical backgrounds.
- Consistency: A data dictionary or catalog ensures that data is defined consistently across the organization, which helps to prevent confusion and errors.
- Governance: A data dictionary or catalog can be used to enforce data governance policies, such as data quality standards, access controls, and retention policies.
- Discovery: A data dictionary or catalog can be used to discover and understand the data that is available within an organization, which can help to improve decision-making and support data-driven business processes.
- Compliance: A data dictionary or catalog can be used to document data elements and their use cases to ensure compliance with regulations such as GDPR, HIPAA, and SOX.
- Traceability: A data dictionary or catalog can be used to trace the data lineage and understand the origin of data, and how it is used, stored and protected.
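Traceability in particular lends itself to a short illustration. The Python sketch below assumes lineage is recorded as a mapping from each dataset to the datasets it is derived from. The dataset names and the upstream and origins helpers are hypothetical. The traversal keeps a set of visited datasets so that it terminates even if the recorded lineage accidentally contains a cycle.

```python
# Hypothetical lineage: each dataset maps to the datasets it is derived from.
LINEAGE = {
    "finance.monthly_report": ["warehouse.claims_summary", "warehouse.premiums"],
    "warehouse.claims_summary": ["raw.claims"],
    "warehouse.premiums": ["raw.policies", "raw.payments"],
    "raw.claims": [],
    "raw.policies": [],
    "raw.payments": [],
}


def upstream(dataset: str, lineage: dict = LINEAGE) -> set:
    """Return every dataset that feeds into `dataset`, directly or indirectly."""
    seen = set()
    stack = list(lineage.get(dataset, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        stack.extend(lineage.get(current, []))
    return seen


def origins(dataset: str, lineage: dict = LINEAGE) -> set:
    """Return the upstream datasets that have no parents of their own."""
    return {d for d in upstream(dataset, lineage) if not lineage.get(d)}


print(sorted(upstream("finance.monthly_report")))
print(sorted(origins("finance.monthly_report")))
```

With lineage recorded this way, a question such as "which raw sources does this report depend on?" becomes a simple lookup rather than an investigation across teams.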
Conclusion
In conclusion, the data catalog and the data dictionary are key data governance components that help unlock the full potential of an organization’s data. They provide a comprehensive understanding of data assets, their structure, and the information needed to effectively manage, govern, and utilize data.
A data catalog acts as a central repository for all data-related information. A data dictionary, on the other hand, acts as a blueprint for data within an organization: it provides definitions and constraints for each field and outlines the relationships between different data sets, which supports consistency and accuracy in data entry and a better understanding of the data structure.
Proper data governance involves establishing clear roles, responsibilities, and policies for data management, ensuring compliance with regulations, and maintaining data quality. By implementing a data catalog and a data dictionary, organizations can effectively govern and utilize their data, leading to better decision making, enhanced productivity, and improved compliance.
Organizations that want to manage and govern their data effectively should invest in both a data catalog and a data dictionary.
The Data Governance Program office at Elait helped an insurance company manage its product schemes centrally and make more effective use of its BI applications by implementing a data dictionary and data catalogs. This ensured better data consistency and a single version of product information supporting administration, financial reporting, quotations, group customer data, and digital servicing.
At Elait, our expert consultants at the Data Governance Office partner with our customers to facilitate the adoption of best practices, policies, processes, and frameworks that establish optimal governance norms for running an efficient, data-managed corporation.
References
https://docs.aws.amazon.com/whitepapers/latest/enterprise-data-governance-catalog/data-governance-catalog.html
https://www.eckerson.com/articles/the-business-case-for-a-data-catalog#:~:text=The%20value%20and%20benefits%20of,spend%20finding%20and%20preparing%20data.
https://data.world/blog/data-catalogs-vs-data-dictionary/
https://randomtrees.com/blog/importance-of-data-dictionary-as-part-of-data-governance/
https://www.precisely.com/blog/datagovernance/why-you-need-a-data-catalog#:~:text=A%20data%20catalog%20is%20essential,an%20easy%20to%20digest%20format.
https://www.capitalone.com/digital/facts2019/
https://www.talend.com/resources/what-is-data-catalog/
https://www.gartner.com/en/documents/3957301
https://s3.amazonaws.com/eckerson/content_assets/assets/000/000/199/original/The_Business_Case.pdf?1537981646
https://www.bridging-the-gap.com/erd-entity-relationship-diagram/
Rajeev heads business strategy at ELAIT. He has more than three decades of experience in Program Management and has participated in platform and technology adoptions across many corporations. He writes articles addressing the challenges enterprises face around data.