Performance, configurability, ease of use & total cost of ownership are some of the primary levers that corporations look for whilst evaluating ETL tools. In February 2008, Microsoft SQL Server Integration Services (SSIS) was used to load 1 terabyte of data in under 30 minutes, a record at the time. That data load, using SQL Server Integration Services, was 30% faster than the previous best time using a commercially available ETL (extract, transform, and load) tool. Subsequently, the potential of SSIS as an ETL tool has progressed a lot year on year.
Speed of data transfer is of great significance as businesses rely on ETL tools like SSIS to integrate large volumes of data stored in heterogeneous systems. ETL tools provide meaningful, consistent, and reliable information to business end users, as they move data between systems.
LEADING SME BUSINESSES PREFER SSIS
According to Gartner’s Magic quadrant for 2021 Microsoft SSIS is now a leader among data integration tools, while it was a challenger in the report a year earlier.
Gartner states that the overall low total cost of ownership (TCO), speed of implementation, ease of use, and the ability to integrate with other Microsoft technologies are frequently the main reasons that the MSBI SSIS data integration tool is chosen over alternatives.
Figure 1 : Distribution Of Companies Using SSIS Based On Company Size (2021) (Source : www.enlyft.com)
SSIS is particularly popular among small and medium size companies accounting for 40% SSIS users. SSIS is popular because it is the standard ETL program used for Microsoft SQL Server. While other ETL tools are available for integrating data between Microsoft SQL Server databases, SSIS is pre-installed and does not require any additional licencing or purchases, making SSIS the go to option for medium scale businesses.
CAPABILITIES OF SSIS
Figure 2 : Representative SSIS Capabilities
- Intuitive UI with capabilities to collect and merge data from various sources makes it easy to build projects.
- Being part of a large Microsoft community SSIS has extensive documentation and support on the world wide web.
- A developer can write custom scripts and even reuse scripts between projects to implement requirements specific to his business needs.
- Seamless and tight integration with the Microsoft family of products.
- Robust error handling allows customised error configuration to prevent and identify breakages.
- It provides BI output with a data transformation procedure.
- Without creating explicit trust relationships between those systems, SSIS can query, filter, and transfer data between databases on separate servers.
- SSIS is a full-featured product with many transformation features that allow for sorting, lookups, using a variety of data sources and targets.
SSIS PACKAGE GENERATION : THE CHALLENGE
“Package” is the unit of work in an SSIS project. A single package in an Integration Services Project is supposed to do one single task, such as : load data into a fact table in the data warehouse, send data from one server to another or make a database backup; basically, any tasks that you want to do with data.
Many organizations have industry standard Data Warehouses (DWH), populated by SSIS packages. In a typical situation, SSIS packages read data from operational databases, change the data, and then write the data to the DWH.
Creating SSIS packages demands significant attention to details through adherence to best practices & standards and developer expertise. Therefore, it can be error-prone and time-consuming. As a best practice, a package needs to be written for each task. The package must be rewritten if there is a change in source schema or metadata.
Developers often need to repeat steps in creating packages. As a high code solution, fixing one thing tends to break others. Inconsistent application of best practices can lead to inefficiencies due to poorly designed source queries, inefficient transformations, slow destination flows or incorrect package configurations.
The biggest challenge is that environments change even when the code is perfect. The data source can be added or altered, and so can the data formats and their business intents.
This could lead organizations to face challenges in:
- Getting Load performance and Managing Data
- Ensuring Data Security and Compliance
- Require additional development efforts, increasing the total cost of ownership
- Enforcing consistent coding best practices to ensure maintainability and performance
Thus, SSIS upkeep could demand entire refactoring and testing of the SSIS solution causing it to be heavy on support and maintenance costs.
From a business viewpoint, creating and maintaining SSIS packages is problematic for many companies without large development and coding staff. Adhering to timelines, budgets, and even start-up or maintenance costs will be high. In addition to the above, inheritance of code across vendors or even between developers tends to increase costs disproportionately, owing to quality issues.
Though ETL with SSIS is a powerful tool, the learning curve for new developers in an inheritance scenario is steep. Custom business scenarios require coding in C# or VB & require experienced developers.
Figure 3 : SSIS Package Generation Challenges
Some of the other challenges associated with SSIS are listed below :
- For novice users, the flexibility and variety of packages and options can be overwhelming.
- Configuring connection managers for online data sources might be challenging.
- It cannot be easy to test out various data flow task options
- Performance tuning is an art form.
- As destination columns are unnamed, mapping destinations using OLE DB commands are challenging.
- Changes in the data structure will result in the extraction failing when importing data from flat files and Excel workbooks. While there are workarounds, they are not always simple to use.
- Unstructured datasets can be challenging to work with. The SSIS tool struggles in handling emerging datasets such as JSON feeds.
- Debugging and troubleshooting logic and errors while running jobs often becomes a challenge, especially when dealing with some types of tasks that are hard to debug.
- Improved handling of data schema changes-If the data source has column changes or additions; these changes are not always implicit for all data conversion tasks in the package.
In ideal scenario for companies will be to not depend on manually authored developers’ source code because they will inevitably break and cause a harmful ripple effect. As mentioned above, companies may risk wasting time and money and be left high and dry if the developers exit the company.
Automating the SSIS package creation is recommended to expedite project delivery, increase productivity, apply industry best practices, and reduce TCO.
FAST TRACK SSIS PACKAGE GENERATION : LEVERAGING MSBI
Elait has developed the MSBI Accelerait framework that uses metadata to automate the generation of single or multiple SSIS Packages. The application is comprehensive in functionality and leverages industry standards & best practices
MSBI Accelerait can be used by any organization to accelerate data migration and ensure on-time data delivery to applications allowing on-time analytics with improved ROI, and minimizing potential loss of revenue.
With MSBI Accelerait you can quickly automate many tedious data retrieval tasks like decompression, decryption, error handling, auditing, etc., and data handling tasks like data & structure validation, sorting, deduplication, lookups, data enrichments etc.
Our MSBI Accelerait Framework can generate 100s of SSIS-Packages in minutes and accelerate data integration within on-premises data warehouses and ETL applications. In order to automate your complete data retrieval and data handling processes using MSBI Accelerait Framework, connect with our Elait experts today!
MSBI Accelerait is easy to set up because it has library components that can be reused. It automates migration to SQL servers and other platforms, governs project delivery, reduces risk, shortens development effort, and accelerates time to market. Since there is no manual creation of SSIS packages, developers are freed up for more productive activities.
MSBI Accelerait provides end-to-end data lineage with a full audit of data sources, destinations, and transformations throughout the data journey.
In a future article, we will introduce Cloud Elevait, an Azure Data Factory (ADF) automation framework that aids in generating 100s of ADF pipelines, automatically checking them into Azure DevOps, Test, Deploy and Execute them within hours. Stay tuned!
Rajeev heads business strategy at ELAIT. He has more than three decades experience in Program Management and has participated in platform and technology adoptions across many corporations. He writes articles addressing challenges faced by enterprises around data.