Articles
SQL for ETL
Page 1 of 3
The incomparable success of relational databases and SQL as tools for the storage and management of enterprise data is well deserved. Almost universally, ERP applications will store their data in a relational model and interact with it using SQL. Unfortunately SQL and relational databases have some significant drawbacks in the context of an ETL solution.
- Metadata. ETL solutions are highly dependent upon quality metadata, but relational databases' metadata typically store only datatype information. Rarely is any 'intelligence' about the contents of columns stored (E.g. analyst comments, cardinality, number of nulls, etc.), leaving business data owners without easy insight into their data. Those databases which do store this information do not make it easily accessible to ETL designers and developers. Some ETL tools, however, do provide access to this meaningful metadata, thereby accelerating and de-risking ETL solution development.
- Standards. SQL is a loose standard, and differences between vendors' variants of SQL mean you might use different syntaxes and functionality for your various source, staging, and target data stores, thus complicating development and increasing its cost. Dedicated ETL tools provide built-in transformation functions to handle common requirements, thereby providing consistency and easier maintenance by using a single language to describe the transformation.
- Performance. Tuning relational databases and SQL queries requires significant skill, and a poorly tuned database or SQL query can incur severe performance penalties on an ETL solution. A well constructed ETL tool-based solution, on the other hand, can often outperform a similar SQL solution.
- Maintainability. SQL is almost always harder to maintain than ETL solutions. As complexity increases and queries get larger and larger, SQL queries can grow out of control and become difficult to maintain - particularly when you're maintaining numerous SQL syntaxes. When using an ETL tool, the self-documenting nature of a GUI development environment delivers a far easier (and hence cheaper) to build and maintain solution than an SQL counterparts.
- Efficiency. When using a relational database for staging data between the E, T, and L phases, or between transformation steps, the sorting and partitioning of data are lost. Downstream phases must re-sort and re-partition the data being retrieved from the relational database, often redundantly. This can add significant overhead to an ETL solution.
- Prev
- Next >>

