пятница, 26 декабря 2014 г.

ETL Tool versus Hand Coding

If you are planning DWH project, You should consider ETL Tool.
I found very interesting and useful article of Gary Nissen where he compared two various approaches.

ETL Tool vs Scripting.

Advantages of tool-based ETL

  1. Simpler, faster, cheaper development. The ETL tool cost will make up for itself in projects that are large enough or sophisticated enough.
  2. Many ETL tools have integrated metadata repositories that can synchronize metadata from source systems, target databases and other BI tools.
  3. Most ETL tools enforce a consistent metadata driven methodology that all developers must follow.
  4. Data flow and data lineage documentation can often be automatically generated from a metadata repository.
  5. ETL tools have connectors pre-built for many source and target systems. So, if you are working with many sources or targets, you will find many advantages in a tool-based approach.
  6. Most ETL tools deliver good performance even for very large data sets. If your ETL data volume is really large, or it will be large in a year or two, you should seriously consider a tool-based option.
  7. Some ETL tools help you conduct change impact analysis further down the road.

 Advantages of hand-coded ETL

  1. Automated unit testing tools are available in a hand-coded system, but not with a tool-based approach. For example, the JUnit library (http://junit.org) is a highly regarded and well-supported tool for unit testing Java programs. There are also similar packages for other languages. Another common approach to unit testing is to use a scripting language, like TCL or Python, to setup test data, run an ETL process and verify the results. Automating the testing process through one of these methods will significantly improve the productivity of your QA staff and the quality of your deliverables.
  2. Object-oriented programming techniques help you to make all of your transformations consistent for error reporting, validation, and metadata updates. You may not have this degree of control with the parameter setting and snippet programming style of ETL tools.
  3. Metadata can be more directly managed by hand-coded systems. Actually, metadata driven hand-coded systems are much more flexible than any alternative.
  4. A brief requirements analysis of an ETL system quickly points you towards file-based processing, not database stored procedures. File-based processes are more direct. They are simply coded, easily tested and well understood.
  5. A tool-based approach will limit you to the tool vendor’s abilities and their unique scripting language. But, a hand-coded system can be developed in a common and well-known language. In fairness, all the ETL tools allow “escapes” to standard programming languages in isolated modules.
  6. Hand-coded ETL provides unlimited flexibility, if that is indeed what you need. You can literally do anything you want. There are many instances where a unique approach or a different language can provide a big advantage.
In addition, be aware about you IT guys, who used to change job sometimes, and it is a fact, that hand-code of good programmer very often is impossible to understand. Moreover, hand code is quite difficult to scale.