Liming Sun

Assistant Professor

Big Data: The Data Model of ETL and Storage Management


ETL stands for extract, transform, load. ETL is responsible for extracting data from scattered, heterogeneous data sources, such as relational databases and flat data files, into a temporary staging layer, where the data is cleaned, transformed, and integrated, and finally loading it into a data warehouse or data mart to provide decision support for online analytical processing (OLAP) and data mining.
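The flow described above can be illustrated with a minimal sketch. It assumes two hypothetical sources (an in-memory SQLite table named orders and a small CSV string standing in for a flat file), an assumed cleaning rule (rows without an amount are dropped), and a hypothetical target table named fact_orders. None of these names come from a specific ETL product; they only show the extract, stage, transform, and load steps in order.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source (CSV text standing in for a file on disk).
FLAT_FILE = "id,name,amount\n3, carol ,30.5\n4,dave,\n"


def extract_relational(conn):
    """Extract rows from a relational source table (hypothetical 'orders')."""
    cursor = conn.execute("SELECT id, name, amount FROM orders")
    return [{"id": r[0], "name": r[1], "amount": r[2]} for r in cursor]


def extract_flat_file(text):
    """Extract rows from a flat data file given as CSV text."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(staged_rows):
    """Clean and unify the staged rows into one common shape."""
    cleaned = []
    for row in staged_rows:
        amount = row["amount"]
        # Assumed cleaning rule: skip rows that have no amount.
        if amount is None or str(amount).strip() == "":
            continue
        name = str(row["name"]).strip().title()
        cleaned.append((int(row["id"]), name, float(amount)))
    return cleaned


def load(warehouse, rows):
    """Load cleaned rows into the warehouse table (hypothetical 'fact_orders')."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    warehouse.commit()


def run_etl():
    # Build a small relational source so the example is self-contained.
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (id INTEGER, name TEXT, amount REAL)")
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "alice", 10.0), (2, "BOB", 20.0)],
    )

    # The staging layer here is simply a list holding rows from both sources.
    staging = extract_relational(source) + extract_flat_file(FLAT_FILE)

    warehouse = sqlite3.connect(":memory:")
    load(warehouse, transform(staging))
    query = "SELECT id, name, amount FROM fact_orders ORDER BY id"
    return warehouse.execute(query).fetchall()


if __name__ == "__main__":
    print(run_etl())
```

In a real project the staging layer would usually be a set of database tables rather than a Python list, so that a failed transformation can be rerun without extracting from the sources again.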

ETL is an important part of building a data warehouse. Users extract the required data from the data sources, clean it, and finally load it into the data warehouse according to a predefined data warehouse model. The concept has been around for at least ten years, so the technology should be quite mature. At first glance there seems to be little technology to speak of and nothing esoteric about it, yet in actual projects this stage often consumes a great deal of manpower, and later maintenance often demands even more effort. This usually happens because the ETL workload was not estimated correctly at the start of the project, and because too little thought was given to how heavily the work depends on tool support.

When selecting an ETL product, four points still need to be considered: cost, personnel experience, case studies, and technical support. ETL work typically relies on ETL tools such as DataStage, PowerCenter, and ETL Automation. When comparing ETL tools in actual use, we start from aspects such as metadata support, data quality support, ease of maintenance, and support for custom development. From the data sources to the final target tables, a project may involve hundreds of ETL processes or as few as a dozen. The dependencies between these processes, error control, and recovery handling are all key considerations when evaluating tools. These points are not discussed further here; the specific application will be explained in detail.
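To make the point about dependencies, error control, and recovery concrete, here is a small sketch of how a scheduler might treat them. It is not how DataStage, PowerCenter, or any other tool works internally. The function run_processes, the process names, and the simulated failure are all hypothetical. It uses graphlib.TopologicalSorter from the Python standard library (Python 3.9 or later) to order the processes, skips any process whose upstream process did not succeed, and lets a second run resume from the processes that already finished.

```python
from graphlib import TopologicalSorter


def run_processes(dependencies, actions, already_done=frozenset()):
    """Run ETL processes in dependency order.

    dependencies: maps each process name to the set of names it depends on.
    actions: maps each process name to a zero-argument callable.
    already_done: processes that succeeded in an earlier run (not rerun).
    Returns a tuple (done, failed, skipped) of sets of process names.
    """
    done, failed, skipped = set(already_done), set(), set()
    for name in TopologicalSorter(dependencies).static_order():
        if name in done:
            continue  # recovery: do not repeat work that already succeeded
        if any(dep not in done for dep in dependencies.get(name, ())):
            skipped.add(name)  # an upstream process failed or was skipped
            continue
        try:
            actions[name]()
            done.add(name)
        except Exception:
            failed.add(name)  # error control: record the failure, keep going
    return done, failed, skipped


def demo():
    """Simulate one failed run followed by a recovery run."""
    calls = []
    attempts = {"transform": 0}

    def extract():
        calls.append("extract")

    def transform():
        # Hypothetical failure: the first attempt raises, the second succeeds.
        attempts["transform"] += 1
        if attempts["transform"] == 1:
            raise RuntimeError("simulated failure")
        calls.append("transform")

    def load():
        calls.append("load")

    dependencies = {
        "extract": set(),
        "transform": {"extract"},
        "load": {"transform"},
    }
    actions = {"extract": extract, "transform": transform, "load": load}

    first = run_processes(dependencies, actions)
    second = run_processes(dependencies, actions, already_done=first[0])
    return first, second, calls


if __name__ == "__main__":
    first_run, second_run, call_log = demo()
    print(first_run)
    print(second_run)
    print(call_log)
```

With hundreds of processes, this bookkeeping (what ran, what failed, what can safely be resumed) is the part that is tedious to maintain by hand, which is why it weighs heavily when tools are compared.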