
Where do Big Data tools like Hadoop and Spark come into the picture when we talk about ETL?


(@sathish)
Member Moderator
Joined: 1 year ago
Posts: 1391
16/03/2021 10:03 am
I have been working on Hadoop for the last 4 months, so now I'm very curious: where are ETL tools used in the case of Big Data tools like Hadoop and Spark, and for what purpose?

(@anamika)
Noble Member
Joined: 1 year ago
Posts: 1381
16/03/2021 10:05 am

ETL stands for extract, transform & load.

A typical ETL pipeline consists of a data source, followed by a transformation used for filtering or cleaning the data, and ends in a data sink.

So in the case of Hadoop and Spark, an ETL flow can be defined as:

Data comes in from various sources such as databases, Kafka, Twitter, etc.

To get meaningful insights, we filter or clean the data using Spark, MapReduce, Hive, Pig, etc.

Finally, after processing (transformation), the data is stored in a data sink such as HDFS, a Hive table, etc.
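The three stages above can be sketched in plain Python. This is only a toy illustration of the extract → transform → load flow (the function names, the in-memory source, and the list standing in for a sink are all made up for this example; in a real cluster each stage would be a Spark/MapReduce job reading from Kafka or a database and writing to HDFS):

```python
def extract(source):
    # Extract: read raw records from a source (an in-memory list here,
    # standing in for a database, a Kafka topic, or a Twitter stream).
    yield from source

def transform(records):
    # Transform: drop malformed records and clean the rest, the way a
    # Spark or MapReduce job would filter/clean the data.
    for rec in records:
        if rec.get("user"):  # drop records with a missing user field
            yield {
                "user": rec["user"].strip().lower(),
                "count": int(rec.get("count", 0)),
            }

def load(records, sink):
    # Load: write the cleaned records into a sink (this list stands in
    # for HDFS or a Hive table).
    sink.extend(records)
    return sink

raw = [
    {"user": "  Alice "},            # messy but recoverable -> cleaned
    {"user": "", "count": 3},        # malformed -> filtered out
    {"user": "Bob", "count": 7},
]
sink = load(transform(extract(raw)), [])
print(sink)
# -> [{'user': 'alice', 'count': 0}, {'user': 'bob', 'count': 7}]
```

The same shape carries over to Spark: `extract` becomes a `read` from a source, `transform` becomes `filter`/`map` operations on a distributed dataset, and `load` becomes a `write` to the sink.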

Hope this helps.

