ETL stands for extract, transform, and load.
A typical ETL pipeline consists of a data source, followed by a transformation step that filters or cleans the data, and ends in a data sink.
In the case of Hadoop and Spark, an ETL flow can be described as follows:
- Data comes from various sources such as databases, Kafka, Twitter, etc.
- To extract meaningful insights, the data is filtered or cleaned using Spark, MapReduce, Hive, Pig, etc.
- Finally, after processing (transformation), the data is stored in a data sink such as HDFS or a Hive table.
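The three steps above can be sketched in plain Python. This is a toy illustration of the extract → transform → load pattern only; a real pipeline would use Spark DataFrames, MapReduce jobs, or Hive queries, and all function names and records here are hypothetical:

```python
# Toy ETL sketch (illustrative only; not a real Spark/Hadoop job).

def extract():
    # Extract: in practice this would read from a database, a Kafka
    # topic, the Twitter API, etc. Here we return hardcoded records.
    return [
        {"user": "alice", "clicks": 10},
        {"user": "bob", "clicks": None},   # dirty record (missing value)
        {"user": "carol", "clicks": 7},
    ]

def transform(records):
    # Transform: filter/clean the data by dropping records with
    # missing values (the kind of job Spark or Pig would do at scale).
    return [r for r in records if r["clicks"] is not None]

def load(records, sink):
    # Load: in practice the sink would be HDFS or a Hive table;
    # here it is just an in-memory list.
    sink.extend(records)

sink = []
load(transform(extract()), sink)
print(len(sink))  # → 2 (only the clean records reach the sink)
```

The key point is that each stage is independent: the sink never sees raw data, only records that have passed the transformation step.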
Hope this helps.