Computer Science, asked by vinisaurabh3995, 1 year ago

How to manually copy blocks from one datanode to another in HDFS?

Answers

Answered by priyankacute
Your answer:

1. Objective
HDFS follows a write-once-read-many model, so we cannot edit files already stored in HDFS, but we can append data by reopening a file. In both read and write operations, the client first interacts with the NameNode. The NameNode checks permissions and supplies block locations, so the client can read and write data blocks directly from/to the respective DataNodes. In this answer, we will discuss the internals of Hadoop HDFS data read and write operations: how the client reads and writes data in HDFS, and how it interacts with the master and slave nodes while doing so.
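As a minimal illustration of appending by reopening a file, here is a sketch using the Hadoop Java client (it assumes a running cluster with append support enabled and the Hadoop libraries on the classpath; the path /user/demo/log.txt is hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendExample {
    public static void main(String[] args) throws Exception {
        // Loads core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // HDFS is write-once-read-many: we cannot edit the file in place,
        // but we can reopen it for append and add data at the end.
        Path file = new Path("/user/demo/log.txt"); // hypothetical path
        try (FSDataOutputStream out = fs.append(file)) {
            out.writeBytes("one more line\n");
        }
    }
}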


2. Hadoop HDFS Data Read and Write Operations
HDFS (Hadoop Distributed File System) is the storage layer of Hadoop and a highly reliable, fault-tolerant storage system. HDFS works in a master-slave fashion: the NameNode is the master daemon, which runs on the master node, and the DataNode is the slave daemon, which runs on each slave node.
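To make the master-slave split concrete, a client only needs the NameNode address; the NameNode then directs it to the DataNodes that hold, or will hold, the blocks. A minimal sketch (the hostname namenode.example.com and port 8020 are assumptions; normally fs.defaultFS comes from core-site.xml):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ConnectToNameNode {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The client contacts the master (NameNode) for metadata only;
        // actual block data is read from / written to the slave DataNodes.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode.example.com:8020"), conf);
        System.out.println("Connected to: " + fs.getUri());
    }
}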

Before you start using HDFS, you should install Hadoop. I recommend:

Hadoop installation on a single node
Hadoop installation on Multi-node cluster
Here, we are going to cover HDFS data read and write operations. Let’s discuss the HDFS file write operation first, followed by the HDFS file read operation.

2.1. Hadoop HDFS Data Write Operation

To write a file in HDFS, a client first needs to interact with the master, i.e. the NameNode. The NameNode provides the addresses of the DataNodes (slaves) on which the client will start writing the data. The client writes data directly to the DataNodes, and the DataNodes then create the data write pipeline.

The first DataNode copies the block to another DataNode, which in turn copies it to the third DataNode. Once the required replicas of the block are created, acknowledgments are sent back along the pipeline.
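Since the original question is about blocks on DataNodes, here is a small sketch of how a client can see which DataNodes hold the replicas of a file’s blocks (the path /user/demo/data.txt is hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLocations {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(new Path("/user/demo/data.txt")); // hypothetical path

        // One BlockLocation per block; each lists the DataNodes holding a replica.
        for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    block.getOffset(), block.getLength(),
                    String.join(",", block.getHosts()));
        }
    }
}

Note that block placement is normally left to HDFS itself; tools such as the HDFS balancer redistribute blocks across DataNodes rather than a client copying blocks by hand.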

a. HDFS Data Write Pipeline Workflow

Now let’s understand the complete end-to-end HDFS data write pipeline. The data write operation in HDFS is distributed: the client copies the data to the DataNodes in a distributed way. The step-by-step explanation of the data write operation is below (a code sketch follows the list):

i) The HDFS client sends a create request through the DistributedFileSystem API.

ii) DistributedFileSystem makes an RPC call to the namenode to create a new file in the file system’s namespace. The namenode performs various checks to make sure that the file doesn’t already exist and that the client has the permissions to create the file. Only when these checks pass does the namenode make a record of the new file; otherwise, file creation fails and an IOException is thrown to the client.

iii) The DistributedFileSystem returns an FSDataOutputStream for the client to start writing data to. As the client writes data, DFSOutputStream splits it into packets, which it writes to an internal queue called the data queue. The data queue is consumed by the DataStreamer, which is responsible for asking the namenode to allocate new blocks by picking a list of suitable datanodes to store the replicas.

iv) The list of datanodes forms a pipeline, and here we’ll assume the replication level is three, so there are three nodes in the pipeline. The DataStreamer streams the packets to the first datanode in the pipeline, which stores each packet and forwards it to the second datanode in the pipeline. Similarly, the second datanode stores the packet and forwards it to the third (and last) datanode in the pipeline.

v) DFSOutputStream also maintains an internal queue of packets that are waiting to be acknowledged by datanodes, called the ack queue. A packet is removed from the ack queue only when it has been acknowledged by all the datanodes in the pipeline. A datanode sends the acknowledgment once the required replicas are created (3 by default). In this way, all the blocks are stored and replicated on different datanodes, and the data blocks are copied in parallel.

vi) When the client has finished writing data, it calls close() on the stream.

vii) This action flushes all the remaining packets to the datanode pipeline and waits for acknowledgments before contacting the namenode to signal that the file is complete. The namenode already knows which blocks the file is made up of, so it only has to wait for the blocks to be minimally replicated before returning successfully.
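Putting steps i)-vii) together, here is a minimal client-side sketch of the write path (it assumes a reachable cluster and a hypothetical output path; the packet splitting, data/ack queues, and datanode pipeline described above all happen inside the stream returned by create()):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteFileExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Steps i)-ii): DistributedFileSystem asks the namenode to create
        // the file after existence and permission checks.
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path("/user/demo/output.txt"); // hypothetical path

        // Step iii): create() returns an FSDataOutputStream; as we write,
        // the client library splits the data into packets and streams them
        // through the datanode pipeline (steps iv)-v)).
        try (FSDataOutputStream stream = fs.create(out, false)) {
            stream.writeBytes("hello hdfs\n");
        } // Steps vi)-vii): close() flushes the remaining packets, waits
          // for acknowledgments, and tells the namenode the file is complete.
    }
}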
