So in this demo we are going to look at a one-terabyte data copy across eight nodes, from MapR-FS into NFS. If you look at the screen here, we have an eight-node MapR cluster running, and all the node details are available: eight nodes, with Grafana running in this cluster so we can watch the throughput and IOPS.

Before we start, let's look at the Hadoop parameters we set for this particular test. We are using 160 GB of RAM and 160 virtual cores, but each server actually has only 32 physical CPUs; if you look, all eight servers have just 32 CPUs. Because we are using multithreading, we configured 160 vcores to match the memory of the servers.

Now let's get into the demo and look at the data copy. First, on the servers, let's clear the cache and turn swap off and back on so we start from a clean state: dropping the page caches and cycling swap. This runs in parallel on all the servers. Once it's completed, I log in as the mapr user and clear the screen.

Let's see what is on the MapR file system. Listing the source directory and checking the sizes with du -h shows the total is close to 1 TB, spread across 128 files numbered starting from zero. Now let's check the destination as well and clear space there by removing the destination directory. Here we are on NFS1; I had one more directory, and I deleted both.

If you look at where the volume is mounted, it is mounted on all eight servers: each server mounts a different IP for the same FlexGroup volume, on the same mount point on every server. That is very important for accessing NFS data from MapReduce, and specifically for DistCp: we need the same mount point across all the servers. This is a new feature introduced, I think, from 3.x onwards; I also hear it is available from 2.7 onwards. It is also very useful for Spark workflows, which is where the entire industry is heading.

Now let us run the workload and clear the screen. If I look here, I have eight nodes; the font is a little small, but I just want to show them in a single line. Here I want to show the applications: no applications are running now, and Grafana shows not much traffic either. Let's start the periodic storage statistics.

Now let us run the Hadoop DistCp command, copying from MapR-FS to NFS with one file copy per container. Let me look at how the servers are being used; refreshing the screen, you can see the servers have started and their usage is gradually increasing. Let me go to the running applications: this is the application that is running. Going into the Application Master, you can see almost 128 processes running.
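As a minimal sketch of the preparation and copy steps narrated above: the host list rhel[1-8], the MapR path /tera/in, and the NFS mount point /mnt/nfs1 are all illustrative assumptions, not names taken from the demo environment.

```bash
# Flush the page cache and cycle swap on every node in parallel (run as root;
# clush and the rhel[1-8] host list are assumptions for illustration).
clush -w rhel[1-8] 'sync; echo 3 > /proc/sys/vm/drop_caches; swapoff -a && swapon -a'

# Verify the source dataset on MapR-FS (~1 TB across 128 files).
hadoop fs -ls maprfs:///tera/in
hadoop fs -du -s -h maprfs:///tera/in

# Clear the destination directory on the NFS mount.
rm -rf /mnt/nfs1/tera/out

# Confirm the same mount point exists on all eight nodes; each node may
# mount a different LIF IP of the same FlexGroup volume.
clush -w rhel[1-8] 'mount -t nfs | grep /mnt/nfs1'

# Run the copy with 128 map tasks so each of the 128 files gets its own
# container; the file:// scheme addresses the NFS mount as a local file system.
hadoop distcp -m 128 maprfs:///tera/in file:///mnt/nfs1/tera/out
```

Dropping the caches as root on every node before each run ensures the test starts cold, so the throughput numbers reflect the storage rather than memory.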
We have 128 files, which is why 128 processes are running in the background. If you look at this side of the screen, the storage throughput is shown: 8 to 9 GB/s of traffic is going through, and you can see the disk reads and, since this is a DistCp copy operation, the disk writes as well. Grafana is slowly picking up now: close to 4 GB/s, which is 4,000 MB/s, and IOPS climbing toward 20,000; here it shows around 17,000 to 27,000 IOPS, gradually increasing.

Now let us look at the CPU utilization on the server RHEL1, using Netdata. Close to 45 to 47 percent of the CPU on that particular server is used, and it is gradually increasing. You can also look at the jobs that are going on: the copy percentage is shown here, and if you keep refreshing it you can see the percentage change. Going back, the CPU is busy in the upper 80s, on average around 80 to 85 percent, and the throughput is close to 6 GB/s, now 5.9, gradually increasing. If you look at the Resource Manager configuration, you can see those details as well.

The job completed in 3 minutes 16 seconds, and the NFS operations also completed. The high peak we got here was 7.7 GB/s, and this is the throughput we got from the operation. After a refresh, the job itself shows as completed in 3 minutes 7 seconds, so the extra seconds were spent starting the job; the actual job completed in 3 minutes 7 seconds.
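For reference, the same progress the walkthrough checks in the Resource Manager UI can also be read from the CLI. These are standard YARN and MapReduce commands; the job ID below is a placeholder, not a value from the demo.

```bash
# List running YARN applications; the DistCp job should appear here.
yarn application -list -appStates RUNNING

# Show the completion percentage for the copy job (placeholder job ID).
mapred job -status job_1700000000000_0001
```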
Learn how to access NFS data from Hadoop workloads such as TeraGen, TeraSort, and TeraValidate without the NetApp In-Place Analytics Module.
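The TeraSort suite mentioned above can be pointed directly at the NFS mount through the same file:// scheme. The sketch below assumes the same hypothetical /mnt/nfs1 mount point and the stock Hadoop examples jar location; adjust both for your distribution.

```bash
# Path to the stock MapReduce examples jar (an assumption; varies by distro).
EXAMPLES_JAR=$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar

# Generate 10 billion 100-byte rows (~1 TB) directly onto the NFS mount.
hadoop jar $EXAMPLES_JAR teragen 10000000000 file:///mnt/nfs1/tera/in

# Sort the generated data, reading from and writing to NFS.
hadoop jar $EXAMPLES_JAR terasort file:///mnt/nfs1/tera/in file:///mnt/nfs1/tera/sorted

# Validate that the sorted output is globally ordered.
hadoop jar $EXAMPLES_JAR teravalidate file:///mnt/nfs1/tera/sorted file:///mnt/nfs1/tera/report
```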