Hello everybody, thank you for joining this session. My name is Karthik Nagalingham, and I am a principal architect for big data analytics and the XCP product. Today we are going to talk about unified data lake solutions.

Data is key for most businesses today, and it comes in different forms: structured, semi-structured, and unstructured. A lot of insight comes from semi-structured and unstructured data such as logs and videos, so that information is very important for running a business. For that reason, the data is kept in distributed file systems such as the Hadoop Distributed File System (HDFS), the MapR File System, or GPFS, which are designed to process structured, semi-structured, and unstructured data. In the earlier days, MapReduce was used for that processing. Nowadays the same data holds a lot of business insight, and customers want to migrate it from file systems such as HDFS into unified data lake locations. For that purpose, NetApp provides two products, XCP and Cloud Sync, through which we can migrate data from a distributed file system to a unified data lake. The data can be on premises or in the cloud, and we can migrate it across all of those locations.

Why are customers moving the data? Their data scientists want to run analytics and artificial intelligence on top of it to build models, and they perform predictive analytics and data discovery on that data using machine learning, artificial intelligence, and high-performance computing.

I want to share some customer experiences. The first is a banking customer that wanted to migrate data from a thousand-node MapR cluster onto NetApp ONTAP to run artificial intelligence workloads on GPUs. Another customer, also in banking, wants to access the same data for Spark SQL workloads through both the file and S3 protocols. NetApp ONTAP provides file access to the data, and we provide that same customer access to the same data through the S3 protocol by way of an open-source product. NetApp is also working on the DT protocol for the same data.

Next is S3 access. Why are customers using the unified data lake through the S3 protocol? Because S3 and S3A are the industry-standard protocols for accessing object storage today. For example, Confluent Kafka customers and partners access object storage for functionality such as tiering. In the past, tiering was used only for archival purposes, but now tiering is also required for real-time online streaming. We are certified with Confluent Kafka because Kafka customers and partners expect the storage to provide better performance, and NetApp provides very good performance and is cost effective.

The second thing is that the same data can be accessed in a unified way by different applications. How do applications access it? Through connectors and through API access.
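To make the migration step concrete, here is a minimal sketch of driving an XCP copy from Python, assuming XCP is installed on a host that can reach both the distributed file system and the NFS target. The host names, paths, and the hdfs:// source form are illustrative assumptions; the exact XCP subcommands and supported source types depend on the XCP version, so consult the XCP documentation before running anything like this.

# Hedged sketch: invoke an XCP copy from a distributed file system to an NFS
# export. The source and target values below are placeholder assumptions.
import subprocess

source = "hdfs://namenode.example.com:8020/user/hive/warehouse"  # assumed HDFS path
target = "nfs-server.example.com:/unified_datalake"              # assumed NFS export

# 'xcp copy <source> <target>' is XCP's basic copy invocation.
result = subprocess.run(
    ["xcp", "copy", source, target],
    capture_output=True,
    text=True,
    check=False,
)
print(result.stdout)
if result.returncode != 0:
    print(result.stderr)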
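The dual file/S3 access pattern described for the Spark SQL customer can be sketched as follows. This is a minimal PySpark illustration, assuming the same data set is reachable both through an NFS mount and through an S3-compatible endpoint (such as ONTAP S3), and that the hadoop-aws S3A connector is on the classpath. The endpoint, mount point, bucket name, and credentials are placeholder assumptions.

# Minimal PySpark sketch: the same data set read through the file protocol
# (an NFS mount) and through the S3A protocol. All values are assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("unified-data-lake-access")
    # Point the S3A connector at an S3-compatible endpoint (assumed URL).
    .config("spark.hadoop.fs.s3a.endpoint", "https://ontap-s3.example.com")
    .config("spark.hadoop.fs.s3a.access.key", "EXAMPLE_ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "EXAMPLE_SECRET_KEY")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# File-protocol access: the volume exported over NFS and mounted locally.
df_nfs = spark.read.parquet("file:///mnt/datalake/transactions")

# S3 protocol access: the same data addressed through an S3 bucket.
df_s3 = spark.read.parquet("s3a://datalake/transactions")

# A Spark SQL query of the kind the banking customer runs.
df_s3.createOrReplaceTempView("transactions")
spark.sql("SELECT count(*) FROM transactions").show()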
From a connector perspective, we have tested the Amazon S3 Sink connector to access object storage data for Kafka workloads and other applications. The other important thing is API-level access to the data: regardless of the underlying protocol, customers want to access the data through APIs, and NetApp provides API access to the same data. Customers also ask why a unified data lake matters, and the important factors are security and scalability: they can scale up and scale down anytime based on their requirements, and security is a key factor in how the data is accessed. Nowadays the unified data lake is accessed by Spark, MapReduce, Hive, and Kafka workloads; those are the important workloads reading data from the unified data lake, especially from object storage. We have a detailed explanation of these solutions on the netapp-solutions website, which has more details about the unified data lake. Thank you.
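As an illustration of the connector-based access mentioned in the session, the following sketch registers Confluent's S3 sink connector with a Kafka Connect cluster through its REST API, pointing the connector at an S3-compatible endpoint. The Connect URL, topic, bucket, endpoint, and flush size are illustrative assumptions, not values from the session.

# Hedged sketch: create a Confluent S3 sink connector via the Kafka Connect
# REST API. All names and URLs below are placeholder assumptions.
import requests

connector_config = {
    "name": "datalake-s3-sink",
    "config": {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "topics": "transactions",
        "s3.bucket.name": "datalake",
        # Point the connector at an S3-compatible endpoint such as ONTAP S3.
        "store.url": "https://ontap-s3.example.com",
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
        "flush.size": "1000",
    },
}

# Kafka Connect's REST API listens on port 8083 by default.
resp = requests.post("http://localhost:8083/connectors", json=connector_config)
resp.raise_for_status()
print(resp.json())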
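And as a sketch of the API-level access described above, the following uses boto3 against the S3 API to list the same data that file-based applications would see through an NFS mount. The endpoint, credentials, bucket, and prefix are illustrative assumptions.

# Minimal boto3 sketch: API-level access to the unified data lake over the
# S3 API. Endpoint, keys, bucket, and prefix are placeholder assumptions.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://ontap-s3.example.com",  # assumed S3-compatible endpoint
    aws_access_key_id="EXAMPLE_ACCESS_KEY",
    aws_secret_access_key="EXAMPLE_SECRET_KEY",
)

# List objects under the same path that file-based applications access via NFS.
response = s3.list_objects_v2(Bucket="datalake", Prefix="transactions/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])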