Okay, hi everybody, and welcome to the call. Thanks for joining this first episode of our NetApp partner tech series. This is a series of calls that we used to run some time ago. It was very popular, but it stopped for a variety of reasons, and now we'd like to bring it back. So this is the first of a new set of hopefully monthly technical sessions for our partner community. My name is Gavin Moore. I'm the CTO for EMEA and Latin America here at NetApp, and I'm joined by Neils, who will be doing all of the talking on this call. He's our speaker today, and we also have Patrick and Ant, who will be helping out with Q&A. So if you have any questions at all, no matter how technical (the guys love a challenge), please put them into the Q&A and we'll answer them during the call or at the end, depending on how we get on for time. As this is a new series, we'd love to get your feedback. Please tell us what you think: when the call ends, you'll be sent to your browser, where there's a very quick survey that takes just seconds to complete. We'd also love to hear if there are technology topics you'd like us to talk more about in these sessions. The sessions are here for you, so please let us know what you'd like to hear about, and we will build the sessions around what you're all asking for. With that, I'll hand over to Neils. Over to you, Neils.

>> Perfect. Thank you, Gavin. Apologies upfront to everybody: I seem to have caught a cold, so if my voice doesn't carry as strongly as usual, and if I need to cough here and there, I beg your pardon. But we'll get through this; Zoom and microphones will help me. So, my name is Neils.
I'm a solution architect based in Frankfurt, Germany, and my focus topics are business continuity (so some of you may have already had contact with me regarding MetroCluster and SnapMirror), and I'm also covering our block solutions. So anything related to SAN, and that is our topic today. We will be talking about SAN, and the ASA in particular. We already heard the introduction from Gavin, so now let's talk a little more about the ASA: how does the ASA fit into what we originally invented, unified storage, and what is the story around it? Then a brief introduction to the All SAN Array and what it is. Hopefully you have all heard about it already, so some of this content you may have heard on other occasions. That's just because we really want to make this work: we really want to be present in customers' SAN environments, and that is why we talk so much about SAN. It's not that we won't continue doing the other things, but we will keep talking about SAN and some other features and functions, maybe even in this series. I can see another topic being SnapMirror active sync, perhaps, which is also SAN-specific. One or two points I would like to add, and maybe something you haven't heard in this detail, are really around positioning: where do we want to go with this, and which customers do we want to engage? And obviously, as this is a technical call, some technical details. Pardon me if this is not as technical as you might expect, or more technical than you expect. So again, back to what Gavin said: there is a survey at the end. Please give us feedback on how deep you want us to go and what you want to hear, and if you want a deep dive on some other product, let us know. Take this as a start, and then we will roll from there and see that we can cater to your needs.
Q&A is listed here last, but I would really like to take questions as we go. So when I'm talking about something, a slide is up, and you have a question, please type it into the Q&A. Ant and Patrick are there; they can either answer directly, or maybe it makes sense that I speak to that particular question myself, not just briefly but calling it out. So Patrick and Ant are welcome to interrupt me at any time, and we can go through those questions directly as you raise them.

So: the NetApp ASA, the All SAN Array, first episode of our NetApp partner tech series. Let's begin by talking a little about unified storage and how an All SAN Array really fits into that story. When we look at our portfolio (not the whole portfolio, really just ONTAP), we have a good track record of how we developed ONTAP and what we want it to do in the data center. We want it to be everywhere. We want it to cater for almost any application, and it can: no matter which new application or virtualization platform, no matter which protocol, file, block, or object. Everything is there, and that is what, 30 years ago or so when we introduced block alongside file, really was the invention of unified storage: having this one box, or to be more specific, this one operating system, that can cater for all those needs. That is still our idea: maintain and develop ONTAP as the industry-leading storage operating system. It currently is, but we want to continue. And no matter where we run ONTAP, be it on premises on the unified AFF storage systems, on the All SAN Arrays, or in the cloud as a first-party or third-party service, we just want to make ONTAP successful. Then we have this broad portfolio of hardware platforms, because SAN is usually an on-premises play.
So we are really talking about hardware most of the time, and as you can see here, five different hardware families that can cater for any need you can think of. We still maintain our FAS hybrid-flash series. We introduced the capacity-flash C series, and we still have the performance-flash A series for the most demanding workloads at our customers. When we look at the lower row, we have these new additions (not too new, actually; the middle one is quite new): the All SAN Array, meant purely to cater for block workloads, and since the C series was very successful when we introduced it, we decided to have an All SAN variant of the C series as well. So now we have five different hardware platforms, some of which are truly unified in a protocol sense, while the others are meant to serve block protocols only. How does that fit into the whole idea of unified storage? I think unified storage is evolving. The pure definition of unified storage, getting all your data services (object, file, and block) out of a single box, is the nature of unified, but there is much more to it than that. While we can do that, it's much broader. We can provide the same services from hardware, an on-premises installation, as well as in any of the three hyperscalers where we are present. And the whole idea of the data fabric, which is also a couple of years old now, really brings the idea of having the same data management capabilities across all these instances. So we are unifying far beyond just a single box. With this concept, customers can maintain, secure, protect, and automate their data and its management regardless of which platform it is and regardless of where their services are running.
And that is what I would call the new way of looking at unified storage. With that, in the end, it doesn't really make a difference whether the ONTAP system I have running is running block protocols only or maybe NAS protocols only (that may be a deliberate choice by customers who want to separate workloads); they still fall under the same umbrella. They still have the same management plane, and you can interact with them interchangeably. So, having hopefully cleared up how an All SAN Array fits into our unified story, let's talk a little about the All SAN Array itself. The ASA was initially introduced in 2019, so it's not that new. But I think we at NetApp struggled a little ourselves with how we wanted to handle the ASA A series compared to an AFF at that point in time. We thought about it and reintroduced it in 2022, if I'm not mistaken, and since then it has been quite popular. We have adjusted a little, especially around positioning, pricing, and the combination of things. So now we have two families in the ASA: the ASA A series, high performance and low latency, and the newest addition to the family, the C series, where we have, let's say, the best of both worlds. The idea behind the C series is still delivering consistent flash performance, but at a much lower price point, making it more attractive for some of the workloads where all-flash is considered necessary but these high-performance A series systems, or equivalents from competitors, are simply perceived to be overpriced. There is certainly some marketing around ASA; I won't dive into that too deeply, but I still wanted to give you context, and since you'll get the slides afterwards and have all the materials, I wanted to say a few words about it. We have these programs that you have hopefully all heard about already.
We are talking about our efficiency guarantee, with 4:1 efficiency for SAN workloads. Then we say it's secure and protected, and we underpin that with our autonomous ransomware protection, or our ransomware recovery guarantee to be more precise, plus some other marketing claims around it. These are very important. We are very confident about them, so confident that we are putting money on the table: if we don't fulfill a guaranteed feature, the customer is eligible to get something back, to be compensated by us. We are very serious about that. While these are not technical items, I think they matter, because they help us get into conversations with customers and show them that we are serious about SAN and that we are a good platform and a good vendor to deal with in SAN environments. These two variants, as already mentioned, still exist. We have the A series: the sub-millisecond-latency, high-performance workhorse, so to say. Anything that is really latency-critical is still meant to run on an ASA A series. The C series has turned out to be well suited for tier-2 or, let's say, general-purpose workloads, such as broad VMware virtualization environments, because it still provides good sustained performance, just not at the sub-millisecond level. We have seen that it can also successfully serve as a tech-refresh platform for other all-flash technologies and still be good enough even for some tier-1, first-class applications at customers. The All SAN Array really is all about block protocols, and that includes the older iSCSI and Fibre Channel SCSI-based protocols as well as the newer NVMe-based ones: NVMe over Fibre Channel and NVMe over TCP.
All four of these protocols are supported on the ASA, and the advantage we can bring to a customer is a smooth transition from the old world to the new world. Recently, when I reached out to partners and internal account teams alike, I wanted to understand where customers are in adopting NVMe. Although it is pretty dominant in RFPs (I think all customers want their new storage array to support NVMe), we really don't see that high an adoption yet. But that is not a bad thing, because we have a very good story around it: we can still bring customers onto an ASA, or an AFF alike (it's actually an ONTAP feature, not an ASA feature), with the old SCSI-based protocols, while giving them the opportunity to move to the shiny new NVMe protocols at their own speed. They can decide when to pick an application, and only that application, and port it to NVMe, either because they can now provision SCSI and NVMe out of the same box, or by transitioning in place: I have an application, I take a maintenance window, shut it down, disconnect the LUN from my server, convert the SCSI-based LUN into an NVMe namespace, reattach it to my server, and start my application again. Within that single maintenance window, I get the benefits of the NVMe protocol. So it's a good story for moving customers along, even if they are not bleeding-edge in NVMe adoption today.

A question: when will NVMe over TCP or FC be supported in SnapCenter and the SnapCenter Plug-in for VMware vSphere? That's one question I unfortunately cannot answer. Ant and/or Patrick, if you know, please type the answer; otherwise we'll take it with us and deliver the answer after the call together with the materials. Thank you.

Also, the ASA comes with our ONTAP One licensing model.
ONTAP One was introduced for the unified systems some time back, and we decided to bring ONTAP One to the ASA as well. It essentially includes everything that ONTAP One includes for unified systems except the NAS protocols, because that's the whole idea of the All SAN Array. So this includes all the aforementioned block protocols, old and new, and data protection including SnapMirror active sync, formerly known as SnapMirror Business Continuity. You also get features that are not necessarily related to a SAN environment: SnapLock and SnapMirror Cloud. That is ONTAP value we can also deliver for these dedicated SAN environments. Especially around SnapLock: how does SnapLock make sense in a SAN-only environment? You may know SnapLock from the use case it was first developed for: preventing premature deletion of files. It provides a WORM (write once, read many) capability from an archive perspective, so that I can put an expiration date on a file and it cannot be deleted before that. Obviously, this use case doesn't make much sense in a SAN environment; it's not really applicable to LUNs. So why do we still need SnapLock on an ASA? Because there are two other use cases where it is a perfect addition, especially regarding data protection and protecting against deletion or alteration of data in ransomware attacks. The second topic here is essentially our SnapVault feature: you take a snapshot on your primary system, replicate it to your secondary, where you hold a longer set of snapshots (maybe a week, maybe a month, maybe longer), and we can protect these snapshots with SnapLock so that they cannot be deleted. This really is protection of backups. The term we usually use for that combination is LockVault.
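As a conceptual illustration of the LockVault idea, here is a toy Python model (not ONTAP code; the class and method names are mine). The WORM property boils down to a retention timestamp that even a privileged delete request cannot override:

```python
from datetime import datetime, timedelta

class LockedSnapshot:
    """Toy model of a SnapLock-protected snapshot in a vault:
    deletion is refused until the retention period has expired."""
    def __init__(self, name, created, retention_days):
        self.name = name
        self.expiry = created + timedelta(days=retention_days)

    def delete(self, now):
        # Even an admin's request is rejected before expiry; that is
        # the WORM property the LockVault configuration relies on.
        if now < self.expiry:
            raise PermissionError(f"{self.name} locked until {self.expiry:%Y-%m-%d}")
        return True

snap = LockedSnapshot("daily.2024-01-01", datetime(2024, 1, 1), retention_days=30)
try:
    snap.delete(now=datetime(2024, 1, 15))   # ransomware or admin attempt: refused
except PermissionError as e:
    print("refused:", e)
print(snap.delete(now=datetime(2024, 2, 15)))  # after expiry: deletion allowed
```

This is also why the retention warning below matters: in the model, there is deliberately no code path that shortens `expiry`.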
So: locking the snapshots in a SnapVault destination. And this is the basis for our ransomware recovery guarantee. If you use this feature, taking your snapshots off your primary and locking them so that they cannot be deleted, not even by a storage admin, we can guarantee that there is a recovery point you can get your data back with. That is one. The third use case is in combination with local snapshots: using SnapLock to lock down the snapshots on the primary, or even replicated snapshots on the secondary, selectively, depending on how many snapshots I want to take and how important I think they are. Again, these snapshots cannot be deleted, so people need to be careful about what they set the retention to. And this is the reason SnapLock also exists in the ASA environment.

There's an interesting question, the first one I see, coming back to the SCSI-to-NVMe transition: how much time is required to convert a SCSI LUN to an NVMe namespace? There is not a single byte that is copied or moved. If you convert a LUN to a namespace, we are only changing metadata, so it is an instantaneous change. Obviously, you have to unmap the LUN from the server first, so from an application perspective it is disruptive. But disconnecting the LUN, converting it, and reconnecting it to the server is something that can be achieved in a matter of minutes.

Second question: are there any limitations on SnapMirror from an ASA to a FAS system as a SnapVault destination? No, there are not; using it as a SnapVault destination is fine. The only way the ASA really differs is that it doesn't have any NAS protocols. So you can use a FAS or AFF platform as a SnapMirror destination and use that in combination with SnapMirror for DR or SnapVault for backup to disk.
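To illustrate why the conversion is instantaneous, here is a toy Python model (not ONTAP's actual implementation; all names here are hypothetical). The namespace adopts the LUN's block list by reference, so no data is touched regardless of size, and the conversion is refused while the LUN is still mapped to a host:

```python
class Lun:
    def __init__(self, blocks):
        self.mapped = True
        self.blocks = blocks          # stand-in for the data blocks on disk

def convert_to_namespace(lun):
    """Model of the LUN-to-namespace conversion: only the metadata
    (the 'personality' of the object) changes; the data blocks are
    adopted by reference, so the operation is O(1) regardless of size."""
    if lun.mapped:
        raise RuntimeError("unmap the LUN from the host first")
    return {"type": "nvme_namespace", "blocks": lun.blocks}  # same block list, no copy

lun = Lun(blocks=list(range(1_000_000)))  # a "large" LUN
lun.mapped = False                        # maintenance window: host disconnected
ns = convert_to_namespace(lun)
print(ns["blocks"] is lun.blocks)         # True: not a single byte moved
```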
The following couple of slides have a little more detail about the guarantees. I don't want to go into them; they are just for reference, so if you want to have a look, please do. We are claiming six nines (99.9999%) availability for our All SAN Array HA pairs, measured from our AutoSupport data, so we really think we have a basis for this claim. The second guarantee is the ransomware recovery guarantee, which I already mentioned, implemented via the LockVault feature. And then the storage efficiency guarantee, where we guarantee a 4:1 data reduction on an ASA system in SAN environments. Terms and conditions apply to all of these guarantees, and customers may or may not choose to sign them. It's a marketing vehicle, and it is really something to open the conversation with customers, because if we are willing to back our claims with a guarantee, that goes a long way. In the end, it's the technical details that have to be good and have to convince the customer, but this is a good starting point.

Moving a little to the technical side, especially on these efficiencies: we see the usual claims, again marketing claims, of 4:1 efficiency. Some competitors say 5:1. And there is always this apples-to-apples comparison problem, because many efficiency guarantees (actually all of them, even our own) are based on the ratio of usable to effective capacity. The question is how much raw capacity you need in order to get to a certain amount of usable capacity, and that is where ONTAP is actually pretty good. If we take this example and say there are 100 terabytes of raw SSD capacity, then with ADP and RAID overhead from building the aggregates, we can get 82 terabytes of usable capacity out of it.
Just this morning we had an internal discussion in our sales engineering group where someone sized a system (I think it was a C800) in Fusion and got exactly this 82% usable out of raw. So this is a number we can really stand behind. If we then apply our 4:1 efficiency guarantee to that usable capacity, we get up to 328 terabytes (let's just round it to 320) of effective capacity. If we compare that to a competitive system with similar raw capacity but a completely different kind of overhead, just because of the way the data is protected (everybody does things slightly differently), that competitor ends up with just 67 terabytes of usable capacity. So while we are losing, let's say, 18% of capacity, they are losing roughly one third, just to perform the underlying, basic data management. If you then apply the efficiency, they get to roughly 268 terabytes, and even applying a 5:1 efficiency brings them just close to, maybe a little over, the 320 terabytes that we can guarantee with our 4:1 efficiency number. So always try to compare apples to apples. Always ask what the marketing numbers are based on, even with ours. I'm also a technical guy, and I usually question marketing claims; I just wanted to give you some information on what is behind them. What is really important is comparing against the raw number rather than the usable one.

Last but not least, before we go to positioning: ONTAP is what makes it valuable to the customer. If it's really just about dumb LUNs and they really just need block storage, why not take an E-Series?
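To make the comparison concrete, the slide's arithmetic can be replayed in a few lines of Python. The 82 TB usable figure and the competitor's 67 TB are the numbers quoted here, not general constants:

```python
# Effective capacity from the same 100 TB raw, per the comparison above.
RAW_TB = 100

ontap_usable = 82        # ~82% of raw after ADP + RAID overhead (slide figure)
ontap_effective = ontap_usable * 4        # 4:1 guarantee -> 328 TB

competitor_usable = 67   # roughly one third of raw lost to their protection scheme
competitor_4to1 = competitor_usable * 4   # 268 TB at the same 4:1 ratio
competitor_5to1 = competitor_usable * 5   # 335 TB: a 5:1 claim barely beats our 4:1

print(ontap_effective, competitor_4to1, competitor_5to1)  # 328 268 335
```

The takeaway is the one in the text: an efficiency ratio is meaningless without knowing the raw-to-usable overhead it is applied on top of.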
Fair question, but ONTAP is the thing, even in an ASA, that delivers value to the customer. In the end, it's all about the data management functionality we have: the cloud integration, the replication technologies, and everything we can offer from a software perspective. The ASA is just one hardware vehicle to bring these value-adds to the customer. So if there is an opportunity, try to include these other topics that benefit workloads, applications, or application owners, so that their lives get easier, and not just because they need 100 terabytes. There is much more we can do. There is a huge number of customers that already trust us with their SAN workloads, so it's not that with the ASA we are completely new to the SAN market; rather, with the ASA we would like to expand our footprint in that market. We have lots of customers using a unified storage system in block-only environments, because that's simply what we had at the time. ONTAP delivered value to those customers; they decided to go with it, and they chose the AFF, the unified storage system. We also have customers that run block storage on the unified systems. Others run both: say, one AFF or FAS environment running only NAS protocols, and a different set of AFF or FAS systems that, even though they're unified, run SAN only. So there is a set of customers that do unified on one box, others that do unified but separate the environments, and then there are the customers that are strictly SAN-only, and those are the customers we really want to engage with. So the question is: why do we need the ASA? It's to get roadblocks out of the way.
It means having easier conversations with customers, not having to explain what unified all-flash storage and ONTAP are all about when all the customer asked for in the beginning was a SAN array. Some customers simply feel that having anything else on the same array, like NAS protocols, makes it a lesser array, because they think it's then not optimized for the use case they want it for, and all this other clutter they don't need and have paid for just adds complexity and cost. That is where we simply say: take an ASA, take an All SAN Array, and what you get is a dedicated SAN array, really meant and built only for providing block protocols.

There's also another argument on this slide worth mentioning. Especially in the beginning, when we entered the SAN market, there was FUD about our LUNs: it was said that NetApp LUNs are not real LUNs, just files in a file system. While that might be true, the answer really is: why would it matter, especially these days? We are in a world of thin provisioning, storage efficiencies, deduplication, compression, and so on. We are far away from a world where one logical block in a LUN maps one-to-one to a physical block in a RAID group. In the end, a LUN is just a number of logical blocks accessible by a protocol, and we use WAFL as an abstraction layer to present this pool of blocks and do our magic around it. Everyone else does similar things. With thin provisioning and efficiency features in the market, everybody has to have some kind of abstraction layer between the physical RAID groups and the LUN that is finally presented to the hosts. So if any of these arguments come up, be assured: everybody else does it too. They just call it something different.
While others call theirs something else, ours is still WAFL, and yes, the F stands for "file". That might be true, but in the end it's simply an abstraction layer. So then, why do we need the ASA? Second topic: some customers have no interest in file protocols. Maybe they have separate teams, one responsible for NAS, the other for SAN. The SAN team is the one issuing the RFP, and that is the one we need to answer. Again: we don't do file on the ASA platform, and that gets these discussions out of the way. Just go in there: the ASA is a dedicated SAN array, no file protocols, not even an option to enable them, no way around it. It is a SAN-only array.

Then there is something that moves from perception toward real technical specification: behavior that people expect. One of these behaviors is active-active access to a LUN across both controllers in an HA pair. We can provide that with the ASA, and we do it exactly because people ask for it; I think it's considered table stakes in SAN environments. There's nothing wrong with ALUA, and I'll get into that in the pathing discussion as well, but customers sometimes just want active-active SAN multipathing, either because they are used to it, or because a competitor got it into the RFP since they are capable of it. For a long time they knew how to keep us out of RFPs that way, because for a long time we weren't able to do it, and with the ASA we can. So the ASA offers active-active Fibre Channel and iSCSI SAN multipathing. Again: taking a roadblock out of the discussion and being confident in providing the right solution.

Then there are customers who actually need active-active SAN multipathing. What does that give me over the ALUA-based multipathing of an AFF? Faster failover times. If a path is always available to my host, the host has a means to send I/O down a path at every given point in time.
It doesn't need to wait until the storage system demotes or promotes certain paths before it can use them. That's the essential part of getting failover times down, whether for maintenance (say, an ONTAP update) or actual failures such as a controller failure or a path failure. Sometimes there is a genuine need: applications that require these quick failover times, or customer SLAs built on applications that demand them, and there again, the ASA is the vehicle to deliver those faster failovers.

If we take that as a summary of our positioning: generally, it's all about ONTAP SAN. Personally, I see the AFF-versus-ASA discussion as an afterthought. First of all, it's all about ONTAP and the data management capabilities, the value-add that the operating system, the data management layer, can offer the customer. Then, later down the road, there might be a discussion about whether an ASA or an AFF is the right platform. But always keep in mind: the ASA is not intended to be positioned against the AFF. The ASA is an additional arrow in your quiver, so that you can go after a customer's installed base, a customer's SAN environment. It's not that you have your comfortable customer base, and in the past you simply sold them AFF systems, and with the tech refresh you now sell them ASA. That is not the idea. What we want is for you to go out, win these new customers, and grow market share. The ASA is just one tool in your toolbox that you can use to counter some of these perceptions, be they valid or not. It's one more platform we can use to win new customers and grow the installed base with ONTAP.
Then there are the customers that really dislike anything but SAN protocols; there, the ASA is the obvious choice. Those customers care only about SAN, especially in RFPs where you must answer yes or no on whether you have a certain feature, and if active-active is in there, we can now participate in that RFP. Whether you end up selling an ASA or an AFF, unified or block-only, doesn't really matter, but it gets your foot in the door. That is, I think, the important part about the ASA. And for those that need active-active multipathing, we can offer a solution too. In the end, the customer can also save money by moving to an ASA: if it really is about block storage only, the ASA may be commercially attractive, as it is cheaper than the unified one. Some good reasons, and in the past there have been customers, account teams, and partners who successfully positioned an ASA with this as one of the main drivers. That's a fair approach, so that's fine as well.

Okay, let's go into some technical differences, because there are two main differences between an all-flash unified system (AFF or FAS) and the ASA. The first is the pathing: active-active versus ALUA-based pathing. On the unified system, for as long as we have done SAN, we tell the client, the host, which path to primarily use, because that is the so-called active/optimized path in ALUA terms. ALUA stands for Asymmetric Logical Unit Access. One of our controllers owns the LUN (I think you are well aware of how our architecture works), so that node presents the active/optimized path, and the HA partner presents only the active/non-optimized path. The server is not intended to use that path until something happens; it will always use the optimized path. And if a path fails (in this case really the path, not the node itself), then we will switch the path.
We will promote the active/non-optimized path to active/optimized and explicitly tell the server: please now use this other path. There is nothing wrong with this behavior; ALUA is an industry standard. It's just time-consuming, because we have to react and tell the server, the server has to react to what we just told it, and only then can it issue another I/O down the other path. Usually that takes roughly between two and maybe up to ten seconds, during which the client incurs an I/O pause and cannot drive I/O down any path. And that is where the ASA differs most. First of all, both nodes of the HA pair present an active/optimized path, irrespective of whether that node actually owns the LUN. Then usually the question comes: but isn't that path slower? The answer is: technically, yes. But then it goes back to: does it matter? Because against our competition we are only talking about average latency, comparing the average latency of our system against the average latency of the competitive system, and there we do really well. We are as good as, or sometimes even better than, what our competition can deliver in terms of latency. Where it might get ugly is when you compare an ASA against an AFF, because the AFF, through its architecture and ALUA, always forces the client onto the optimized path. With a certain workload, if you configure it and really measure it, you will see that an AFF is indeed a little faster, with slightly lower latency, than an ASA. Then again: does it really matter? The ASA is not meant to be positioned against an AFF; it is meant to be positioned against a competitive array. So the question hopefully shouldn't come up. Maybe it's an existing NetApp customer who knows as much about an HA pair as you do, so maybe the question comes up. But it shouldn't.
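The difference in host behavior can be sketched with a toy path selector in Python (a conceptual model of my own, not how any real multipath driver is implemented):

```python
import random

def next_path(paths, mode):
    """Toy host-side path selector.
    ALUA mode: only paths the array has marked 'optimized' are eligible,
    so after a failure the host must wait for the array to promote the
    partner path before it can send I/O again.
    Active-active mode: every surviving path is eligible immediately."""
    if mode == "alua":
        eligible = [p for p in paths if p["alive"] and p["state"] == "optimized"]
    else:  # active-active: any surviving path will do
        eligible = [p for p in paths if p["alive"]]
    return random.choice(eligible) if eligible else None

paths = [
    {"name": "to_node1", "alive": True, "state": "optimized"},
    {"name": "to_node2", "alive": True, "state": "non-optimized"},
]

paths[0]["alive"] = False                     # the optimized path fails

print(next_path(paths, "alua"))               # None: ALUA host stalls until promotion
print(next_path(paths, "active-active"))      # to_node2 is usable at once

paths[1]["state"] = "optimized"               # array promotes the partner path
print(next_path(paths, "alua"))               # only now can ALUA I/O resume
```

The promotion step in the last two lines is exactly the two-to-ten-second window described above; in active-active mode it never blocks path selection.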
And then again, the question is: does it really matter? If you compare it to our competition, the most prominent one at this point in time, I think, being Pure, they have a similar architecture with HA pairs, but only one of the controllers actually owns the resources; the other one is simply there for an HA takeover to occur. So even there, they present symmetric active paths, but the truth is the IO going through the HA partner has some additional latency too. So we are comparing apples to apples. There's nothing bad about this design; we just need to make sure that we do the right comparisons.

So where does it bring us in regards to failures? When a server has active/optimized paths to both of the controllers all the time and we lose one of those paths, the server is free to just take the next IO and send it down the other path that is still available. So essentially there is no IO pause from a server perspective, other than the pause it takes the storage system to maybe move that LUN over from one node to the other if the node has failed. In the scenario where the node is still there and just a path fails, we don't even need a storage failover, so we wouldn't see any IO pause at all. This makes the failover in itself much quicker: usually lower than five seconds, more in the two-to-three-second range.

So what does it really mean? Because again, there's nothing wrong with an AFF and ALUA-based multipathing. Most discussions are around perception: you know, "ALUA is slow", or "active-active is more resilient". There is no real technical truth behind that, but I would just rather not get into those discussions with the customer. I'm happy to discuss the technical details, like I do with you right now, but if the choice is that he would like to have active-active, I wouldn't argue. I'd just say: okay, then the ASA is what you need, and that is what I can get you.
With active-active there is still this nuance of IOs that take some time. I mentioned: if we have this active-active setup and a path, in this case to node one, fails, the server can immediately switch to the other path going through node two. So theoretically you would say there is no outage; it's what we at times see called an instantaneous failover. To be blunt, there is no such thing as an instantaneous failover for SAN. It's not possible, simply because there are components involved that have to react to certain situations, and that is the complete IO stack.

Take a database as an example. A database writing to a log file: the writes to the log file have to be in sequence, they must not get out of order, or otherwise you have a corrupt database. So if you are writing into the log file, and while that write is happening you lose a path and that write is not committed, then the server will not be able to send any subsequent IOs that would alter that log file down the other path, even though that path still exists. That server, that application, has to wait for this single IO to time out, because it will fail; there is no path to that file and it cannot be updated. The error will travel back up the storage stack, and somewhere in the SCSI layer a retry will be triggered; that retry goes down the other path, and only then will the IO complete. For these specific IOs, which are called dependent IOs and which are not uncommon, you will see the exact same IO pause on an active-active system that you see on an ALUA-based system. And note that I'm deliberately saying "system" and not ASA versus AFF, because this is common amongst any vendor; they just can't change how the storage stack works. That said, I could, let's say, fake a demo.
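The dependent-IO stall just described can be put into a small model. This is only an illustrative sketch with an assumed timeout value (the 30-second SCSI retry default mentioned in the talk), not a real IO stack: ordered log writes cannot overtake a lost in-flight write, so the whole stream waits for that one IO's timeout and retry, while unordered writes simply continue down the surviving path.

```python
# Illustrative model - not a real SCSI stack. 30 s is the common default
# IO timeout quoted in the talk; real values are OS- and HBA-dependent.
SCSI_IO_TIMEOUT_S = 30.0

def stream_stall(dependent, timeout_s=SCSI_IO_TIMEOUT_S):
    """Rough pause a stream of writes sees when one in-flight IO is lost
    on a failed path, even though a second active/optimized path exists.

    dependent=True models ordered writes (e.g. a database log): write N+1
    may not be issued until the lost write N times out and is retried on
    the surviving path, so the whole stream pauses for the timeout.

    dependent=False models unordered writes: later IOs go straight down
    the surviving path; only the single lost IO waits for its retry, and
    an outside observer sees no disruption at all.
    """
    return timeout_s if dependent else 0.0

print(stream_stall(dependent=True))   # 30.0 - the log writer stalls
print(stream_stall(dependent=False))  # 0.0  - the unordered stream keeps going
```

This is why an unordered 100% 4K write profile can make any array look like it fails over "instantly", while a real ordered workload sees the same pause on every vendor's system.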
I can fake an IO profile where I just don't care about dependencies of writes. I just say: okay, I write 100% with a 4K block size down to this LUN, and then I fail a path. I wouldn't see any IO disruption at all, because these IOs are not dependent; I just continue with the other IOs and send them down the other path. But there is at least one single IO still stuck in the IO queue, which I won't ever see in my observations, because eventually it will get retried after 30 seconds and it will get completed. From an outside view no disruption happened, but those are not real IO profiles, although that is sometimes what we see being shown in PoCs.

Talking about multipathing, we have to be cautious, because there is one exception to active-active in the ASA, and that is the NVMe protocol. NVMe multipathing is always active/optimized through the controller owning the namespace and active/non-optimized through the HA partner. Similar to ALUA in SCSI, asymmetric LUN access for SCSI, this is called ANA in NVMe terms: Asymmetric Namespace Access. The good thing is, we are not the only ones that cannot do active-active for NVMe. I just had a PSE, a professional services guy, talk to me this morning about us versus the competition; he was able to compare this at a customer using the NVMe protocol against other solutions. So that is not something that only we are lacking; that is something that is still in development for the protocol. The NVMe protocol is still relatively young, it has to evolve, and we are looking into getting active-active multipathing into NVMe as well.

Before I go to the next topic, I see one more question here: "In case of active/optimized to non-optimized path switching, it relies on MPIO software which runs in the server OS, and the default value is usually 30 seconds."
So it's not up to 10 seconds but quite long, 30 seconds. That is possible, yes, but it would apply to any vendor in that case, right? Usually a takeover in our system occurs much quicker, and then again you have to look at what the IO timeout setting is and what protocol is involved. You mentioned 30 seconds in your question; 30 seconds is a default that I usually see especially in iSCSI environments. For Fibre Channel I have seen people tweaking that number away from the default, if 30 seconds even is the default there. But you're right: if the setting for IO retries, or for switching paths in the multipath policy, is set to 30 seconds, then the timeout is dictated by that policy, obviously. What I want to bring across is: we can be faster, even with ALUA. If anything longer than that elapses, it's not our fault; it's a setting that can be optimized in the server IO stack.

So that is the active-active pathing. There is one other feature where the ASA differs from an All-Flash FAS. When we introduced SAN, especially with clustered ONTAP, back in the days of ONTAP 8.3 where we had unified on clustered ONTAP, we introduced the concept of the LIF, the logical interface, which means the IP address or the WWPN put on a physical port, and usually with a LIF migrate we can move these LIFs around. We made one design decision at that point in time, and that was: we don't move SAN LIFs. That is because we were relying on the multipathing software, because that was a given. We simply expected that any SAN client will always have more than one path down to a LUN, and if that path fails, the multipathing software will take care of moving the IO down the other path. But as we just learned, this switch takes time. So now, for the ASA, we have taken measures to reduce the amount of time it takes for the server to reissue the IO. One of them, in Fibre Channel, is the so-called persistent ports.
If you're familiar with MetroCluster, then this concept will look familiar to you, because we're essentially doing the same thing, but now within an HA pair. What it does: if we have a controller, let's say node one, and we configure a logical interface with a WWPN on that node, then ONTAP will create a shadow LIF with that exact same WWPN on the HA partner, but that LIF will be offline; it will be dormant. You cannot change it, you cannot alter it, you cannot bring it online as long as the original LIF is still there and still online. So it's really a copy, and it's offline. Similarly for node two: we have a LIF there with a different WWPN, and we create its shadow LIF back on node one, dormant, not accessible.

So what is the advantage from a host perspective when we use these? If we have a controller failure and we do a storage failover, we take the LUN online, and what we also do is immediately take that dormant shadow LIF online. For a client, it simply looks as if the logical interface, the SCSI target it was just talking to, moved from one physical switch port to another. So we don't have to run up the IO stack the same way as if the path had failed outright, where the IO timer would have to expire and the error would run up the storage stack into the SCSI layer, and the SCSI layer would then do the retry. Whereas here, by onlining the port, we trigger the fabric login, all the connected servers are notified that there is a change in the fabric, and they check and reinitiate IO. So now we are reinitiating the IO somewhere lower down the stack, more in the Fibre Channel and HBA part, and we don't need to escalate it up the SCSI stack for the host to retry. That again helps to reduce the IO resumption time, the IO pause that is perceived by the client for a specific IO on that broken path.
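A rough way to picture the persistent-ports mechanism is as a table of LIFs with dormant shadows. This is only an illustrative model with made-up labels ("WWPN-A", "WWPN-B"), nothing like ONTAP's internals: each WWPN exists twice, once online on its home node and once offline on the partner, and a takeover just flips which copy is online.

```python
# Illustrative model of FC persistent ports - not ONTAP internals.
# "WWPN-A" / "WWPN-B" are made-up identifiers.
lifs = [
    {"wwpn": "WWPN-A", "port_on_node": 1, "online": True},   # original on node 1
    {"wwpn": "WWPN-A", "port_on_node": 2, "online": False},  # dormant shadow
    {"wwpn": "WWPN-B", "port_on_node": 2, "online": True},   # original on node 2
    {"wwpn": "WWPN-B", "port_on_node": 1, "online": False},  # dormant shadow
]

def storage_failover(lifs, failed_node):
    """Offline every LIF on the failed node and online the copies on the
    partner: the fabric sees the same WWPNs log in from new physical
    ports, so hosts reinitiate IO at the FC/HBA layer instead of waiting
    for a SCSI-level retry."""
    for lif in lifs:
        lif["online"] = lif["port_on_node"] != failed_node

storage_failover(lifs, failed_node=1)
# Both WWPNs are now served, online, from node 2 only:
print(sorted(l["wwpn"] for l in lifs if l["online"]))  # ['WWPN-A', 'WWPN-B']
```

The client-visible effect is just the last line: the same targets are still logged into the fabric, only from different ports.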
We do something similar, and yet different, for iSCSI. I mentioned earlier that we made the decision that SAN logical interfaces won't move, and now we have made an exception for the ASA, where we say iSCSI LIFs do move. We do a regular LIF failover: if one port is failing, one cable is failing, that port goes offline, we can move that LIF to a different physical port, and that can either be on the same node or, what is much more relevant in an HA pair, move it over to the HA partner in case one node fails. So my server again has these two equal paths to the LUN, one across node one, the other across node two, and if I do my storage failover, we do a LIF migrate and take that LIF along, so the client can still talk to the same IP address. In that case we don't need to escalate the IO error up into the SCSI stack; TCP will already recover and be able to reissue the IO on the TCP layer, which again makes it quicker from a client IO-resumption perspective.

And with that, top of the hour, we are done. Are there any more questions? I know you can't ask verbally, but if there are any, I think we can still wait another one or two minutes. So if you type quickly, I'm happy to answer questions that arise in the Q&A section. There is one question: what is the usual failover time in case of using persistent ports, FC and TCP? I mentioned it briefly earlier: I think we are at 3 seconds and below from an IO-resumption point of view. I haven't measured it personally.
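The TCP-layer recovery just mentioned for iSCSI is worth a small numeric sketch. This is a simplified model, not real kernel behavior (real retransmission timers are derived from the measured round-trip time): assume a doubling retransmit backoff capped at 30 seconds, and note that a stalled connection only resumes at the first retransmit that lands after the array has finished its takeover.

```python
def retransmit_times(cap_s=30.0):
    """Yield seconds-after-send at which TCP retransmits under a
    simplified doubling backoff: waits of 1, 2, 4, 8, 16, 30, 30, ... s,
    i.e. retransmits at t = 1, 3, 7, 15, 31, 61, ... seconds."""
    t, wait = 0.0, 1.0
    while True:
        t += wait
        yield t
        wait = min(wait * 2, cap_s)

def perceived_pause(takeover_done_s):
    """The stalled connection resumes at the first retransmit at or after
    the moment the takeover completed - which can be noticeably later
    than the takeover itself."""
    for t in retransmit_times():
        if t >= takeover_done_s:
            return t

print(perceived_pause(2.5))  # 3.0 - takeover done in 2.5 s, host resumes at 3 s
print(perceived_pause(5.0))  # 7.0 - miss the 3 s retransmit, wait until 7 s
```

So even when the storage side completes its takeover in two to three seconds, the application can see a longer pause purely from the client's retransmission schedule.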
I have to rely on the numbers that we have documented around the ASA. Fibre Channel may be quicker, though, compared to iSCSI, but that is not the fault of iSCSI or the SCSI protocol in general; this is more related to TCP. TCP has this backoff behavior: if it doesn't get a reply after 1 second, it waits 2 seconds; if not after 2 seconds, it waits 4 seconds; if not after 4, it waits 8 seconds, then 16, then 30, and then again 30. So if you don't meet that window in, let's say, your 2 to 3 seconds, it may be that from a TCP perspective you will see an IO pause of 8 or 10 seconds, just because of the behavior of TCP, even though the storage array underneath has already completed its takeover successfully.

So: are the shadow LIFs and iSCSI LIF migration exclusive to the ASA? Yes, they are. At this point in time, I don't know of any plans to port this to an AFF, and again, as this is another means to reduce failover times, together with the active-active pathing it is primarily targeted at the ASA. I currently also don't necessarily see a need, but you know, customers do strange things. So if you see a need for such a thing, if you have customers asking for that, please reach out to the NetApp account team covering that customer together with you. Have them file a feature enhancement PVR and describe the use case; these PVRs can move mountains. We just need to have them; we need to make it visible.

>> Okay. I think, while Aunt is just typing one more answer, we need to be respectful of everyone's time. I think, Neils, you could possibly keep answering questions all afternoon. Thank you, everybody, for attending the call. Please do fill out the survey on the way out: let us know what you thought of the content, what you thought of the format, and any other topics you want to see in the future. With that, thanks Neils, thanks to Patrick and Aunt, and we'll see you all next time. Thanks everybody. Bye-bye.
>> See you.