Unstructured data can be thought of as data that’s not actively managed in a transactional system; for example, data that doesn’t live in a relational database management system (RDBMS). Structured data can be thought of as records (or transactions) in a database environment; for example, rows in a table of a SQL database.
Neither kind of data is inherently preferable to the other. Both have tools that allow users to access information. Unstructured data is simply far more abundant than structured data.
Examples of unstructured data are:
Until the advent of object-based storage, most, if not all, of this unstructured data was stored in file-based systems.
To understand the challenges of unstructured data, ask: What do enterprises face with traditional approaches to managing it?
Scale: It’s common in many enterprises to encounter unstructured datasets at the scale of tens or hundreds of billions of items. These items, objects, or files can be anything from a few bytes (for example, a temperature reading from a production-line instrument) to terabytes in size (for example, a full-length 8K resolution motion picture). Managing this scale with traditional file approaches rapidly moves from difficult to impossible as more and more resources are required just to maintain a “balance” of servers, file systems, arrays, and so on.
Collaboration: Increasingly, these massive unstructured datasets deliver value as they are shared (for example, researchers at multiple hospitals who share a common massive bank of genomic sequences). With traditional approaches, the ability to share massive sets of unstructured data across geographies, corporate entities, and so on, has required extremely expensive replication and governance.
Today’s object storage solutions meet the challenges of scale and collaboration by delivering a geo-distributed active namespace. This namespace enables a user at any location to retrieve an object or a file from any location with a simple GET command (without having to specify a data center, server, file system, or directory). Similarly, PUT commands enable the ingest of data so that all locations can easily have access.
The simplicity and scalability of a single global namespace combined with a simple stateless data management protocol (for example, Amazon S3 or OpenStack Swift) help organizations deliver a scalable and collaborative environment across geography, organization, and application boundaries.
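The GET and PUT behavior described above can be illustrated with a small in-memory sketch. This is a toy model only, not how StorageGRID or any S3 or Swift service is implemented; the class name `GlobalNamespace`, its methods, and the site, bucket, and key names are all hypothetical. It shows the one idea that matters here: callers address an object by bucket and key alone, and never name the site, server, or file system that holds it.

```python
class GlobalNamespace:
    """Toy model of one namespace shared by several sites (hypothetical)."""

    def __init__(self, sites):
        self.sites = set(sites)
        # (bucket, key) -> (site where the object was ingested, object bytes)
        self._objects = {}

    def _check_site(self, site):
        if site not in self.sites:
            raise ValueError(f"unknown site: {site}")

    def put(self, site, bucket, key, data):
        """Ingest an object at `site`; it becomes visible from every site."""
        self._check_site(site)
        self._objects[(bucket, key)] = (site, data)

    def get(self, site, bucket, key):
        """Retrieve by bucket and key only; the caller never says where it lives."""
        self._check_site(site)
        _ingest_site, data = self._objects[(bucket, key)]
        return data

    def ingest_site(self, bucket, key):
        """Report where an object was ingested (for illustration only)."""
        return self._objects[(bucket, key)][0]


# Assumed example: two sites sharing one namespace.
grid = GlobalNamespace(["london", "tokyo"])
grid.put("london", "genomes", "sample-001.fa", b"ACGT")
data = grid.get("tokyo", "genomes", "sample-001.fa")
```

In the example, the object is ingested in one location and read from another with the same bucket and key. A real deployment would also handle replication, consistency, and access control, all of which this sketch leaves out.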
You can store and manage unstructured data at scale by using NetApp® StorageGRID® technology for secure, durable object storage for private and public clouds. With StorageGRID, you can build a massive (multilocation) single namespace, and you can also integrate a unique information lifecycle policy into that data. With the StorageGRID integrated policy engine, you can be confident that your data is available: