March 11, 2025
We are excited to announce a new capability in BlueXP workload factory: it can now provide insights from image files and from images embedded in PDF and Word documents, in addition to insights derived from the text in the source documents. Customers get more relevant and deeper answers to their queries because information captured in images and graphs in the source data sets is now taken into account.
The latest update to Workload Factory for GenAI introduces an image processing capability that enables organizations to ingest image data, in addition to text, into their RAG pipelines. When creating a knowledge base, you can now choose a multi-modal language model, which is used to process image files in the source NFS and SMB shares and images embedded within PDF and Word documents. Workload Factory for GenAI extracts the images, sends them to the language model to obtain a description of each one, and embeds the descriptions alongside the text embeddings. Responses to user queries against the knowledge base now include insights from image and graph descriptions as well as document text, leading to richer, higher-quality answers.
The image processing feature is enabled as part of the knowledge base creation workflow in Workload Factory. To enable it, configure a supported multi-modal language model for chat during knowledge base creation and, optionally, configure file filters to include specific image types. Supported image formats are .jpg, .jpeg, .png, .gif, and .webp. Supported multi-modal models include Amazon Nova and Claude models. The maximum supported size is 3.75 MB per image, with a resolution of up to 8000 x 8000 pixels. Images embedded in PDF, .doc, and .docx files are processed automatically.
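To check ahead of time which files in a share fall within the limits listed above, a small pre-flight check can help. The following is a minimal sketch, not part of Workload Factory: the function name `is_eligible_image` and its parameters are hypothetical, and it assumes that 1 MB means 1024 x 1024 bytes and that the 8000-pixel limit applies to each side of the image.

```python
from pathlib import PurePath

# Limits taken from the announcement above.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
MAX_BYTES = int(3.75 * 1024 * 1024)  # assumption: 1 MB = 1024 * 1024 bytes
MAX_SIDE_PIXELS = 8000               # assumption: limit applies to each side


def is_eligible_image(filename, size_bytes, width, height):
    """Return True if a standalone image file fits the published limits.

    Hypothetical helper for illustration only; the caller supplies the
    file size and pixel dimensions.
    """
    extension = PurePath(filename).suffix.lower()
    return (
        extension in SUPPORTED_EXTENSIONS
        and 0 < size_bytes <= MAX_BYTES
        and 0 < width <= MAX_SIDE_PIXELS
        and 0 < height <= MAX_SIDE_PIXELS
    )
```

For example, `is_eligible_image("chart.PNG", 1_000_000, 1920, 1080)` passes, while a 4,000,000-byte .jpg or an image 8001 pixels wide does not.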
Read the blog to learn more, or visit the Workload Factory product page and documentation. To get started, sign up for Workload Factory.