On Thursday, IBM Corp. unveiled Granite 4, its latest series of open-source language models that integrate features from two distinct neural network architectures. The initial launch comprises four models, with sizes ranging from 3 billion to 32 billion parameters. IBM asserts that these new models demonstrate superior performance compared to similarly sized alternatives while using less memory.
The Granite-4.0-Micro model, the smallest in the lineup, is built on the widely used Transformer architecture and its signature attention mechanism. Attention lets the model weigh how relevant each part of an input is to every other part, so it can concentrate its processing on the most informative passages of the text.
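For readers who want the mechanics, here is a minimal sketch of the scaled dot-product attention that underlies Transformers, written in plain NumPy. It is a simplified single-head version with no masking or learned projections, an illustration of the idea rather than Granite's actual implementation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention over a sequence of token vectors.

    Q, K, V: arrays of shape (seq_len, d) holding query, key,
    and value vectors, one row per token.
    """
    d = Q.shape[-1]
    # Pairwise relevance scores between every query and every key:
    # a (seq_len, seq_len) matrix -- the source of quadratic scaling.
    scores = Q @ K.T / np.sqrt(d)
    # Softmax turns each row of scores into weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output token is a relevance-weighted mix of all the values.
    return weights @ V

# Toy usage: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

The full score matrix here has one entry per pair of tokens, which is the root of the memory-scaling issue discussed below.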
The other three models within Granite 4 enhance the attention mechanism with components derived from the Mamba architecture, an alternative to Transformers. Mamba’s key advantage lies in its hardware efficiency. Instead of employing an attention mechanism, Mamba utilizes a mathematical construct known as a state space model, originally designed for aerospace trajectory calculations.
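In its textbook form, a state space model folds each new input into a fixed-size hidden state rather than comparing every token against every other. The sketch below shows the classical discrete-time linear form, not Mamba's selective variant, but it captures the core recurrence:

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Run a discrete-time linear state space model over an input sequence.

    State update:  x[t] = A @ x[t-1] + B @ u[t]
    Output:        y[t] = C @ x[t]

    A: (n, n), B: (n, d_in), C: (d_out, n), u: (seq_len, d_in)
    """
    n = A.shape[0]
    x = np.zeros(n)              # fixed-size state, regardless of seq_len
    ys = []
    for u_t in u:                # single pass over the sequence
        x = A @ x + B @ u_t      # fold the new input into the state
        ys.append(C @ x)         # read the output out of the state
    return np.stack(ys)

# Toy usage: a 16-dimensional state summarizing a 100-step input.
rng = np.random.default_rng(0)
A = 0.9 * np.eye(16)             # stable dynamics (spectral radius < 1)
B = rng.standard_normal((16, 4)) * 0.1
C = rng.standard_normal((2, 16))
y = ssm_scan(A, B, C, rng.standard_normal((100, 4)))
print(y.shape)  # (100, 2)
```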
Transformer attention scales quadratically: doubling the length of a prompt roughly quadruples its memory requirements. Mamba's memory consumption grows far more slowly, which translates to reduced inference costs. The Granite 4 models build on the newer Mamba-2 design, which simplifies the core operations into approximately 25 lines of code and demands less hardware than the original Mamba.
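A toy calculation makes the scaling difference concrete. The entry counts below are illustrative only, not Granite's actual memory figures:

```python
# Illustrative scaling arithmetic only; not Granite's real memory footprint.
for seq_len in (1_000, 2_000, 4_000):
    attn_entries = seq_len ** 2  # full attention: one score per token pair
    ssm_entries = 16             # a fixed-size SSM state, independent of length
    print(f"{seq_len:>5} tokens: attention {attn_entries:>10,}  ssm state {ssm_entries}")

# Doubling the prompt quadruples the attention score matrix
# (1,000,000 -> 4,000,000 -> 16,000,000) while the SSM state stays the same size.
```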
The flagship model, Granite-4.0-H-Small, has 32 billion parameters and uses a mixture-of-experts architecture that activates only about 9 billion of them for any given query. IBM expects developers to apply the model to customer support and similar use cases.
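Mixture-of-experts works by routing each token through only a few of many expert sub-networks, so only a fraction of the total parameters do any work on a given query. Here is a heavily simplified single-token sketch of top-k routing; it illustrates the general technique, not Granite's actual router:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 32, 8, 2   # hidden size, expert count, experts used per token

# Each "expert" is a small feed-forward weight matrix.
experts = [rng.standard_normal((d, d)) * 0.05 for _ in range(n_experts)]
router = rng.standard_normal((d, n_experts)) * 0.05

def moe_layer(x):
    """Route a token vector x of shape (d,) through its top-k experts."""
    logits = x @ router                   # one router score per expert
    top = np.argsort(logits)[-k:]         # indices of the k best experts
    gates = np.exp(logits[top])
    gates /= gates.sum()                  # normalize the gate weights
    # Only k of the n_experts matrices are touched; the rest stay idle,
    # which is why active parameters are a fraction of the total.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

out = moe_layer(rng.standard_normal(d))
print(out.shape)  # (32,)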
The two remaining models, Granite-4.0-H-Tiny and Granite-4.0-H-Micro, weigh in at 7 billion and 3 billion parameters, respectively, and trade some output quality for faster response times in latency-sensitive scenarios.
In an internal benchmark, IBM found that Granite-4.0-H-Tiny required only 15 gigabytes of RAM, one-sixth of what the previous-generation Granite 3.3 8B model needed, while producing higher-quality output. IBM attributed the memory savings primarily to the new architecture, and the accuracy gains to improvements in its training methods and training data.
Granite 4 is now available through IBM's watsonx.ai platform and more than a half-dozen third-party services, including Hugging Face. IBM plans to bring the models to Amazon SageMaker JumpStart and Microsoft Azure AI, and to follow up with additional models aimed at enhancing reasoning capabilities.
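Because the models are published on Hugging Face, loading one should follow the standard transformers pattern. The repository ID below is an assumption based on IBM's naming convention, so verify it on the Hub before use:

```python
# Sketch of loading a Granite 4 model via the standard Hugging Face
# transformers API. The repository ID is assumed from IBM's naming
# convention; confirm the actual name on huggingface.co before running.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.0-h-tiny"  # assumed repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Summarize the customer's issue:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```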
For further details, readers can refer to IBM’s official announcement on Granite 4.