AI Model Compressor
Have your AI model compressed and benefit from efficient, portable models. Compression greatly reduces memory and disk-space requirements, making AI projects far more affordable to implement.
Lower your energy bills and reduce hardware expenses.
Keep your data safe with localized AI models that don't rely on cloud-based systems.
Overcome hardware limitations and accelerate your AI-driven projects.
Contribute to a greener planet by cutting down on energy consumption.
Current AI models face significant inefficiencies, with parameter counts growing exponentially but accuracy only improving linearly.
This imbalance leads to:
Computational resource demands that grow at an unsustainable rate.
Increased energy consumption that impacts the bottom line and raises environmental concerns.
A scarcity of advanced chips that limits innovation and business growth.
Revolutionizing AI Efficiency and Portability: CompactifAI leverages advanced tensor networks to compress foundational AI models, including large language models (LLMs). This innovative approach is what delivers the benefits outlined above.
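To give a feel for the general idea (not CompactifAI's proprietary decomposition), the minimal sketch below factors a single linear layer's weight matrix into two smaller factors via a truncated SVD, the simplest form of tensor factorization. The layer size and rank are purely illustrative.

```python
# Minimal sketch of the idea behind tensor-network-style compression:
# factor a layer's weight matrix into smaller pieces and keep only the
# dominant components. CompactifAI's actual decomposition is proprietary;
# the layer size and rank below are illustrative only.
import torch
import torch.nn as nn

def compress_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Replace a dense linear layer with a low-rank factorized pair."""
    W = layer.weight.data                      # shape: (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]               # (out_features, rank)
    V_r = Vh[:rank, :]                         # (rank, in_features)

    # Two smaller layers whose product approximates the original weight.
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = V_r
    second.weight.data = U_r
    if layer.bias is not None:
        second.bias.data = layer.bias.data
    return nn.Sequential(first, second)

# Example: a 4096x4096 projection (~16.8M parameters) truncated to rank 256
# keeps ~2.1M parameters, roughly an 8x reduction for that layer.
original = nn.Linear(4096, 4096)
compressed = compress_linear(original, rank=256)
x = torch.randn(1, 4096)
print(original(x).shape, compressed(x).shape)   # both: torch.Size([1, 4096])
```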
You can now access CompactifAI models in three ways:
1. Via API on AWS – Our compressed and original models are available through our API, now listed on the AWS Marketplace (a sample API call is sketched after this list).
2. License for private infrastructure – We provide enterprise licenses to deploy CompactifAI on your own on-premise or cloud environment.
3. Delivery through a service provider – We can compress your model and deliver it to your preferred inference provider or infrastructure partner.
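For option 1, the sketch below shows what a typical API call might look like. The endpoint URL, model identifier, and environment variable are placeholders, and an OpenAI-compatible chat-completions interface is assumed; the actual values and schema come from the AWS Marketplace listing and API documentation.

```python
# Hypothetical sketch of querying a compressed model over an HTTP API.
# The endpoint URL, model name, and API-key variable are placeholders;
# the real values come from the AWS Marketplace listing and API docs.
import os
import requests

API_URL = "https://api.example.com/v1/chat/completions"   # placeholder endpoint
API_KEY = os.environ["COMPACTIFAI_API_KEY"]                # placeholder env var

payload = {
    "model": "llama-3.3-70b-compressed",    # placeholder model identifier
    "messages": [
        {"role": "user", "content": "Summarize the benefits of model compression."}
    ],
    "max_tokens": 200,
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```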
CompactifAI is compatible with commercial and open-source models such as Llama 4 Scout, Llama 3.3 70B, DeepSeek R1, Mistral Small 3.1, and Microsoft Phi 4, among others. It needs access to the model weights themselves in order to compress them.
OpenAI only provides an API to access (query) its models, so Multiverse Computing's product is not able to compress them.
One of the advantages of CompactifAI is that the compressed model can run anywhere: on x86 servers on premise if security or governance is a concern, but also in the cloud, on your laptop, or on any other device. You choose.
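As an illustration of what on-premise deployment could look like, the sketch below loads a compressed model locally with the Hugging Face transformers library. The delivery format and the local path are assumptions; use whatever artifact and path come with your license.

```python
# Sketch of running a compressed model fully on-premise (CPU or a single GPU),
# assuming the weights are delivered in Hugging Face format. The model path
# below is hypothetical; use the path or identifier provided with your license.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/opt/models/compactifai-llama-7b"   # hypothetical local path

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",     # use the dtype the checkpoint was saved in
    device_map="auto",      # GPU if available, otherwise CPU
)

prompt = "Explain retrieval-augmented generation in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```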
One of the advantages of CompactifAI is that it reduces the resources needed to run RAG (retrieval-augmented generation) and greatly speeds up inference.
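To make the RAG point concrete, here is a minimal, generic RAG loop: retrieve the most relevant passages, then build a prompt for the generation model. The TF-IDF retrieval and the toy documents are illustrative placeholders, not CompactifAI components; the resulting prompt would be sent to the compressed model, for example via the local run shown in the previous sketch.

```python
# Minimal RAG sketch: retrieve the most relevant passages with TF-IDF, then
# feed them to the (compressed) model as context. The retrieval method and
# documents are illustrative placeholders, not CompactifAI components.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "CompactifAI compresses LLMs with tensor networks.",
    "Compressed models need less VRAM and disk space.",
    "RAG augments prompts with retrieved context passages.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(documents)
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, doc_vectors)[0]
    top = scores.argsort()[::-1][:k]
    return [documents[i] for i in top]

query = "Why do compressed models speed up RAG?"
context = "\n".join(retrieve(query))
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
# `prompt` would then be passed to the compressed model, e.g. via the
# local generate() call shown in the previous sketch.
print(prompt)
```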
The minimum requirements to run the models are stated below. These are not necessarily the requirements for a real application: at inference time the requirements vary with the latency (response time) and throughput (tokens per second) the system must deliver, the latter being related to the number of simultaneous users you can serve. Consider these figures a lower bound; improving latency and throughput would require more powerful GPUs, such as NVIDIA H100 GPUs with 40GB or 80GB of VRAM. [source 1, source 2, source 3] A rough sizing sketch follows the figures below.
Training, 7B LLM at FP16:
GPU: 8 × NVIDIA A100 (40 GB VRAM each)
System RAM: 320 GB
Disk space: 40 GB
Training, 70B LLM at FP16:
GPU: 32 × NVIDIA A100 (40 GB VRAM each)
System RAM: 1,280 GB
Disk space: 200 GB
Inference, 7B LLM at FP16:
GPU: 1 × NVIDIA A10 (24 GB VRAM) or higher
System RAM: 16 GB
Disk space: 16 GB
Inference, 70B LLM at FP16:
GPU: 8 × NVIDIA A10 (24 GB VRAM) or higher
System RAM: 64 GB
Disk space: 140 GB
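The sizing sketch below is a back-of-envelope estimate of the FP16 weight footprint only (2 bytes per parameter); it ignores activations, KV cache, and optimizer state, which is why the training figures above call for many more GPUs than inference does.

```python
# Back-of-envelope sizing: FP16 weights take 2 bytes per parameter. This only
# covers the weights themselves; activations, KV cache, and (for training)
# gradients and optimizer state add substantially more memory on top.
def fp16_weight_gib(num_params: float) -> float:
    """Approximate FP16 weight footprint in GiB."""
    return num_params * 2 / 1024**3

for name, params in [("7B", 7e9), ("70B", 70e9)]:
    print(f"{name}: ~{fp16_weight_gib(params):.0f} GiB of FP16 weights")
# 7B:  ~13 GiB -> fits a single 24 GB A10 for inference
# 70B: ~130 GiB -> needs multiple GPUs, in line with the figures above
```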
Customers can retrain the model if they have the platform and resources to do it. Multiverse Computing can also provide this service at a cost to the customer.
No. It is not open source. We do not currently share CompactifAI on GitHub.
Yes. We developed it to compress any linear or convolutional layer used in standard LLMs. If a model contains a custom layer, we can quickly add support for it in CompactifAI.
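As an illustration of what "standard" versus "custom" layers means in practice (this is not CompactifAI's internal logic), the sketch below walks a model's modules and reports which leaf layers are plain linear or convolutional layers and which are other types.

```python
# Sketch: walk a model's modules and report which leaf layers are standard
# linear or convolutional layers versus other layer classes. Illustrative
# only; it does not reflect CompactifAI's internal implementation.
import torch.nn as nn

def audit_layers(model: nn.Module) -> dict:
    standard, other = [], []
    for name, module in model.named_modules():
        if isinstance(module, (nn.Linear, nn.Conv1d, nn.Conv2d, nn.Conv3d)):
            standard.append(name)
        elif len(list(module.children())) == 0 and name:  # non-container leaf
            other.append((name, type(module).__name__))
    return {"standard": standard, "other_leaf_layers": other}

# Example on a toy model: the Linear layers are the directly compressible ones,
# while ReLU and LayerNorm show up as "other" leaf layers.
toy = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.LayerNorm(256), nn.Linear(256, 128))
report = audit_layers(toy)
print(report["standard"])            # ['0', '3']
print(report["other_leaf_layers"])   # [('1', 'ReLU'), ('2', 'LayerNorm')]
```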
It is on our roadmap. We are developing the next version of the compressor, which will support multi-modal models.
Contact us today to learn how CompactifAI can streamline your AI operations and drive your business forward.
Unlocking the Quantum AI Software Revolution.
Interested in seeing our Quantum AI software in action? Contact us.