
AI on personal computers reached a significant milestone in 2025: small language models designed for PCs nearly doubled in accuracy compared with the previous year, substantially narrowing the gap with more powerful cloud-based large language models. Supporting tools such as Ollama, ComfyUI, llama.cpp and Unsloth have matured considerably, with their user bases doubling year over year, while downloads of PC-optimized models grew tenfold compared with 2024. These developments are setting the stage for broader use of generative AI in everyday work by content creators, gamers and office workers alike.
During the ongoing Consumer Electronics Show (CES), NVIDIA unveiled a series of enhancements for its GeForce RTX graphics cards, RTX PRO workstations and DGX Spark systems, aimed at delivering the speed and efficiency needed to run sophisticated generative AI applications directly on PCs. Key improvements include up to three times faster performance and 60 percent lower video memory use for video and image generation, achieved through PyTorch-CUDA optimizations and support for NVFP4 and FP8 precision in ComfyUI. Integrating RTX Video Super Resolution into ComfyUI speeds up the production of 4K videos, and NVFP8 optimizations support the open-source LTX-2 audio-video model from Lightricks. In addition, a new pipeline for 4K video generation uses 3D scenes in Blender to control the output precisely, and small language models now run up to 35 percent faster with Ollama and llama.cpp. RTX hardware also accelerates video search in Nexa.ai’s Hyperlink.
Together, these features allow complex AI tasks in video, imaging and language processing to run smoothly on RTX-equipped PCs, where on-device computation keeps data private and secure and keeps latency low.
A standout announcement involves a new RTX-enabled workflow that triples the speed of video creation and supports 4K output, using far less graphics memory. Unlike prompt-based online services that often lack precision, this method lets artists start with storyboards, produce realistic key images and animate them into polished videos. The process breaks down into interchangeable components: one for generating 3D scene elements, another for creating lifelike images guided by Blender setups, and a final stage that animates between frames before upscaling to 4K via NVIDIA’s RTX Video tools.
Central to this is Lightricks’ newly released LTX-2 model, which rivals top cloud services in quality while producing up to 20 seconds of 4K footage complete with sound, multi-frame guidance and refined control mechanisms. Powered by ComfyUI, the system benefits from recent NVIDIA collaborations that cut processing time by 40 percent on its GPUs. With the latest NVFP4 support on RTX 50 Series hardware, overall speed increases threefold and memory needs fall by 60 percent; NVFP8 yields double the speed and 40 percent less memory. Optimized versions of models like LTX-2, FLUX.1 and FLUX.2 from Black Forest Labs, plus Alibaba’s Qwen-Image and Z-Image, are now accessible within ComfyUI, with more to follow.
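To make the quoted percentages concrete, the short Python sketch below applies them to a hypothetical baseline. The 120-second generation time and 20 GB memory footprint are illustrative assumptions, not figures from NVIDIA or Lightricks. Only the speedup factors and memory reductions come from the announcement, and they are best-case values.

```python
# Illustrative arithmetic only: the baseline values used below are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class PrecisionGain:
    speedup: float           # 3.0 means three times faster
    memory_reduction: float  # 0.60 means 60 percent less memory


# Best-case figures quoted in the announcement.
GAINS = {
    "NVFP4": PrecisionGain(speedup=3.0, memory_reduction=0.60),
    "NVFP8": PrecisionGain(speedup=2.0, memory_reduction=0.40),
}


def project(baseline_seconds: float, baseline_vram_gb: float, precision: str):
    """Return (seconds, vram_gb) after applying the quoted best-case gains."""
    gain = GAINS[precision]
    seconds = baseline_seconds / gain.speedup
    vram_gb = baseline_vram_gb * (1.0 - gain.memory_reduction)
    return seconds, vram_gb


if __name__ == "__main__":
    # Hypothetical baseline: 120 s per clip using 20 GB of video memory.
    for name in GAINS:
        seconds, vram_gb = project(120.0, 20.0, name)
        print(f"{name}: about {seconds:.0f} s and {vram_gb:.0f} GB")
```

Under these assumed baselines, a job that needed 20 GB would need roughly 8 GB with NVFP4 and 12 GB with NVFP8, which is the difference between requiring a top-end card and fitting on a mainstream one.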
After generation, the RTX Video node in ComfyUI upscales clips to 4K in moments, sharpening detail and removing artifacts in real time; this node arrives next month. To work around memory limits, an upgraded weight streaming option in ComfyUI draws on system RAM when GPU memory runs out, making larger models and more complex workflows usable on standard RTX cards. The full video workflow will be available for download next month, while the LTX-2 weights and the ComfyUI RTX features are available now.
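The general idea behind weight streaming can be illustrated with a small simulation. The sketch below is a conceptual model only and does not reflect how ComfyUI implements the feature, which the announcement does not describe. The function name, the layer sizes and the oldest-first eviction policy are all assumptions chosen for illustration.

```python
# Conceptual simulation only: no real GPU is involved, and the eviction
# policy (oldest resident layer first) is an assumption for illustration.
from collections import OrderedDict


def count_transfers(layer_sizes_gb, vram_budget_gb, passes):
    """Count RAM-to-VRAM copies needed to run all layers `passes` times."""
    resident = OrderedDict()  # layer index -> size, oldest first
    used = 0
    transfers = 0
    for _ in range(passes):
        for index, size in enumerate(layer_sizes_gb):
            if index in resident:
                continue  # weights are already in video memory
            if size > vram_budget_gb:
                raise ValueError(f"layer {index} alone exceeds the VRAM budget")
            # Evict the oldest resident layers until the next one fits.
            while used + size > vram_budget_gb:
                _, evicted_size = resident.popitem(last=False)
                used -= evicted_size
            resident[index] = size
            used += size
            transfers += 1
    return transfers


if __name__ == "__main__":
    layers = [4, 4, 4, 4]  # hypothetical 16 GB model split into four layers
    print(count_transfers(layers, vram_budget_gb=24, passes=3))
    print(count_transfers(layers, vram_budget_gb=12, passes=3))
```

The trade-off shows up in the transfer count: when the whole model fits, each layer is copied once, and when it does not, layers are copied again on every pass. Streaming lets a larger model run on a smaller card, at the cost of extra traffic between system RAM and the GPU.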
In file management, traditional PC search based on file names and basic tags often falls short. Nexa.ai’s Hyperlink changes that by turning RTX systems into intelligent local knowledge bases that answer everyday questions and cite their sources. It indexes documents, presentations, PDFs and pictures so that queries can match on content, and it keeps everything on-device for confidentiality. On an RTX 5090, indexing text and images takes about 30 seconds per gigabyte and a query returns in about three seconds, compared with an hour of indexing and 90-second responses on standard processors.
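A quick calculation shows what those figures mean for a real library. The sketch below assumes the hour of CPU indexing is also a per-gigabyte figure, which the text does not state explicitly, and it uses a hypothetical 50 GB library.

```python
# Figures from the announcement. Reading the CPU indexing time as
# "per gigabyte" is an assumption made for this illustration.
GPU_INDEX_SECONDS_PER_GB = 30
CPU_INDEX_SECONDS_PER_GB = 60 * 60
GPU_QUERY_SECONDS = 3
CPU_QUERY_SECONDS = 90


def indexing_minutes(library_gb: float, seconds_per_gb: float) -> float:
    """Return the time to index a library of the given size, in minutes."""
    return library_gb * seconds_per_gb / 60


if __name__ == "__main__":
    library_gb = 50  # hypothetical library size
    gpu = indexing_minutes(library_gb, GPU_INDEX_SECONDS_PER_GB)
    cpu = indexing_minutes(library_gb, CPU_INDEX_SECONDS_PER_GB)
    print(f"GPU indexing: {gpu:.0f} minutes")
    print(f"CPU indexing: {cpu / 60:.0f} hours")
    print(f"Indexing speedup: {CPU_INDEX_SECONDS_PER_GB / GPU_INDEX_SECONDS_PER_GB:.0f}x")
    print(f"Query speedup: {CPU_QUERY_SECONDS / GPU_QUERY_SECONDS:.0f}x")
```

Under that assumption, the 50 GB library indexes in 25 minutes on the GPU instead of 50 hours on the CPU, a 120-fold difference, while queries return 30 times faster.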
The CES announcement includes a beta update that adds video support, letting users search footage for on-screen elements, movements and dialogue. This helps filmmakers looking for specific clips and gamers revisiting highlights. Interested users can register for the private beta on Nexa.ai’s dedicated page, with access beginning this month.
For small language models, NVIDIA’s work with open-source developers over recent months has improved inference performance by up to 35 percent with llama.cpp and 30 percent with Ollama. These gains particularly benefit mixture-of-experts architectures, such as the NVIDIA Nemotron 3 open models. Faster model loading in llama.cpp adds convenience, and integrations are planned for LM Studio and for apps such as MSI’s AI Robot for device control.
NVIDIA Broadcast version 2.1 enhances audio and video for streaming and calls with AI filters, expanding its Virtual Key Light to RTX 3060 desktops and above. Improvements cover varied lighting, wider color adjustments and a professional dual-light setup via updated HDR maps. The update is available for download now.
For home studios, the compact DGX Spark AI supercomputer complements a desktop or laptop, taking on intensive tasks such as model testing or asset creation without interrupting the primary workflow. Updates announced at CES deliver up to 2.6 times the performance it offered at its debut three months ago. New guides cover techniques such as speculative decoding and fine-tuning across two DGX Spark units.
