Turing Inc.

DDN EXAScaler Accelerates the Development of AI-Driven Fully Autonomous Driving Systems

Turing Inc. (“Turing”) is a startup founded in August 2021 that develops AI-based fully autonomous driving systems. Conventional rule-based autonomous driving technologies have limitations, such as only being able to operate autonomously in specific areas.

In contrast, Turing is developing an end-to-end autonomous driving system in which a single AI handles everything from recognition to decision-making and vehicle control based on camera footage. This approach requires large-scale AI training, but enables more flexible decision-making, with the goal of realizing fully autonomous driving.

To support this effort, Turing has built its own dedicated AI computing infrastructure, “Gaggle Cluster.” For high-speed data storage in Gaggle Cluster, Turing has adopted DDN EXAScaler. We spoke with Mr. Kohei Watanabe, Senior Infrastructure Engineer at Turing, about the objectives behind the deployment and its results.

Building Gaggle Cluster, a Computing Infrastructure Specialized for the Company’s Needs

Turing is advancing the development of Japan’s leading-edge fully autonomous driving technology. In December 2025, the company achieved its “Tokyo30” milestone, completing 30 minutes of uninterrupted driving in Tokyo without human intervention. In March 2026, it also announced real-time autonomous driving control on public roads using a Vision-Language-Action (VLA) model, which adds driving-operation output to the video and language understanding of a vision-language model (VLM).

At the core of Turing’s autonomous driving system development is the creation of its proprietary AI models. To support this, the company developed its own AI-dedicated computing infrastructure, “Gaggle Cluster,” which began operation in 2024 and is equipped with 96 NVIDIA H100 GPUs.

Components of Gaggle Cluster

A key reason Turing needed a dedicated computing infrastructure is the scaling law of AI training. In the training of LLMs and similar models, model performance is known to improve predictably, following a power law, as data size, parameter count, and computing power increase.

Turing chose to build this infrastructure on-premises rather than in the public cloud because it wanted access to the latest technologies.

“Cloud providers’ business model is to build infrastructure and recover the cost over several years. That does not necessarily mean we can use the latest technologies in a timely manner. We chose an on-premises approach in order to secure our own computing resources at an early stage,” said Watanabe.

Turing began considering Gaggle Cluster around March 2024. Watanabe joined the company at that time and became involved in both the planning and execution of the project. The products to be deployed were decided and ordered in May of the same year, and the system was built in a short period of time, beginning operation in October.

This rapid deployment was made possible in large part through collaboration with NTT PC Communications Inc. (“NTTPC”), which offers a “GPU Private Cloud” service.

“At the time, there were only two people in the company who could handle AI infrastructure: Yamaguchi, our CTO, and myself. The support from NTTPC was extremely helpful,” Watanabe recalled.

In February 2026, Turing also entered into a partnership with GMO Internet, Inc. and began a long-term contract for “GMO GPU Cloud.”

“Even when Gaggle Cluster is running at full capacity, depending on development status, we may still lack sufficient computing resources. To avoid slowing down development, we use every available cloud service to absorb overflow workloads,” said Watanabe.

Deploying DDN EXAScaler for Absolute Throughput and Reliability

Gaggle Cluster consists of 12 nodes, each equipped with eight NVIDIA H100 GPUs. The nodes are connected via an InfiniBand interconnect with 400 Gbps per GPU and 3.2 Tbps per node. CPU nodes, GPU nodes, management nodes, and other components are connected via Ethernet.

The storage is also connected via Ethernet using RoCEv2, or RDMA over Converged Ethernet, achieving throughput of more than 8 GB/s per node and over 100 GB/s across the entire storage system.
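The figures quoted above can be cross-checked with simple arithmetic. The following Python sketch uses only numbers from the article; the variable names are illustrative, the throughput values are the quoted lower bounds, and decimal units (1 Tbps = 1,000 Gbps) are assumed.

```python
# Cross-check of the cluster figures quoted in the article.
# Variable names are illustrative; decimal units are assumed.
NODES = 12
GPUS_PER_NODE = 8
IB_GBPS_PER_GPU = 400        # InfiniBand bandwidth per GPU (Gbit/s)
STORAGE_GBS_PER_NODE = 8     # quoted lower bound per node (GB/s)
STORAGE_GBS_SYSTEM = 100     # quoted lower bound for the storage system (GB/s)

total_gpus = NODES * GPUS_PER_NODE
ib_tbps_per_node = IB_GBPS_PER_GPU * GPUS_PER_NODE / 1000
demand_gbs = NODES * STORAGE_GBS_PER_NODE   # every node reading at 8 GB/s

print(f"GPUs in cluster:       {total_gpus}")
print(f"InfiniBand per node:   {ib_tbps_per_node} Tbps")
print(f"Demand at 8 GB/s/node: {demand_gbs} GB/s")
print(f"Within storage figure: {demand_gbs <= STORAGE_GBS_SYSTEM}")
```

With these inputs, 12 nodes each reading at the 8 GB/s floor add up to 96 GB/s, which fits within the quoted system-wide figure of over 100 GB/s.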

For this storage system, Turing adopted DDN EXAScaler ES400NVX2.

“In the development of camera-based autonomous driving models, the training data consists of images and videos, which becomes extremely large. For this reason, we invest a significant portion of our resources in storage,” said Watanabe.

The first requirement for storage was throughput.

“We were aiming for throughput of more than 1 GB/s per GPU. In training autonomous driving models from video, the data is divided into sequences of still images ranging from several hundred KB to several MB, which are continuously read from storage. Therefore, high I/O throughput is critical,” Watanabe emphasized.
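To give a sense of what this target implies, the sketch below converts a throughput figure into an approximate number of image reads per second. The 1 GB/s target comes from the interview; the two image sizes (300 KB and 3 MB) are assumed stand-ins for “several hundred KB to several MB,” and decimal units are assumed.

```python
def images_per_second(throughput_gb_s: float, image_size_kb: float) -> float:
    """Approximate image reads per second sustained at a given throughput.

    Uses decimal units (1 GB = 1e9 bytes, 1 KB = 1e3 bytes) and ignores
    metadata and protocol overhead, so the result is an upper bound.
    """
    return (throughput_gb_s * 1e9) / (image_size_kb * 1e3)


TARGET_GB_S = 1.0               # per-GPU target quoted in the interview
for size_kb in (300, 3000):     # assumed example image sizes
    rate = images_per_second(TARGET_GB_S, size_kb)
    print(f"{size_kb:>5} KB images: about {rate:,.0f} reads/s per GPU")
```

The smaller the images, the more individual reads the storage must serve each second to sustain the same throughput, which is why small-file read performance matters for this kind of workload.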

Throughput performance is also important for shortening the time required to write checkpoints. In AI training, distributed parallel training across multiple nodes runs for long periods of time. If even one node fails during the process, the entire computation fails. Therefore, the intermediate computation state is regularly saved as a checkpoint. Since computation pauses while this save operation is performed, shortening the write time helps reduce the overall computation time.
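The effect of write throughput on training time can be estimated with a back-of-the-envelope model. The sketch below assumes synchronous checkpointing, in which training pauses for the full write as described above. The checkpoint size, checkpoint interval, and both bandwidth values are hypothetical numbers chosen for illustration, not figures from Turing.

```python
def checkpoint_stall_seconds(checkpoint_gb: float, write_gb_s: float) -> float:
    """Time training is paused while one checkpoint is written."""
    return checkpoint_gb / write_gb_s


def stall_fraction(stall_s: float, interval_s: float) -> float:
    """Share of wall-clock time spent paused, given the compute time
    between two consecutive checkpoints."""
    return stall_s / (interval_s + stall_s)


CHECKPOINT_GB = 500.0    # hypothetical checkpoint size
INTERVAL_S = 30 * 60     # hypothetical: 30 minutes of compute between checkpoints

for write_gb_s in (10.0, 100.0):   # hypothetical slower vs. faster storage
    stall = checkpoint_stall_seconds(CHECKPOINT_GB, write_gb_s)
    share = stall_fraction(stall, INTERVAL_S)
    print(f"{write_gb_s:>5.0f} GB/s: {stall:.0f} s pause, {share:.2%} of wall time")
```

Under these assumptions, a tenfold increase in write throughput shortens each pause tenfold, and the saving accumulates over every checkpoint in a long training run.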

Another storage requirement Turing prioritized was reliability. AI training depends entirely on data, and if storage stops, all computation stops.

“DDN EXAScaler has a long track record of operation in AI development. No matter how impressive a new product’s specifications may be, I believe that some bugs and operational issues are inevitable in its early stages. We chose the reliability of DDN EXAScaler based on its proven track record,” Watanabe stressed.

Regarding this proven track record, Watanabe also noted:

“In AI training, many technologies originally developed for supercomputers are used. Therefore, we determined that the best choice was EXAScaler, which is based on the Lustre file system adopted by many supercomputers ranked in the TOP500.”

DDN Storage Configuration for Gaggle Cluster

Accelerating Development Toward Fully Autonomous Driving by 2030

Regarding the DDN EXAScaler system that was deployed, Watanabe expressed satisfaction:

“DDN EXAScaler is delivering the exact results we anticipated. Because we have sufficient storage performance headroom, we can keep the GPUs running at full capacity and still have room to optimize the software.”

Watanabe also said DDN’s support was a major factor in the deployment.

“They provided support on configuration, parameters, and other aspects. I think one of the advantages of choosing DDN is that engineers who have deployed world-class supercomputers provide support and deployment consulting,” Watanabe said, expressing his appreciation.

Turing has set a milestone of realizing fully autonomous driving by 2030. Toward this goal, the company plans to strengthen AI model development and is now planning the next-generation computing infrastructure that will succeed Gaggle Cluster.

“We plan to start operating a system with roughly a tenfold increase in performance compared to the current system by around 2027,” said Watanabe.

For the storage in this next-generation infrastructure, Turing is considering using separate types of storage, such as cloud-integrated storage and object storage, for the parts of the workflow outside of training, including data processing before and after training.

“Even so, model training is a workload that will absolutely continue to exist, and throughput is more important than anything else,” Watanabe said, expressing his expectations for further improvements in storage performance and for DDN.

Concept for Turing’s Planned Next-Generation Computing Infrastructure

Turing Inc.

Turing Inc. is a startup working on the development of fully autonomous driving. The company is simultaneously developing end-to-end autonomous driving AI, which performs environmental recognition, route planning, and driving control through a single AI, as well as large-scale foundation models that acquire an understanding of common sense, background knowledge, and context in human society.

By integrating these technologies, Turing aims to realize “fully autonomous driving,” in which vehicles perform driving operations on behalf of humans under all conditions.

Kohei Watanabe, Senior Infrastructure Engineer, Turing Inc.

DDN Solution

  • Use Case
    Storage for “Gaggle Cluster,” an AI computing infrastructure for fully autonomous driving system development.
  • Deployed System
    DDN EXAScaler ES400NVX2

*This case study was prepared based on an interview conducted at Turing Inc. on March 26, 2026.
