San Francisco-based startup Poseidon has raised $15 million in seed funding to develop a decentralized infrastructure for AI training data. The round was led by Andreessen Horowitz's crypto arm, marking a significant bet on blockchain solutions for artificial intelligence development.
Poseidon's chief scientist Sandeep Chinchali identifies a critical gap in current AI ecosystems: "While computing power and large language models have advanced rapidly, the industry now faces a shortage of high-quality, legally cleared training data." The platform aims to solve this by creating an IP-verified data pipeline using decentralized technology.
"We're building the equivalent of a data commons for AI," Chinchali told Cointelegraph. "This isn't just about volume; it's about creating properly licensed, structured datasets that reflect real-world complexity."
Unlike traditional centralized data brokers, Poseidon's system combines blockchain verification with Story Protocol's licensing technology. The approach allows:
• Data contributors to receive compensation through transparent smart contracts
• AI developers to access niche datasets without copyright risks
• Continuous updates through distributed collection networks
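The contributor-compensation idea above can be illustrated with a simplified, off-chain sketch. The Python snippet below models a pro-rata payout ledger; the class and field names (`Contribution`, `PayoutLedger`, `license_id`) are hypothetical and do not reflect Poseidon's actual smart contracts or Story Protocol's API.

```python
# Illustrative sketch only: an in-memory model of a transparent payout
# ledger for data contributors. In an on-chain system, each record would
# be a transaction, making provenance and payouts publicly auditable.
from dataclasses import dataclass


@dataclass
class Contribution:
    contributor: str   # wallet address or ID of the data provider
    license_id: str    # reference to the verified license record
    weight: float      # relative value assigned to the contribution


class PayoutLedger:
    def __init__(self) -> None:
        self.contributions: list[Contribution] = []

    def record(self, contribution: Contribution) -> None:
        # Append-only log of who contributed what, under which license.
        self.contributions.append(contribution)

    def distribute(self, revenue: float) -> dict[str, float]:
        # Split revenue pro rata by contribution weight.
        total = sum(c.weight for c in self.contributions)
        payouts: dict[str, float] = {}
        for c in self.contributions:
            share = revenue * c.weight / total
            payouts[c.contributor] = payouts.get(c.contributor, 0.0) + share
        return payouts


ledger = PayoutLedger()
ledger.record(Contribution("alice", "lic-001", 3.0))
ledger.record(Contribution("bob", "lic-002", 1.0))
print(ledger.distribute(100.0))  # alice: 75.0, bob: 25.0
```

A production version would live in a smart contract so that the ledger and payout rules are enforced and auditable without a trusted intermediary, which is the property the bullet points above describe.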
Chris Dixon of a16z Crypto describes the project as "fundamentally rethinking how value flows in data ecosystems." The funding will accelerate development of contributor tools and licensing frameworks, with early access planned for summer 2025.
Analysts note that easily accessible web data, the fuel for early AI models, has largely been exhausted. Poseidon targets emerging needs in specialized domains like:
• Robotics and spatial computing
• Industry-specific language models
• Real-time sensor data applications
The platform's whitepaper reveals an ambitious roadmap to create "data DAOs": decentralized organizations where contributors collectively govern dataset development and monetization.
Traditional data collection struggles with three key limitations that Poseidon's model addresses:
1. Legal uncertainties around copyrighted material
2. Inability to compensate niche data providers
3. Centralized bottlenecks in dataset curation
"No single company can source the diversity of data needed for next-gen AI," notes Carra Wu, a16z Crypto research partner. The decentralized approach could unlock new categories of training data while reducing legal risks that have led to high-profile lawsuits against AI companies.
As AI development enters its next phase, solutions like Poseidon's may determine whether the industry can move beyond scraping publicly available data toward sustainable, ethical training data ecosystems.