AI Training Dataset
License a large-scale dataset of vintage archival footage for computer vision, generative AI, and temporal analysis research. The dataset contains 217,560 digitized clips with structured metadata, with shot-year values ranging from 1800 to 2099.
Dataset Contents
Each clip record includes the following structured metadata fields:
- Title — descriptive clip title
- Description — detailed narrative description (100% coverage)
- Keywords — searchable keyword tags (100% coverage)
- Shot Year — estimated year the footage was captured
- Shot Decade — decade classification (e.g. 1950s, 1960s)
- Duration — clip length in milliseconds
- Format Tags — original format tags (e.g. 8mm, 16mm, Super8, HomeMovie)
- Location — city, region, and country
- Geo Tokens — tokenized geographic identifiers
- Copyright — rights and ownership information
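As a rough illustration of how one record might look once loaded, the sketch below models the fields above as a Python dataclass. The class name `ClipRecord`, the attribute names, the types, and the sample values are all assumptions made for illustration; the delivered schema may differ. The decade is derived from the shot year here for convenience, although the dataset lists it as its own field.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClipRecord:
    """Hypothetical in-memory view of one clip's metadata (names are assumed)."""
    title: str
    description: str
    keywords: List[str]
    shot_year: Optional[int]          # estimated year; may be missing
    duration_ms: int                  # clip length in milliseconds
    format_tags: List[str] = field(default_factory=list)
    location: str = ""                # city, region, and country
    geo_tokens: List[str] = field(default_factory=list)
    copyright_info: str = ""          # rights and ownership information

    @property
    def shot_decade(self) -> Optional[str]:
        """Decade label such as '1950s', derived from shot_year."""
        if self.shot_year is None:
            return None
        return f"{self.shot_year // 10 * 10}s"

    @property
    def duration_seconds(self) -> float:
        """Clip length converted from milliseconds to seconds."""
        return self.duration_ms / 1000.0


# Made-up sample values, not taken from the dataset.
example = ClipRecord(
    title="Street scene",
    description="Pedestrians cross a busy boulevard.",
    keywords=["street", "pedestrians"],
    shot_year=1947,
    duration_ms=12500,
    format_tags=["16mm", "HomeMovie"],
    location="Paris, France",
    geo_tokens=["paris", "france"],
)
print(example.shot_decade, example.duration_seconds)
```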
Format Breakdown
Clips are tagged by original format. A clip can carry more than one tag, so the counts below sum to more than the total number of clips:
| Format | Clips |
|---|---|
| HomeMovie | 194,249 |
| 8mm | 159,495 |
| 16mm | 155,609 |
| Super8 | 1,146 |
| 35mm | 157 |
| VHS | 58 |
| DV | 10 |
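The tag counts in this table add up to far more than the 217,560 clips cited for the dataset, which implies that a single clip can carry several tags (for example, both HomeMovie and 8mm). The sketch below shows how per-tag counts of this kind can be tallied and why their sum can exceed the number of clips. The three sample records and their IDs are invented for illustration.

```python
from collections import Counter

# Invented sample: each clip ID maps to its list of format tags.
clips = {
    "clip-001": ["HomeMovie", "8mm"],
    "clip-002": ["HomeMovie", "16mm"],
    "clip-003": ["35mm"],
}

# Count every tag occurrence across all clips.
tag_counts = Counter(tag for tags in clips.values() for tag in tags)

# Multi-tagged clips are counted once per tag, so this total can exceed len(clips).
total_tag_assignments = sum(tag_counts.values())

print(tag_counts.most_common())
print(total_tag_assignments, "tag assignments across", len(clips), "clips")
```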
Decade Coverage
Distribution of clips by the decade in which they were shot:
| Decade | Clips |
|---|---|
| 1800s | 10 |
| 1810s | 11 |
| 1820s | 10 |
| 1830s | 10 |
| 1840s | 10 |
| 1850s | 10 |
| 1860s | 11 |
| 1870s | 11 |
| 1880s | 12 |
| 1890s | 8 |
| 1900s | 10 |
| 1910s | 11 |
| 1920s | 10 |
| 1930s | 7,867 |
| 1940s | 10,573 |
| 1950s | 34,294 |
| 1960s | 68,156 |
| 1970s | 32,001 |
| 1980s | 7,554 |
| 1990s | 13 |
| 2000s | 12 |
| 2010s | 23,566 |
| 2020s | 31,665 |
| 2030s | 10 |
| 2040s | 10 |
| 2050s | 10 |
| 2060s | 4 |
| 2070s | 4 |
| 2080s | 10 |
| 2090s | 10 |
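For a quick sense of where the footage concentrates, the figures in the table can be summed directly. The snippet below copies the per-decade counts from the table and computes the share of listed clips that fall between the 1930s and the 1980s. Choosing that range as the window of interest is an assumption made for illustration.

```python
# Clip counts per decade, copied from the table above (key = first year of decade).
decade_counts = {
    1800: 10, 1810: 11, 1820: 10, 1830: 10, 1840: 10,
    1850: 10, 1860: 11, 1870: 11, 1880: 12, 1890: 8,
    1900: 10, 1910: 11, 1920: 10, 1930: 7867, 1940: 10573,
    1950: 34294, 1960: 68156, 1970: 32001, 1980: 7554, 1990: 13,
    2000: 12, 2010: 23566, 2020: 31665, 2030: 10, 2040: 10,
    2050: 10, 2060: 4, 2070: 4, 2080: 10, 2090: 10,
}

# Total clips listed in the table, and the subset shot in the chosen window.
total = sum(decade_counts.values())
core = sum(n for decade, n in decade_counts.items() if 1930 <= decade <= 1980)
share = core / total

print(f"{core:,} of {total:,} listed clips ({share:.1%}) fall in the 1930s-1980s")
```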
Geographic Coverage
Footage spans 126 countries and 920+ cities. Top 15 countries by clip count:
| Country | Clips |
|---|---|
| United States | 37,737 |
| Mexico | 1,327 |
| Canada | 1,042 |
| France | 838 |
| Italy | 488 |
| India | 428 |
| Russia | 409 |
| Japan | 392 |
| England | 367 |
| Denmark | 366 |
| Kenya | 339 |
| Spain | 276 |
| Greece | 262 |
| Germany | 258 |
| Cuba | 226 |
Still Images
In addition to video clips, the dataset includes approximately 800,000 extracted still frames. These high-resolution images are taken from key frames across the archive and can be used on their own for image classification, object detection, and visual similarity research.
Use Cases
- Computer Vision — train object detection, scene classification, and activity recognition models on authentic mid-century imagery
- Generative AI — fine-tune video and image generation models to produce realistic vintage aesthetics including film grain, color shifts, and period artifacts
- Temporal Analysis — study how fashion, architecture, and urban landscapes change over decades at consistent geographic locations
- Multimodal Research — pair rich textual metadata (titles, descriptions, keywords) with visual content for vision-language model training
- Cultural Preservation — develop AI tools for automated restoration, colorization, and cataloging of historical film archives
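For the multimodal use case, one simple way to pair metadata with visual content is to flatten a record's text fields into a single caption string. The sketch below assumes records arrive as dictionaries with the hypothetical keys `title`, `description`, `keywords`, `shot_decade`, and `location`. The key names, the `build_caption` helper, and the caption template are illustrative choices, not a documented delivery format.

```python
from typing import Any, Dict, List


def build_caption(record: Dict[str, Any], max_keywords: int = 5) -> str:
    """Join a record's text metadata into one caption, skipping empty fields."""
    parts: List[str] = []

    # Title and description come first when present.
    for key in ("title", "description"):
        value = record.get(key)
        if value:
            parts.append(str(value))

    # Place and decade give the caption its historical context.
    context = [str(record[k]) for k in ("location", "shot_decade") if record.get(k)]
    if context:
        parts.append(", ".join(context))

    # Cap the keyword list so captions stay short.
    keywords = list(record.get("keywords") or [])[:max_keywords]
    if keywords:
        parts.append("Keywords: " + ", ".join(keywords))

    if not parts:
        return ""
    # Strip trailing periods so joined sentences are not double-punctuated.
    return ". ".join(p.rstrip(".") for p in parts) + "."


# Made-up sample record, not taken from the dataset.
sample = {
    "title": "Street scene",
    "description": "Pedestrians cross a busy boulevard.",
    "keywords": ["street", "pedestrians", "cars"],
    "shot_decade": "1940s",
    "location": "Paris, France",
}
caption = build_caption(sample)
print(caption)
```

Each caption can then be stored alongside the matching clip or still frame to form an image-text pair.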
Case Study: Paris, France 1947

To test the dataset's potential for generative AI, we trained a model on a narrow slice of the collection: Paris, France, circa 1947, using approximately 2,500 still images extracted from the archive. In our assessment, the generated images closely resemble authentic archival footage, reproducing its film grain, color palette, and period atmosphere.
Browse the source collection, 1947 Paris France, to see the original training material. The trained model is open source and available for download on CivitAI, where you can try it yourself.
License This Dataset
Interested in licensing the Stockfilm dataset for AI training, academic research, or commercial applications? Contact us to discuss pricing, delivery formats, and custom subsets.