AI & Machine Learning

AI Training Dataset

Name: Stockfilm Vintage Archival Footage Dataset
Creator: Stockfilm

License a large-scale dataset of vintage archival footage for computer vision, generative AI, and temporal analysis research. Over 217,560 digitized clips with structured metadata spanning 1800–2099.

217,560 clips · ~800,000 still frames · 396+ hours · 126 countries

Key Stats

217,560+ Video Clips

~800K Still Images

396+ Hours of Footage

126 Countries

920+ Cities

1800–2099 Temporal Range

Dataset Contents

Each clip record includes the following structured metadata fields:

Title — descriptive clip title
Description — detailed narrative description (100% coverage)
Keywords — searchable keyword tags (100% coverage)
Shot Year — estimated year the footage was captured
Shot Decade — decade classification (e.g. 1950s, 1960s)
Duration — clip length in milliseconds
Format Tags — film format (8mm, 16mm, Super 8, Home Movie)
Location — city, region, and country
Geo Tokens — tokenized geographic identifiers
Copyright — rights and ownership information
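As a rough illustration of how one clip record could be represented in code, the sketch below maps the fields above onto a Python dataclass. The class name, attribute names, types, and the sample values are all assumptions made for illustration; they are not the delivered schema, which should be confirmed when licensing.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClipRecord:
    """One clip's metadata. Field names are hypothetical, not the delivered schema."""
    title: str
    description: str
    keywords: list[str]
    shot_year: Optional[int]   # estimated year the footage was captured
    duration_ms: int           # clip length in milliseconds
    format_tags: list[str] = field(default_factory=list)
    location: str = ""         # city, region, and country
    geo_tokens: list[str] = field(default_factory=list)
    copyright: str = ""

    @property
    def shot_decade(self) -> Optional[str]:
        """Decade label such as '1950s', derived from the estimated year."""
        if self.shot_year is None:
            return None
        return f"{self.shot_year // 10 * 10}s"

    @property
    def duration_seconds(self) -> float:
        """Clip length converted from milliseconds to seconds."""
        return self.duration_ms / 1000.0


# Made-up sample values, for illustration only.
clip = ClipRecord(
    title="Street scene",
    description="Pedestrians cross a busy boulevard.",
    keywords=["street", "pedestrians"],
    shot_year=1947,
    duration_ms=6500,
    format_tags=["16mm", "HomeMovie"],
    location="Paris, Ile-de-France, France",
)
print(clip.shot_decade, clip.duration_seconds)
```

Deriving the decade from the year is a convenience here; the dataset also supplies Shot Decade as its own field.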

Format Breakdown

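Clips are tagged by original film format. The counts below sum to more than the total number of clips, so a single clip can carry more than one format tag: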

Format	Clips
HomeMovie	194,249
8mm	159,495
16mm	155,609
Super8	1,146
35mm	157
VHS	58
DV	10
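Because the per-format counts add up to more than the number of clips, format tags are best treated as multi-label. The sketch below, using only the counts from the table, shows the arithmetic and a simple multi-label filter. The helper name `has_format` and the tag spellings used as keys are assumptions.

```python
# Counts copied from the format table above.
FORMAT_COUNTS = {
    "HomeMovie": 194_249,
    "8mm": 159_495,
    "16mm": 155_609,
    "Super8": 1_146,
    "35mm": 157,
    "VHS": 58,
    "DV": 10,
}
TOTAL_CLIPS = 217_560  # headline clip count; the page states "217,560+"

tag_total = sum(FORMAT_COUNTS.values())
avg_tags_per_clip = tag_total / TOTAL_CLIPS

print(f"total tag assignments: {tag_total:,}")
print(f"approximate tags per clip: {avg_tags_per_clip:.2f}")


def has_format(format_tags: list[str], wanted: str) -> bool:
    """Hypothetical helper: True if a clip's tag list includes the wanted format."""
    return wanted in format_tags


# A clip tagged both HomeMovie and 8mm matches either filter.
print(has_format(["HomeMovie", "8mm"], "8mm"))
```

Filtering on one tag at a time, rather than assuming a single format per clip, avoids silently dropping clips that carry several tags.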

Decade Coverage

Distribution of clips by the decade in which they were shot:

Decade	Clips
1800s	10
1810s	11
1820s	10
1830s	10
1840s	10
1850s	10
1860s	11
1870s	11
1880s	12
1890s	8
1900s	10
1910s	11
1920s	10
1930s	7,867
1940s	10,573
1950s	34,294
1960s	68,156
1970s	32,001
1980s	7,554
1990s	13
2000s	12
2010s	23,566
2020s	31,665
2030s	10
2040s	10
2050s	10
2060s	4
2070s	4
2080s	10
2090s	10
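The listed counts are concentrated in two ranges, the 1930s through 1980s and the 2010s through 2020s. A minimal sketch for measuring that concentration from the table is shown below; the function name `share_in_range` is an assumption, and the figures are simply the table values.

```python
# Clip counts per decade, copied from the table above (key = first year of decade).
DECADE_COUNTS = {
    1800: 10, 1810: 11, 1820: 10, 1830: 10, 1840: 10,
    1850: 10, 1860: 11, 1870: 11, 1880: 12, 1890: 8,
    1900: 10, 1910: 11, 1920: 10,
    1930: 7_867, 1940: 10_573, 1950: 34_294,
    1960: 68_156, 1970: 32_001, 1980: 7_554,
    1990: 13, 2000: 12,
    2010: 23_566, 2020: 31_665,
    2030: 10, 2040: 10, 2050: 10, 2060: 4, 2070: 4, 2080: 10, 2090: 10,
}


def share_in_range(counts: dict[int, int], start: int, end: int) -> float:
    """Fraction of listed clips whose decade falls in [start, end], inclusive."""
    total = sum(counts.values())
    inside = sum(n for decade, n in counts.items() if start <= decade <= end)
    return inside / total


listed_total = sum(DECADE_COUNTS.values())
print(f"clips with a listed decade: {listed_total:,}")
print(f"1930s-1980s share: {share_in_range(DECADE_COUNTS, 1930, 1980):.1%}")
print(f"2010s-2020s share: {share_in_range(DECADE_COUNTS, 2010, 2020):.1%}")
```

A similar range check on Shot Year could be used to select a training subset by period, keeping in mind that the year is an estimate.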

Geographic Coverage

Footage spans 126 countries and 920+ cities. Top 15 countries by clip count:

Country	Clips
United States	37,737
Mexico	1,327
Canada	1,042
France	838
Italy	488
India	428
Russia	409
Japan	392
England	367
Denmark	366
Kenya	339
Spain	276
Greece	262
Germany	258
Cuba	226
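The top-15 counts are dominated by the United States, which matters when sampling a geographically balanced training subset. The sketch below shows one possible approach, capping the number of clips drawn per country. The cap value of 1,500 and the function name `cap_counts` are arbitrary assumptions for illustration, not a recommendation from Stockfilm.

```python
# Top 15 countries by clip count, copied from the table above.
COUNTRY_COUNTS = {
    "United States": 37_737, "Mexico": 1_327, "Canada": 1_042,
    "France": 838, "Italy": 488, "India": 428, "Russia": 409,
    "Japan": 392, "England": 367, "Denmark": 366, "Kenya": 339,
    "Spain": 276, "Greece": 262, "Germany": 258, "Cuba": 226,
}


def cap_counts(counts: dict[str, int], cap: int) -> dict[str, int]:
    """Limit each country's contribution to at most `cap` clips."""
    return {country: min(n, cap) for country, n in counts.items()}


top15_total = sum(COUNTRY_COUNTS.values())
us_share = COUNTRY_COUNTS["United States"] / top15_total

capped = cap_counts(COUNTRY_COUNTS, cap=1_500)  # hypothetical cap
capped_total = sum(capped.values())
us_share_capped = capped["United States"] / capped_total

print(f"top-15 total: {top15_total:,}, US share: {us_share:.1%}")
print(f"capped total: {capped_total:,}, US share: {us_share_capped:.1%}")
```

Capping trades dataset size for balance; weighting samples by inverse country frequency is an alternative that keeps every clip.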

Still Images

In addition to video clips, the dataset includes approximately 800,000 still frame extractions. These high-resolution images are derived from key frames across the archive and can be used independently for image classification, object detection, and visual similarity research.
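For a sense of scale, the headline figures imply roughly how many stills exist per clip and how long an average clip runs. The short calculation below uses the rounded numbers quoted on this page, so the results are approximations rather than exact dataset statistics.

```python
TOTAL_CLIPS = 217_560     # "217,560+" video clips
TOTAL_STILLS = 800_000    # "approximately 800,000" still frames
TOTAL_HOURS = 396         # "396+" hours of footage

stills_per_clip = TOTAL_STILLS / TOTAL_CLIPS
avg_clip_seconds = TOTAL_HOURS * 3600 / TOTAL_CLIPS

print(f"about {stills_per_clip:.1f} stills per clip")
print(f"about {avg_clip_seconds:.1f} seconds per clip on average")
```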

Use Cases

Computer Vision — train object detection, scene classification, and activity recognition models on authentic mid-century imagery
Generative AI — fine-tune video and image generation models to produce realistic vintage aesthetics including film grain, color shifts, and period artifacts
Temporal Analysis — study how fashion, architecture, and urban landscapes change over the decades within the same geographic locations
Multimodal Research — pair rich textual metadata (titles, descriptions, keywords) with visual content for vision-language model training
Cultural Preservation — develop AI tools for automated restoration, colorization, and cataloging of historical film archives
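For the multimodal use case, the textual metadata has to be turned into a caption paired with each clip or still. The sketch below shows one simple way to do that. The function name `build_caption`, the separator, and the keyword limit are assumptions, and the sample text is made up.

```python
def build_caption(
    title: str,
    description: str,
    keywords: list[str],
    max_keywords: int = 5,
) -> str:
    """Join title, description, and the first few keywords into one caption string."""
    parts = [p.strip() for p in (title, description) if p and p.strip()]
    tags = [k.strip() for k in keywords if k.strip()][:max_keywords]
    if tags:
        parts.append("Keywords: " + ", ".join(tags))
    return " | ".join(parts)


# Made-up sample metadata, for illustration only.
caption = build_caption(
    "Street scene",
    "Pedestrians cross a boulevard.",
    ["paris", "1940s"],
)
print(caption)
```

Limiting the number of keywords keeps captions within the context length of typical text encoders; the right limit depends on the model being trained.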

Case Study: Paris, France 1947

AI-generated image from a model trained on 1947 Paris, France archival footage

To test the dataset's potential for generative AI, we trained a model on one specific look: Paris, France circa 1947, using approximately 2,500 still images extracted from the collection. In our assessment, the generated images were nearly indistinguishable from authentic archival footage, reproducing its film grain, color palette, and period atmosphere.

Browse the source collection, 1947 Paris France, to see the original training material. The trained model is open source and available for download on CivitAI, so you can try it yourself and see the results firsthand.

License This Dataset

Interested in licensing the Stockfilm dataset for AI training, academic research, or commercial applications? Contact us to discuss pricing, delivery formats, and custom subsets.