Case Study / 16 min read

Building an AI Metadata Assistant for a Searchable Photo Archive

How local or cloud vision AI, structured metadata, confidence checks, and human review can turn years of loosely organized images into useful creative inventory.

Download the working script

AI Photo Metadata Enricher

Public version 1.0.0. Dry run is the default, so image files remain unchanged until you add --write.

Download Script

Install packages

python -m pip install pillow requests

Run a five-image dry test

python ai-photo-metadata-enricher.py "D:\Photos" --model "your-vision-model-id" --limit 5

Write metadata with backups

python ai-photo-metadata-enricher.py "D:\Photos" --model "your-vision-model-id" --limit 5 --write --backup

Official LM Studio documentation

Start the local server OpenAI-compatible API Local-network security

The operational problem: valuable images were becoming invisible

Photographers create images faster than they can organize them. A single shoot may produce hundreds of files, while years of work accumulate behind camera-generated names such as DSC_4839.jpg. The photographs still exist, but weak descriptions, inconsistent keywords, and fragmented folders make them increasingly difficult to find.

That is more than a filing problem. A photo archive is reusable business inventory for portfolios, marketing, client galleries, social media, stock submissions, reference boards, and future products. When the archive cannot be searched by subject, setting, wardrobe, lighting, composition, or category, much of that value is effectively locked away.

The goal of this project was to create an AI assistant that could understand each image, generate useful metadata, and prepare the archive for better search. The same workflow can use a local model for stronger data control or a cloud vision model when convenience, speed, and lower setup requirements matter more.

The system: a local AI cataloging pipeline

The resulting workflow scans a selected folder and its subfolders, identifies supported image files, reads the metadata already present, and measures how complete that metadata is. It then creates a smaller preview in memory, sends the preview and relevant context to a vision-capable AI model running on the local network, and parses the response into a predictable metadata record.

For each image, the model proposes a title, detailed description, concise alt text, keywords, a broad category, confidence score, warnings, and content flags where appropriate. The script can print those suggestions for review or write approved fields back into the image through ExifTool.

The workflow follows a reusable pattern: inspect, reduce, interpret, structure, review, write, and log. The AI handles visual interpretation, while deterministic code controls file discovery, metadata fields, confidence thresholds, writing behavior, and audit records.

Choosing between local AI and a cloud model

Photography archives can include private client work, unreleased projects, personal images, boudoir sessions, or other sensitive material. Sending every preview to a public cloud service may be unacceptable even when the technical capability is strong.

This system calls an OpenAI-compatible endpoint hosted by a local AI server. Image previews remain on the photographer's own machine or network, which creates more control over privacy, model choice, prompt design, and operating cost.

A cloud version can send the same preview, prompt, and metadata structure to the OpenAI API using a vision-capable model. This removes the need to maintain local model hardware and can make deployment easier, but the business must approve sending image previews to an outside provider and review the provider's current data-handling terms.

Local-first does not remove every security responsibility, and cloud use is not automatically the wrong choice. The practical decision depends on archive sensitivity, available hardware, desired processing speed, setup effort, and whether a small per-image API charge is preferable to operating a local model.

What cloud AI would cost per image

The following estimate uses 1,250 input tokens per image and 250 to 500 output tokens. At the supplied rates of $0.75 per 1 million input tokens and $4.50 per 1 million output tokens, the estimated API cost is about $0.0021 to $0.0032 per image.

This is usage-based OpenAI API pricing, which is billed separately from a ChatGPT subscription. Actual cost can change with the selected model, image detail and size, prompt length, response length, retries, and current provider pricing.

For simple planning, budget approximately $0.003 per image: about $0.30 for 100 images, $3.00 for 1,000 images, or $30.00 for 10,000 images. These estimates cover model tokens only and exclude application hosting, storage, engineering, review time, and other operating costs.

Standard input pricing

Images	250 output tokens	500 output tokens
1 image	$0.0021	$0.0032
100 images	$0.21	$0.32
1,000 images	$2.06	$3.19

With cached input pricing

Cached pricing may apply when eligible input is reused. Treat it as a possible optimization, not a guaranteed rate for every request.

Images	250 output tokens	500 output tokens
1 image	$0.00122	$0.00234
100 images	$0.12	$0.23
1,000 images	$1.22	$2.34

Per-image calculation: input costs approximately $0.00094; output costs approximately $0.00113 at 250 tokens or $0.00225 at 500 tokens.

What you need before running the script

The downloadable script needs Python 3, the Pillow and Requests packages, ExifTool, LM Studio, and a vision-capable model that can accept image input. HEIC and HEIF files also need the optional pillow-heif package because standard Pillow installations may not open those formats.

You do not need to edit a server address or model name inside the Python file. The cleaned version accepts machine-specific settings as command-line options. This makes the same download usable on another computer and prevents a private network address from being baked into the public script.

Start with a small folder containing copies of a few representative images. Include the formats and subject matter you actually use. That gives you a safe test set for checking model quality, metadata compatibility, and processing speed before the script touches a working archive.

Set up a local vision model in LM Studio

Install and open LM Studio, then download a model that explicitly supports vision or image input. A text-only model cannot inspect the preview and will either fail or produce unreliable output. The model size and quantization need to fit the available RAM or GPU memory on the machine running LM Studio.

Open LM Studio's Developer page and start the local API server. The current default server address is http://localhost:1234. The script uses LM Studio's OpenAI-compatible chat completions endpoint at http://localhost:1234/v1/chat/completions.

Copy the exact model identifier shown by LM Studio and pass it with --model. You can also inspect the identifiers visible to the server by opening http://localhost:1234/v1/models. If Just-In-Time model loading is disabled, load the model into memory before running the script.

Keep the server bound to localhost when LM Studio and the script run on the same computer. If the AI server is on another machine, enable LM Studio's local-network option, use that machine's LAN address in --endpoint, and enable API authentication. Binding the server beyond localhost exposes it to other devices on the network.

Install Python packages and ExifTool

Confirm Python is available by running python --version in PowerShell or a terminal. Install the required libraries with python -m pip install pillow requests. Add pillow-heif when the archive contains HEIC or HEIF images.

ExifTool is a separate application, not a Python package. Install it and make sure exiftool -ver returns a version number. If ExifTool is not on the system PATH, pass its full executable path with --exiftool.

The script uses ExifTool because image metadata is spread across EXIF, IPTC, and XMP standards. Writing the title, description, caption, category, keywords, accessibility text, and modification dates through ExifTool gives downstream photography applications a better chance of recognizing the result.

The settings each user must supply

The folder argument is the root of the photo collection to scan. Put the path in quotes when it contains spaces. The --model value must match the vision model identifier exposed by LM Studio. Those are the two values every user must provide.

The --endpoint value can remain at its default when the script and LM Studio run on the same computer. Change it only when LM Studio uses a different port or runs elsewhere on the local network. If LM Studio authentication is enabled, set the LM_STUDIO_API_KEY environment variable or pass the token with --api-key. The environment variable avoids placing the token directly in a saved command. Use --exiftool only when the executable cannot be found automatically.

Operational controls include --limit for a small batch, --min-confidence for the write threshold, --skip-existing-description to protect already-captioned files, --long-side and --jpeg-quality for preview generation, --timeout for slower models, and --log for the JSON Lines audit file location.

The one part a photographer may intentionally customize inside the script is the system prompt. That is where studio-specific vocabulary, approved categories, keyword conventions, caption tone, and sensitive-content rules belong. Test prompt changes in dry-run mode before enabling writes.

Run a five-image dry test first

A first run should include --limit 5 and should not include --write. The script will read metadata, create previews, call the local model, print a compact result, and append the full record to ai_photo_metadata_log.jsonl without changing an image.

Review the proposed title, description, category, keywords, content flags, confidence, and warnings. Check whether the model invents locations, identities, relationships, or other unsupported facts. Also confirm that it uses the vocabulary needed for the archive instead of collapsing important distinctions into generic labels.

If every file fails, verify that LM Studio's server is running, the endpoint is correct, and the model identifier matches. If only HEIC files fail, install pillow-heif. If metadata reading fails, confirm the ExifTool path. If responses are slow, raise --timeout or try a smaller vision model.

Write metadata only after reviewing the dry run

Add --write only after the model output looks consistently useful. Add --backup during early write tests so ExifTool keeps an _original copy beside each modified file. Test on copied images and inspect the result in the photography application that will consume the metadata.

The script clears and replaces the XMP and IPTC keyword lists when it writes. That behavior keeps AI keywords clean and deduplicated, but it may not fit an archive with carefully curated existing keywords. In that case, adjust the write function to merge approved terms or use --skip-existing-description to narrow the batch.

The confidence score comes from the model and should be treated as a routing signal, not a guarantee of correctness. Increase --min-confidence when the model is inconsistent, and keep human review around sensitive archives or any workflow where incorrect metadata has business consequences.

How each image moves through the workflow

First, ExifTool reads the existing EXIF, IPTC, XMP, file, and composite metadata. The script keeps useful camera, date, title, caption, and keyword context while excluding noisy binary previews and other fields the model does not need.

Next, the image is opened, correctly oriented, converted when necessary, and resized so its long side is 1024 pixels. That preview is held in memory rather than saved as another file. The smaller image is usually sufficient for visual understanding while reducing processing time and request size.

The selected local or cloud model receives the preview, the cleaned metadata, and strict instructions to return JSON. The response is parsed into named fields, keywords are cleaned and deduplicated, confidence is checked, and the result is written to a JSON Lines log. Only then can the optional write stage update common title, description, caption, and keyword fields that other photography tools are likely to recognize.

Safety was designed into the operating modes

Dry run is the default. In this mode, the assistant analyzes images and prints its proposed metadata without changing the originals. The photographer can test a small batch, inspect the language and classifications, and improve the prompt before permitting any writes.

Write mode requires an explicit flag. A separate backup option preserves original copies during testing, while a minimum-confidence threshold prevents weak suggestions from being written automatically. Images that already contain descriptions can also be skipped.

These controls matter because good AI automation is not blind automation. The model is allowed to suggest and structure information, but the surrounding system decides when the suggestion is trustworthy enough to proceed. Logs and warnings make the process inspectable when a result needs correction.

Handling sensitive and specialized archives accurately

Generic image models often become vague around boudoir, nude, fetish, or explicit material. Vague language may sound cautious, but it produces poor private-archive metadata because it removes the distinctions a photographer actually needs for search and organization.

The prompt therefore asks for direct, factual cataloging language when adult content is visible while prohibiting guesses about age, identity, location, or other sensitive facts. Separate adult and explicit content flags support later routing without forcing those judgments into a public-facing caption.

This illustrates a broader implementation lesson: useful AI systems need domain rules. A photography workflow should reflect the archive's real categories, vocabulary, privacy requirements, and review standards rather than relying on a generic description prompt.

The next layer: naming, routing, and review

Once the system can describe an image reliably, the same structured output can support file naming and folder organization. A future version can propose safe filename components such as date, category, short subject description, and sequence number, then let code sanitize and assemble the final name.

The model can also recommend a destination from a controlled list of categories rather than inventing unlimited folder names. High-confidence images could be copied into an organized output library, medium-confidence images could enter a review queue, and low-confidence images could remain untouched.

Copy mode should come before move mode so the original archive remains intact during validation. Duplicate protection, sidecar review records, allowed category lists, and an approval dashboard would turn the metadata script into a controlled photo library organizer rather than an opaque batch process.

The business value: making creative inventory usable again

The immediate output is better metadata, but the larger result is operational visibility. Searchable titles, descriptions, categories, and keywords make it easier to rediscover portfolio candidates, prepare marketing collections, locate client examples, assemble stock submissions, and understand what the archive contains.

The processing log can also become a lightweight operations dataset. It can show how many files were processed, skipped, flagged for review, or rejected for low confidence; which folders remain incomplete; and where recurring errors appear.

This case study demonstrates a practical role for both local and cloud AI in creative businesses. The system does not replace the photographer's eye or make creative decisions. It reduces the repetitive work around the images so the archive becomes easier to search, govern, and reuse.

What other businesses can reuse from this pattern

The architecture extends well beyond photography. A business can take messy source material, create a smaller or cleaner representation, ask AI to interpret it, require structured output, apply deterministic validation, route uncertain results to a person, and preserve a log of every action.

That same pattern can support document intake, product tagging, invoice extraction, content libraries, compliance review, customer request triage, and internal knowledge systems. The source changes, but the operating design remains recognizable.

The durable lesson is that the model is only one component. Privacy boundaries, structured outputs, safe defaults, confidence thresholds, human approval, and audit records are what turn an AI capability into a dependable workflow.

Want this mapped against your business?

Bring the bottleneck, reporting loop, or manual workflow. Beach Breeze Studios will help identify the system layer that removes the drag.

Get a Workflow Cleanup Audit