
Drone Video Geolocation: Shadow Math and AI Targeting

TacLink C2 Team 13 min read

A clip lands on your feed. Some grainy aerial footage, a few seconds long, no caption worth trusting. Somewhere in the world, a drone filmed this. The question that launched an entire discipline is deceptively simple: where, exactly?

Answering it has become one of the more fascinating corners of modern technology, sitting at the crossroads of open-source investigation, machine learning, optical physics, and old-fashioned trigonometry. Newsrooms use it to verify war crimes. Militaries use it to put coordinates on a target. Surveyors use it to measure a roof to the centimeter. The methods overlap more than you’d expect, and once you understand them, you start seeing the world’s geometry a little differently.

Here’s how drone video geolocation actually works, from the easy wins hiding in a video’s metadata to the AI systems learning to navigate without GPS at all.

The shortcut almost nobody talks about: the data inside the file

Before anyone reaches for clever math, the first move is the boring one. Check whether the drone already wrote down where it was.

Most consumer and enterprise drones do. DJI aircraft, from the pocket-sized Mini up through the enterprise Matrice line, can record a continuous flight log alongside the video, stored in a humble SubRip Subtitle (SRT) file. It’s the same format used for movie captions, repurposed to log telemetry. Turn on “video captions” in the flight app and the drone quietly records latitude and longitude, altitude, focal length, ISO, and velocity, timestamped frame by frame.

For an analyst, that’s gold. Tools like Telemetry Overlay or a flight-log SRT viewer can pull that data out of the file and convert it into standard geospatial formats (GPX, KML, CSV) that drop straight into Google Earth Pro, QGIS, or Esri’s ArcGIS. Suddenly you’re not guessing where the drone was; you’re watching its exact path replay on a map, synced frame by frame to the moment it captured whatever you’re trying to locate.
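To make the conversion step concrete, here is a minimal sketch in Python. It assumes a caption layout like the one several recent DJI models write, with bracketed `key: value` pairs such as `[latitude: 38.706932]` and a date-time line in each block. Field names differ across models and firmware, so treat the regexes as an assumption to check against your own files. The function names `parse_dji_srt` and `to_gpx` are hypothetical and do not reflect how Telemetry Overlay or any other tool works internally.

```python
import re

# Assumed caption layout: "key: number" pairs such as "[latitude: 38.706932]".
# Field names vary across DJI models and firmware, so check your own files.
FIELD = re.compile(r"(\w+)\s*:\s*(-?\d+(?:\.\d+)?)")
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)?")


def parse_dji_srt(text):
    """Return one dict per caption block that carries a usable GPS fix."""
    points = []
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {key.lower(): float(val) for key, val in FIELD.findall(block)}
        lat, lon = fields.get("latitude"), fields.get("longitude")
        if lat is None or lon is None or (lat == 0.0 and lon == 0.0):
            continue  # assumed: a 0.0, 0.0 pair means no GPS fix yet
        stamp = TIMESTAMP.search(block)
        points.append({
            "lat": lat,
            "lon": lon,
            "alt": fields.get("abs_alt"),
            # The time zone isn't recorded in this layout; convert to UTC before relying on it.
            "time": stamp.group(0).replace(" ", "T") if stamp else None,
        })
    return points


def to_gpx(points):
    """Serialise parsed points as a minimal GPX 1.1 track."""
    rows = []
    for p in points:
        row = f'<trkpt lat="{p["lat"]:.7f}" lon="{p["lon"]:.7f}">'
        if p["alt"] is not None:
            row += f"<ele>{p['alt']:.2f}</ele>"
        if p["time"]:
            row += f"<time>{p['time']}</time>"
        rows.append(row + "</trkpt>")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<gpx version="1.1" creator="srt-sketch" '
        'xmlns="http://www.topografix.com/GPX/1/1">\n'
        "<trk><trkseg>\n" + "\n".join(rows) + "\n</trkseg></trk>\n</gpx>\n"
    )
```

The resulting GPX file opens in Google Earth Pro or QGIS as a track. Because the caption clock may be local time, a real tool would also need to know the aircraft's time zone.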

The catch is that this data is fragile. When a video is uploaded to most social platforms, re-encoding strips the embedded metadata clean, and the separate SRT file rarely travels with the clip in the first place. So the SRT shortcut works beautifully for footage you control, and almost never for the anonymous clip on your feed. That’s where the real detective work begins.

When the metadata is gone: reading the shadows

If you can’t ask the file where it was, you can ask the sun.

Shadow analysis, the workhorse of what investigators call chronolocation (working out when footage was shot), is the OSINT community’s signature trick. It rests on something that can’t be faked or stripped: the sun is in a fixed, predictable place in the sky for any given location, date, and time. A shadow is just geometry. If you can measure one accurately, you can work backward to a surprising amount of information.

The workflow looks roughly like this. An analyst finds a tall, distinct object in the footage (a telephone pole, a building corner, the edge of a wall) and compares the length of the shadow it casts with the object’s known or estimated height; the ratio gives the sun’s angle above the horizon. A free tool called SunCalc models the sun’s position for any spot on Earth at any moment. By matching the shadow’s length and direction to what the sun would have to be doing to produce it, you can pin down both the time of day and the direction the camera was facing.
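The core arithmetic can be sketched in a few lines of Python. This is a simplified stand-in for what SunCalc does, not its actual code. It uses NOAA's published low-precision series for solar declination and the equation of time, assumes flat ground under the shadow, and scans a single UTC day for the minutes whose sun elevation matches the measured shadow. The function names and the one-minute scan are illustrative choices.

```python
import math
from datetime import date, datetime, timedelta, timezone


def solar_position(lat_deg, lon_deg, when_utc):
    """Approximate sun (elevation, azimuth) in degrees; azimuth clockwise from north.

    Uses NOAA's low-precision series for declination and the equation of time.
    Good enough for a sketch; use SunCalc or NOAA's calculator for real work.
    """
    n = when_utc.timetuple().tm_yday
    hour = when_utc.hour + when_utc.minute / 60 + when_utc.second / 3600
    g = 2 * math.pi / 365 * (n - 1 + (hour - 12) / 24)  # fractional year, radians
    eqtime = 229.18 * (0.000075 + 0.001868 * math.cos(g) - 0.032077 * math.sin(g)
                       - 0.014615 * math.cos(2 * g) - 0.040849 * math.sin(2 * g))
    decl = (0.006918 - 0.399912 * math.cos(g) + 0.070257 * math.sin(g)
            - 0.006758 * math.cos(2 * g) + 0.000907 * math.sin(2 * g)
            - 0.002697 * math.cos(3 * g) + 0.00148 * math.sin(3 * g))  # radians
    solar_minutes = hour * 60 + eqtime + 4 * lon_deg  # true solar time, east-positive longitude
    ha = math.radians(solar_minutes / 4 - 180)  # hour angle: negative morning, positive afternoon
    lat = math.radians(lat_deg)
    cos_zen = math.sin(lat) * math.sin(decl) + math.cos(lat) * math.cos(decl) * math.cos(ha)
    zen = math.acos(max(-1.0, min(1.0, cos_zen)))
    az_from_south = math.atan2(math.sin(ha),
                               math.cos(ha) * math.sin(lat) - math.tan(decl) * math.cos(lat))
    return 90 - math.degrees(zen), (math.degrees(az_from_south) + 180) % 360


def elevation_from_shadow(object_height_m, shadow_length_m):
    """Sun elevation implied by an object's height and its shadow length on flat ground."""
    return math.degrees(math.atan2(object_height_m, shadow_length_m))


def times_matching_elevation(lat_deg, lon_deg, day: date, elevation_deg, tol_deg=0.5):
    """Scan one UTC day minute by minute; return (time, sun_azimuth) where elevation matches."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    hits = []
    for minute in range(24 * 60):
        t = start + timedelta(minutes=minute)
        elev, az = solar_position(lat_deg, lon_deg, t)
        if elev > 0 and abs(elev - elevation_deg) <= tol_deg:
            hits.append((t, az))
    return hits
```

Shadow length alone usually yields two clusters of matching times, one in the morning and one in the afternoon. The sun’s azimuth at each candidate says which way shadows should point, and comparing that with the footage breaks the tie and reveals the camera’s heading.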

That last part is the quiet superpower. Knowing which way the camera pointed lets you eliminate enormous swaths of the map in seconds. There’s a well-known case where investigators geolocating a video shot in Portugal worked out that the camera faced south toward the sea in the late afternoon. Because the avenue in the shot ran parallel to the coast, they could throw out everything inland and everything northeast of Lisbon, collapsing the search to a thin coastal strip almost immediately. Hours of manual hunting, replaced by one trigonometric deduction.
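Once the sun’s azimuth is known, the camera-heading deduction and the map filter reduce to simple angle arithmetic. The sketch below assumes the shadow’s direction has been measured on the ground plane, relative to the camera’s forward axis. That is reasonable for a near-nadir or rectified frame, but angles read straight off an oblique frame are distorted. The road names and bearings are hypothetical stand-ins for candidate segments pulled from a map.

```python
def camera_heading(sun_azimuth_deg, shadow_angle_in_view_deg):
    """Compass heading the camera faces (degrees clockwise from north).

    shadow_angle_in_view_deg: direction the shadow points, measured clockwise
    from the camera's forward axis on the ground plane (near-nadir or rectified
    view assumed).
    """
    shadow_azimuth = (sun_azimuth_deg + 180) % 360  # shadows point away from the sun
    return (shadow_azimuth - shadow_angle_in_view_deg) % 360


def bearing_matches(road_bearing_deg, target_deg, tol_deg=15.0):
    """Roads are undirected, so compare bearings modulo 180 degrees."""
    diff = (road_bearing_deg - target_deg) % 180
    return min(diff, 180 - diff) <= tol_deg


def filter_roads(roads, heading_deg, relation="perpendicular", tol_deg=15.0):
    """Keep road segments whose bearing fits the deduced camera heading.

    roads: mapping of segment name -> bearing in degrees (hypothetical data).
    "perpendicular" encodes a scene like the Portugal case: the camera faces
    the sea and the avenue runs along the coast, across the line of sight.
    """
    target = heading_deg + (90 if relation == "perpendicular" else 0)
    return [name for name, bearing in roads.items()
            if bearing_matches(bearing, target, tol_deg)]
```

For example, with a late-afternoon sun at azimuth 260 degrees and the shadow pointing 260 degrees clockwise from the camera’s forward axis, `camera_heading` returns 180 (due south). `filter_roads` then keeps only roughly east-west segments.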

There’s a wrinkle worth knowing: the sun retraces its path twice a year. Its arc on August 20th closely matches its arc on April 22nd (the two dates sit about equally far either side of the June solstice), so shadow math alone can’t always tell you the season. Analysts resolve that ambiguity by reading the environment: how full the trees are, whether construction has changed, what’s green and what’s bare.
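The ambiguity comes from solar declination, the sun’s angle north or south of the equator. For a given place and solar time, declination largely fixes the sun’s path across the sky. The quick check below uses Cooper’s textbook approximation rather than a precise ephemeris. Dates the same distance either side of the solstice come out with nearly the same declination. The equation of time, which differs by a few minutes between such dates, is ignored here.

```python
import math
from datetime import date


def declination_deg(day):
    """Solar declination via Cooper's approximation: 23.45 * sin(360/365 * (284 + n))."""
    n = day.timetuple().tm_yday
    return 23.45 * math.sin(math.radians(360 / 365 * (284 + n)))


apr22 = declination_deg(date(2023, 4, 22))
aug20 = declination_deg(date(2023, 8, 20))
may22 = declination_deg(date(2023, 5, 22))
```

Both `apr22` and `aug20` come out at roughly 12 degrees. `may22`, a date with no symmetric partner in late summer, sits about 8 degrees higher, so the two candidate seasons are close to indistinguishable from geometry alone.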

That Portugal case is the kind of clean demonstration that shows why the technique works. The most consequential use of it, though, has been on far grimmer material, and not on drone footage at all. In 2020, during the Nagorno-Karabakh war, Bellingcat investigator Nick Waters analyzed two phone videos of an execution in the town of Hadrut. Using a tree’s shadow falling across a pile of rubble that is also visible in satellite imagery, he pinned the second video to roughly 15:11, mid-afternoon. Combined with geolocation of the site, the analysis helped establish a timeline and authenticate footage that authorities had dismissed as fake, work later corroborated by the BBC and human-rights monitors. It’s the clearest proof of the principle: digital forensics built on sunlight can move faster, and sometimes hit harder, than official denials. The same shadow techniques apply directly to drone video; the sun doesn’t care what kind of camera is looking up at it.

Teaching a machine to match the view: cross-view geo-localization

Shadow analysis is brilliant, but it’s slow and human-intensive. The frontier question is whether a computer can do it automatically: take a frame from a drone and instantly find the matching spot on a global satellite map. That’s the field known as cross-view geo-localization, or CVGL, and it’s where most of the serious research energy now goes.

The core difficulty is a perspective mismatch. A drone shoots the world at an angle, low and oblique. A satellite map looks straight down, flat and orthogonal. A building that fills 80% of the drone’s frame might occupy 2% of a satellite tile. Early algorithms basically treated that distortion as noise and choked on it.

A useful way to picture the challenge: imagine trying to find yourself in a giant shopping mall using only a close-up, angled photo of one storefront, matched against a flat floor-plan of the entire mall. The perspectives don’t line up, so it feels impossible. What modern CVGL does is act like an architect who can bend that flat floor-plan into 3D space, scale it to match how close you’re standing, and slide your photo across it until the geometry locks into exactly one position.

The way researchers crack the scale problem is clever. They look for objects of known, stable size in the frame (small cars are a favorite, because vehicle dimensions vary relatively little worldwide) and use them as a built-in ruler to figure out the true scale of everything else. Once the algorithm knows the scale, it can crop and resize the satellite imagery to match the drone’s altitude before it even attempts a match.
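A minimal sketch of the “car as ruler” idea follows. It assumes a typical small car is about 4.5 m long, a hypothetical reference value chosen for illustration. It also assumes the frame is close enough to nadir that one ground sample distance (GSD, metres per pixel) applies across the whole image. Taking the median over several cars limits the damage from one mis-measured vehicle.

```python
import statistics

# Hypothetical reference length: a typical small car assumed to be about 4.5 m long.
ASSUMED_CAR_LENGTH_M = 4.5


def gsd_from_cars(car_lengths_px, car_length_m=ASSUMED_CAR_LENGTH_M):
    """Ground sample distance (metres per pixel) estimated from detected cars.

    Uses the median over several cars so one unusual vehicle doesn't skew it.
    """
    return statistics.median([car_length_m / px for px in car_lengths_px])


def satellite_crop_px(frame_w_px, frame_h_px, drone_gsd_m, satellite_gsd_m):
    """Satellite crop size (pixels) covering the same ground footprint as the frame.

    Assumes a near-nadir frame with uniform GSD; oblique frames need a per-region scale.
    """
    scale = drone_gsd_m / satellite_gsd_m
    return round(frame_w_px * scale), round(frame_h_px * scale)
```

With cars measuring about 90 pixels, the GSD is about 0.05 m per pixel, so a 4000 by 3000 pixel frame covers roughly 200 by 150 m. On imagery at 0.5 m per pixel, that footprint is a 400 by 300 pixel satellite crop.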

A couple of other advances are quietly reshaping the field. Object detectors used to draw plain horizontal boxes around things, which works poorly for a vehicle or building seen at an angle: the box swallows a lot of irrelevant background. Newer systems use rotated bounding boxes that hug the object’s real orientation, capturing far less noise and dramatically cutting the cost of labeling training data. And the matching itself increasingly runs on Siamese or triplet neural networks, architectures designed to learn a shared “view-invariant” fingerprint for a place, so the drone view and the satellite view of the same spot produce the same signature even though they look nothing alike to the eye.
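The “shared fingerprint” is typically trained with a loss of the following kind, shown here as a generic NumPy sketch of the standard triplet margin loss rather than any specific paper’s recipe. The embedding network itself is omitted; with a Siamese setup, the same weights would produce the drone and satellite vectors fed in here. The 0.2 margin is an arbitrary illustrative value.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so distances compare directions, not magnitudes."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss on (batch, dim) embeddings.

    anchor:   drone-view embeddings
    positive: satellite embeddings of the same place
    negative: satellite embeddings of a different place
    The loss is zero once each positive is closer than its negative by `margin`.
    """
    a, p, n = (l2_normalize(np.asarray(t, float)) for t in (anchor, positive, negative))
    d_pos = np.sum((a - p) ** 2, axis=-1)  # squared distance to the matching view
    d_neg = np.sum((a - n) ** 2, axis=-1)  # squared distance to the wrong place
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))
```

Training pushes the drone and satellite views of the same spot together and different spots apart. At inference time, matching is then a nearest-neighbour search in that embedding space.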

None of this would be possible without big, carefully built training sets. Benchmark datasets like University-1652 and DenseUAV gave researchers a common yardstick, and top systems now report matching a drone view to the correct satellite location around 88 to 93% of the time on those benchmarks. Impressive, though benchmark accuracy and battlefield accuracy are, as we’ll see, very different animals.
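Retrieval benchmarks in this area commonly report Recall@K: the share of drone queries whose true satellite tile ranks among the top K gallery images by embedding similarity. The sketch below computes it with cosine similarity on toy vectors. The data is made up for illustration, and a given benchmark’s exact protocol may differ.

```python
import numpy as np


def recall_at_k(query_emb, gallery_emb, query_to_gallery, k=1):
    """Fraction of queries whose correct gallery item ranks in the top k by cosine similarity.

    query_emb:        (Q, D) drone-view embeddings
    gallery_emb:      (G, D) satellite-tile embeddings
    query_to_gallery: length-Q list giving the index of each query's true tile
    """
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sims = q @ g.T                           # (Q, G) cosine similarities
    topk = np.argsort(-sims, axis=1)[:, :k]  # indices of the k most similar tiles
    hits = (topk == np.asarray(query_to_gallery)[:, None]).any(axis=1)
    return float(hits.mean())
```

With three tiles and three queries where the third query sits closer to the wrong tile, Recall@1 is 2/3 but Recall@2 is 1.0. That is one reason headline figures should always be read alongside the K they refer to.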

Putting a target on the actual map: elevation and ray-tracing

Recognizing where the drone is and pinpointing where a target on the ground is are two separate problems. The second one trips up naive systems in a way that’s worth understanding.

When a drone’s camera locks onto something, say a heat signature from a thermal sensor, the system projects an imaginary line, an optical ray, from the lens out toward the ground. Where that ray hits the earth is your target’s coordinate. Simple, until you ask: what does the ray hit?

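Lazy algorithms assume the ground is a smooth, average-elevation surface: essentially a flat Earth. In flat country that’s tolerable. In mountains, it’s a disaster. The ray “lands” at the wrong elevation, and the coordinate slides along it: the horizontal error is roughly the terrain-height error divided by the tangent of the ray’s downward angle. At a shallow 20-degree look-down, a 100-meter mismatch between assumed and real ground height shifts the coordinate by about 275 meters.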

The fix is to give the system an honest picture of the terrain. Digital Elevation Models (DEMs), high-resolution maps of the ground’s actual height, often sourced from agencies like the USGS, let the algorithm trace the optical ray across the real, bumpy landscape cell by cell until it finds the precise point where the ray’s height matches the terrain’s height. That intersection is the target. Researchers have reported terrain-aware ray-tracing improving accuracy by more than an order of magnitude over the flat-Earth shortcut in rugged conditions.
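Here is a minimal sketch of terrain-aware ray-tracing: march along the optical ray in fixed steps until it dips below the ground, then refine the crossing by bisection. The terrain is passed in as a function for brevity. A real DEM would be sampled from a raster, typically with bilinear interpolation between cells. The step size, range limit, and local east/north/up coordinate frame are illustrative assumptions, and ridges thinner than one step can be missed.

```python
import math


def ray_direction(azimuth_deg, depression_deg):
    """Unit vector (east, north, up) for a camera ray at a compass azimuth, tilted down."""
    az, dep = math.radians(azimuth_deg), math.radians(depression_deg)
    return (math.sin(az) * math.cos(dep), math.cos(az) * math.cos(dep), -math.sin(dep))


def intersect_flat(origin, direction, ground_z):
    """Flat-Earth shortcut: where the ray meets a constant-height plane."""
    t = (ground_z - origin[2]) / direction[2]
    return tuple(o + t * d for o, d in zip(origin, direction))


def intersect_terrain(origin, direction, terrain, step_m=1.0, max_range_m=20000.0, tol_m=0.01):
    """March along the ray until it drops below the terrain, then refine by bisection.

    terrain(x, y) -> ground height in metres at local east/north coordinates.
    """
    def height_above_ground(t):
        x, y, z = (o + t * d for o, d in zip(origin, direction))
        return z - terrain(x, y)

    if height_above_ground(0.0) <= 0:
        raise ValueError("camera origin is at or below the terrain")
    t_prev, t = 0.0, step_m
    while t <= max_range_m:
        if height_above_ground(t) <= 0:
            lo, hi = t_prev, t  # the crossing lies between these distances
            while hi - lo > tol_m:
                mid = 0.5 * (lo + hi)
                if height_above_ground(mid) > 0:
                    lo = mid
                else:
                    hi = mid
            return tuple(o + hi * d for o, d in zip(origin, direction))
        t_prev, t = t, t + step_m
    return None  # no intersection within range
```

For example, take a camera 1,100 m up, looking east at 45 degrees onto a ridge that rises from 100 m to 400 m. The terrain-aware trace hits the slope about 750 m out. The flat-Earth shortcut at 100 m puts the target 1,000 m out, a 250 m error from terrain alone.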

There’s one more enemy: the drone itself never holds perfectly still. Wind, vibration, and small errors in the GPS and inertial sensors make the optical ray jitter. To smooth that out, targeting systems lean on Kalman filters: statistical engines that take many noisy measurements over time and converge on the most probable true value, steadily tightening the estimate as more data arrives.
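A Kalman filter for a stationary target reduces to a few lines. This sketch tracks only the target’s east/north position and treats each ray-terrain intersection as an independent noisy fix with a known standard deviation. Those are simplifying assumptions: real targeting filters also model the drone’s pose errors and any target motion. The `process_sigma_m` knob, which lets the estimate drift for a slowly moving target, is illustrative.

```python
import numpy as np


def kalman_fuse(measurements, meas_sigma_m, init_sigma_m=1000.0, process_sigma_m=0.0):
    """Sequential Kalman filter for a (nearly) stationary ground target.

    State: target (east, north) in metres. Each measurement is one noisy fix.
    Returns the final estimate, its covariance, and the per-axis sigma after each update.
    """
    x = np.asarray(measurements[0], dtype=float)  # initialise on the first fix
    P = np.eye(2) * init_sigma_m**2                # initial uncertainty (large)
    Q = np.eye(2) * process_sigma_m**2             # allowance for target motion per step
    R = np.eye(2) * meas_sigma_m**2                # measurement noise
    H = np.eye(2)                                  # we observe position directly
    history = []
    for z in measurements:
        P = P + Q                                  # predict: position unchanged, uncertainty grows
        S = H @ P @ H.T + R                        # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)             # Kalman gain
        x = x + K @ (np.asarray(z, dtype=float) - H @ x)
        P = (np.eye(2) - K @ H) @ P                # update: uncertainty shrinks
        history.append(float(np.sqrt(P[0, 0])))
    return x, P, history
```

With 15 m of noise per fix and a static target, the reported uncertainty shrinks roughly as 15 divided by the square root of the number of fixes. After 200 fixes it is around 1 m, which is the “steadily tightening” behaviour described above.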

Flying blind: navigation when GPS is jammed

All of the above quietly assumes the drone knows roughly where it is. In contested airspace, that assumption collapses. Electronic warfare can jam or spoof GPS entirely, and a drone that depends on satellites becomes a very expensive paperweight. So a whole research thread is devoted to navigating with no satellite signal at all, using nothing but what the cameras can see.

The workhorse here is SLAM, Simultaneous Localization and Mapping. The drone tracks how features in its camera feed slide across the frame from moment to moment, infers its own motion from that optical flow, and builds a running 3D map of its surroundings while locating itself inside that map. Fuse the camera data with the onboard gyroscope and accelerometer (a combination called visual-inertial odometry) and the drone can hold its position even with the satellites switched off.
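One building block of that motion estimate can be sketched in 2D: given the same features matched in two consecutive frames, solve for the rigid rotation and translation that best maps one set onto the other (the Kabsch least-squares method), then invert it to get the camera’s own motion and chain the steps into a pose. This is a heavy simplification, assuming a downward-looking camera over flat ground with features already converted to metres. Real visual-inertial odometry works in 3D and fuses IMU data.

```python
import numpy as np


def estimate_rigid_2d(src, dst):
    """Least-squares R, t with dst ~= R @ p + t for each matched point p (Kabsch, no scale).

    src, dst: (N, 2) arrays of the same features in two consecutive frames.
    """
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)       # 2x2 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against a reflection
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t


def camera_step(R_feat, t_feat):
    """Static features appear to move opposite to the camera: invert their motion."""
    return R_feat.T, -R_feat.T @ t_feat


def integrate(steps):
    """Chain per-frame camera motions (R_rel, t_rel) into a pose in the starting frame."""
    R_w, t_w = np.eye(2), np.zeros(2)
    for R_rel, t_rel in steps:
        t_w = t_w + R_w @ t_rel  # move along the current heading
        R_w = R_w @ R_rel        # then apply the turn
    return R_w, t_w
```

Because each step’s small error is chained into the next, the integrated pose drifts over time. That is why SLAM systems also recognise previously seen places to correct the map, and why inertial sensors are fused in.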

The most forward-looking version of this is genuinely sci-fi. Traditional “teach and repeat” navigation requires a human to fly a route first so the drone can memorize the landmarks. The emerging approach, sometimes called virtual teach and repeat, skips the manual flight entirely. Using existing satellite imagery, the system builds a photorealistic 3D simulation of the environment with a technique called Neural Radiance Fields (NeRF), then “flies” the route inside that synthetic world to learn it. When the real drone launches, it can navigate a place it has never physically been, on its very first flight, with no GPS. Practiced entirely in a dream of the terrain, then nailed in reality.

The money, the stakes, and the arguments

It’s worth being clear-eyed about why all this effort exists, because the field is defined as much by its controversies as its capabilities.

The market is large and growing fast. Geolocation is one engine inside the broader geospatial analytics industry. Estimates vary considerably by research firm (you’ll see 2025 figures anywhere from roughly $90 billion to $180 billion depending on how the category is drawn), but one widely cited projection from Fortune Business Insights puts the market at about $102 billion in 2025, rising past $117 billion in 2026 and heading toward $300 billion-plus by the mid-2030s at a low-double-digit growth rate. North America holds the largest share, driven by defense spending and early adoption of AI mapping. Whatever the exact number, the trajectory is steep, and AI-powered spatial analytics is the thing pulling it upward.

Military accuracy is messier than the brochures suggest. Two numbers matter here, and they’re easy to blur together. Target Location Error (TLE) is how accurately a drone’s sensor turns what it sees into a grid coordinate. Circular Error Probable (CEP) traditionally describes how accurately a weapon then flies to that coordinate, and despite the name, standard CEP is a 50% figure: the radius half the strikes land inside. For a concrete sense of scale, U.S. operational testing found the RQ-21A Blackjack’s target location error was about 43.8 meters, with 90% of its coordinates falling within that radius. The testers’ own assessment was blunt: fine for a conventional open battlefield, but not accurate enough for dense urban targeting, where the gap between 10 and 40 meters is the difference between a military objective and a hospital. And that’s just the sensor’s error: a real strike adds the weapon’s own delivery error on top. The whole push toward terrain-aware ray-tracing and better algorithms is, in large part, an effort to shrink that gap, which is exactly why critics watch it so closely.
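Because a 90% radius and a 50% radius describe different things, converting between them needs an error model. The sketch below assumes errors follow a circular, zero-mean Gaussian. That is a textbook simplification, since real error patterns are often elliptical or biased. Under it, the radius containing a fraction p of points is sigma times the square root of −2 ln(1 − p). The weapon CEP in the usage note is a hypothetical value, and combining the two errors assumes they are independent.

```python
import math

import numpy as np


def radius_for_fraction(sigma_m, p):
    """Radius containing fraction p of a circular Gaussian with per-axis sigma."""
    return sigma_m * math.sqrt(-2.0 * math.log(1.0 - p))


def sigma_from_radius(radius_m, p):
    """Per-axis sigma implied by a radius that contains fraction p of points."""
    return radius_m / math.sqrt(-2.0 * math.log(1.0 - p))


def empirical_radius(errors_en_m, p):
    """Radius containing fraction p of measured (east, north) errors; p = 0.5 gives CEP."""
    r = np.hypot(*np.asarray(errors_en_m, float).T)
    return float(np.percentile(r, 100 * p))


def combined_cep(sensor_r90_m, weapon_cep_m):
    """CEP of sensor TLE plus weapon delivery error, assuming independent circular Gaussians."""
    s_sensor = sigma_from_radius(sensor_r90_m, 0.90)
    s_weapon = sigma_from_radius(weapon_cep_m, 0.50)
    return radius_for_fraction(math.hypot(s_sensor, s_weapon), 0.50)
```

Under these assumptions, a 43.8 m 90% radius corresponds to a 50% radius of about 24 m. Adding even a hypothetical 10 m weapon CEP on top pushes the combined 50% radius above that, never below it.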

The courtroom is still arguing. Geolocated footage has reshaped how war crimes get documented, and the International Criminal Court has been moving toward accepting digital evidence for years: the landmark Al Mahdi case over the destruction of Timbuktu’s shrines, tried in 2016, leaned heavily on an interactive platform stitching together satellite imagery, photographs, and open-source video. But defense lawyers raise a fair point: in the age of deepfakes and editable metadata, how do you trust any single clip? The counterargument from investigators is that you don’t have to. While one data point can be faked, it’s effectively impossible to forge a consistent web of shadow geometry, terrain matching, satellite history, and crowd-sourced verification all at once. The aggregate is what holds up, not any one thread.

And privacy is a real fight. The same toolkit that geolocates a war crime can track a private citizen. Drone-mounted devices that intercept phone signals to triangulate a location, combined with high-resolution optical sensors, have civil-liberties advocates pushing for tighter law, while law enforcement argues that situational awareness from the air is non-negotiable for disaster response and public safety. There’s no tidy resolution here, and there probably won’t be one soon.

Where this is all heading

If the last decade was about proving these techniques work, the next few years are about making them effortless, and that shift has a clear direction.

Expect natural language to become the interface. Instead of typing coordinates, an operator will say “navigate to the red building next to the collapsed bridge,” and a system pairing a language model with a vision network will resolve that description against live satellite and video feeds. Research benchmarks built specifically for text-guided drone navigation are already pointing this way.

Expect the manual analyst workflow to get automated end to end. The vision is an “agentic” pipeline that ingests a raw social-media clip, syncs it against satellite data, runs the shadow trigonometry and the cross-view matching on its own, and outputs a verified coordinate, doing in seconds what currently takes a skilled human an afternoon.

And expect the heavy computation to move onto the drone itself. As jamming makes the radio link between drone and ground station a liability, the trend is to run SLAM, filtering, and object detection natively on the aircraft’s own hardware, the “edge.” A drone that thinks for itself can’t be cut off from its brain.

Quick answers to common questions

Can you geolocate a drone video without any metadata? Yes. When telemetry has been stripped (which is standard after social-media upload), analysts rely on visual methods: shadow analysis to establish time and camera direction, then matching landmarks, terrain, and architecture against satellite imagery to fix the exact spot.

What is the easiest way to geolocate your own drone footage? Enable the flight-log or “video captions” feature before you fly so the drone records an SRT telemetry file. After the flight, a telemetry tool can convert that log into a KML or GPX track you can view directly in Google Earth or GIS software.

How accurate is shadow-based geolocation? Accurate enough to narrow a search radius dramatically and to estimate the time of day within roughly an hour, depending on the quality of the reference objects in the footage. It’s rarely a standalone proof, but it’s a powerful filter that eliminates most of the map and corroborates other evidence.

Is cross-view geo-localization reliable yet? On curated benchmark datasets, leading AI systems match a drone view to the right satellite location around 88 to 93% of the time. Real-world performance is lower, mainly because sudden altitude changes create scale and perspective problems that controlled datasets don’t fully capture. It’s advancing quickly, but it isn’t a solved problem.


Drone video geolocation started as a slow, manual craft practiced by a handful of obsessive investigators with a free sun-position tool and a lot of patience. It’s becoming something closer to an automated sense: machines that can look at a sliver of footage and tell you where on Earth it belongs. The geometry was always there, written into shadows and terrain and the fixed arc of the sun. We’re just finally teaching software to read it.




Written by

TacLink C2 Team

TacLink C2 Team builds a modern desktop ground control station for independent and commercial drone pilots. Writing here covers mission planning, multi-drone operations, airspace, and the software that keeps serious UAS programs running.