Vietnam Combat Map in 70 lines of code

Mark Di Marco

This post will cover creating an interactive, 3D geographical data visualization, with some raw data, 30 lines of basic Python, 35 lines of Javascript, and a few helpful bash commands. Source code on Github can be found here, and a live demo can be viewed here.

Map we'll be building using deck.gl and H3 for hexagonal clustering Screenshot of hexgrid map showing US military force bombings in Vietnam

Understanding The Data

The data we'll use comes from the THOR dataset published by data.mil, the U.S. Defense Digital Service, available for download on data.world.

The data is in CSV format, containing 4.6 million rows and 47 columns. It covers air missions by coalition forces from the years 1965 to 1975. U.S. military operations in Vietnam largely ended in 1973.

The 4.6 million rows of data available for Vietnam contains all types of air missions - including bombings, recon, psy-ops, troop deployments, and supply drops. Our visualization is meant to cover bombings in particular, so we'll have to remove some data to isolate these mission types. The following is a sampling of the THOR data:

thor_data_viet_id countryflyingmission milservice msndate sourceid sourcerecord valid_aircraft_root takeofflocation tgtlatdd_ddd_wgs84 tgtlonddd_ddd_wgs84 tgttype numweaponsdelivered timeontarget weapontype weapontypeclass weapontypeweight aircraft_original aircraft_root airforcegroup airforcesqdn callsign flthours mfunc mfunc_desc missionid numofacft operationsupported periodofday unit tgtcloudcover tgtcontrol tgtcountry tgtid tgtorigcoords tgtorigcoordsformat tgtweather additionalinfo geozone id mfunc_desc_class numweaponsjettisoned numweaponsreturned releasealtitude releasefltspeed resultsbda timeofftarget weaponsloadedweight year
404557 UNITED STATES OF AMERICA USN 1968-06-21 4757386 CACTA6669 A-7 WESTPAC 66 18.38 105.744 TRUCK PARK\STOP 0 120.0 500LB GP MK-82 500 A7A A7 18 1.0 STRIKE A0002 1 ROLLING THUN D N09122 PREBRIEFED TGT NORTH VIETNAM 18380N105744E DD.DDDN DDD.DDDE 1309H MISSION: RT57C - CONTROL_TYPE: PREBRIEFED TGT - RTCOR: CORPS AREA II (IN-COUNTRY MISSIONS) OR RTE AREA 2 (OUT-OF-COUNTRY MISSIONS) WF 26299335 KINETIC 900 0 120.0 -1 1968
404616 UNITED STATES OF AMERICA USN 1968-10-11 4757606 CACTA6669 F-4 WESTPAC 43 18.466666 105.766666 AREA\DEPOT 0 0.0 SIDEWINDER 164 F4B F4 17 35.0 ESCORT/COVER L0035 1 D N09558 PREBRIEFED TGT NORTH VIETNAM 1828N10546E DDMMN DDDMME MISSION: UERT614B - CONTROL_TYPE: PREBRIEFED TGT - RTCOR: CORPS AREA II (IN-COUNTRY MISSIONS) OR RTE AREA 2 (OUT-OF-COUNTRY MISSIONS) WF 26299395 NONKINETIC 0 4 0.0 -1 1968
404560 UNITED STATES OF AMERICA USN 1968-04-09 4757394 CACTA6669 A-4 WESTPAC 31 16.675 106.667 TROOPS 152 210.0 2.75IN HE RKT 18 A4F A4 76 6.0 DIRECT AIR SUPPORT B0008 4 IN COUNTRY D N09933 ABN FWD AIR CNTL SOUTH VIETNAM 16675N106667E DD.DDDN DDD.DDDE 9996H MISSION: VS - CONTROL_TYPE: ABN FWD AIR CNTL - RTCOR: CORPS AREA I (IN-COUNTRY MISSIONS) OR RTE AREA 1 (OUT-OF-COUNTRY MISSIONS) XC 26299338 KINETIC 0 0 220.0 -1 1968
404564 UNITED STATES OF AMERICA USN 1968-09-07 4757409 CACTA6669 A-7 WESTPAC 66 18.977777 105.447222 RADAR 0 1455.0 MK-11/MK-12 57 A7A A7 0 1.0 STRIKE A0031 1 ROLLING THUN N N09943 TGT OF OPPORTUNITY NORTH VIETNAM 185840N1052650E DDMMSSN DDDMMSSE 9995H MISSION: RTIH57C - CONTROL_TYPE: TGT OF OPPORTUNITY - RTCOR: CORPS AREA III (IN-COUNTRY MISSIONS) OR RTE AREA 3 (OUT-OF-COUNTRY MISSIONS) WF 26299342 KINETIC 0 10 1456.0 -1 1968
404565 UNITED STATES OF AMERICA USAF 1968-08-10 4757412 CACTA6669 A-37 BIEN HOA 10.1113705 105.9828826 CAMP 0 215.0 7.62MM 6 A37A A37 22 3.0 AIR INTERDICTION JA009 2 IN COUNTRY - SOUTH VIETNAM D JAPNU0 ABN FWD AIR CNTL SOUTH VIETNAM 48PXS08001780 ##ABC11112222 9999N MISSION: VS08-5840 - CONTROL_TYPE: ABN FWD AIR CNTL - RTCOR: CORPS AREA IV (IN-COUNTRY MISSIONS) OR RTE AREA 4 (OUT-OF-COUNTRY MISSIONS) XS 26299343 KINETIC 0 24 229.0 -1 1968
404572 UNITED STATES OF AMERICA USAF 1968-12-14 4757454 CACTA6669 F-4 UBON AB 16.771 106.117 SEGMENT\TRANS ROUTE 12 845.0 0 F4D F4 22 1.0 STRIKE JV006 2 STEEL TIGER - BALDIE D JALJI0 ABN FWD AIR CNTL LAOS 16771N106117E DD.DDDN DDD.DDDE MISSION: SL349 - CONTROL_TYPE: ABN FWD AIR CNTL - RTCOR: RTE AREA F (OUT-OF-COUNTRY MISSIONS) XC 26299350 KINETIC 0 0 846.0 -1 1968

There are a handful of columns that we can use to infer which rows to include and which rows to filter. numweaponsdelivered looked interesting up front, but it has non-zero values for all different types of missions, including recon and troop deployments.

mfunc_desc has definitive values, like STRIKE and AIR INTERDICTION, but while digging through these values, there seemed to be a high cardinality in the number of distinct values, and our filtering method would have to maintain that list.

I ended up using a combination of mfunc_desc_class (distinct values are only KINETIC and NONKINETIC), numweaponsdelivered when the value was greater than 0, and a check to exclude vlaues without target latitude and longitude in columns tgtlatdd_ddd_wgs84 and tgtlondd_ddd_wgs84.

Preparing The Data

1AWK is a text processing command line program that's been around since 1977, more information is available at AWK Wiki A fresh install of MacOS or Linux come well equiped with the commands we'll need to filter our data, namely using the awk1 program

With the data downloaded on my desktop, we'll filter down the 1.6 gigabyte CSV file with a single command.

awk -F, '$9 != "" && $40 != "NONKINETIC" && $12 != 0 {print}' thor_data_vietnam.csv > thor_filtered.csv
                    

The above command says: run awk, set the column separator to a comma, if column 9 (numweaponsdelivered) is not zero, and column 40 (mfunc_desc_class) is not NONKINETIC, and column 12 (tgtlatdd_ddd_wgs84) is not empty, print the line to a file named thor_filtered.csv.

2wc -l is the standard word count program UNIX-based systems ship with. The -l option asks for a count of lines. Running wc -l thor_filtered.csv2 shows we've filtered the 4.6 million rows to 1.95 million rows, and the 1.6 gigabytes file is now 735 megabytes.

Transforming The Data

3For posterity, gzip -c thor_filtered.csv | wc -c shows 133 megabytes after gzipping. The goal for this project is to be an interactive browser-based visualization, so shipping 735 megabytes of data is out of the question, even if gzipping the file pares down the size3.

We need a way to aggregate this data before sending it to the browser. We know these 1.95 million rows all happen in a relatively small geographical area, so ideally we could find a way to cluster similar entries based on GPS coordinates. Since we'll be showing a hexgrid visualization, each hex in the visualization might encompass many square kilometers.

There exists a library called H3 that is perfect for our use case. H3 is a "Hexagonal Hierarchical Geospatial Indexing System", which provides an API that takes a latitude, longitude, and a resolution, and returns back a string that represents the bucket for that entry. Increasing the resolution parameter increases the number of buckets.

4 H3's bucketing algorithm is undestandbly lossy. Running the reverse command, h3.h3_to_geo won't give you the same coordinates you put in, but as you increase the resolution, the results will get closer and closer. Using H3's Python binding (pip install h3), we can test it with h3.geo_to_h3(0, 0, 5), outputs '85754e67fffffff' and h3.geo_to_h3(14, -100, 9) which outputs '896da150217fffff'4

Now we can write a simple Python program that reads the 1.95 million row CSV file one row at a time, computes the H3 coordinate, and aggregates the values in a simple Python dictionary object, and then serializes that object out to JSON for easy browser consumption (browsers require third-party libraries to parse CSV).

To create a JSON file based on this program, all you have to do is run python make_h3.py > hexed.json. The resulting file is 2.8 megabytes, with 53,330 distinct buckets after having filtered out all of the values with only 1 entry to reduce some of the noise.

Each entry in the JSON array has a hex key, which encodes the coordinates and bucket, and a count key, which is an aggregate of the number of rows that our Python script placed into this bucket.

Visualizing The Data

Now that we have the 1.95 gigabytes of data, filtered and aggregated down to 2.8 megabytes, we're ready to create a visualization.

The Javascript library deck.gl is going to handle the heavy lifting here. It wraps Mapbox and allows you to define WebGL-friendly visualization layers on top of the Mapbox tiles.

5This is no coincidence, Uber's open-source team maintains both deck.gl and h3. By default, deck.gl ships with support for a large number of visualizations, including Hexgrids. Even more convienent, deck.gl comes with a H3 specific visualization layer, called deck.H3HexagonLayer. We'll use this layer to draw our visuals5

Conclusion & Acknowledgements

6If not using Python 3, python -m SimpleHTTPServer 8080 Assuming that hexed.json, map.html and index.js are all in the same directory, you can run python -m http.server 80806 and navigate to localhost:8080/map.html to see the map, or checkout the source code on Github.

You can see this visualization running live at here. You can pan around, zoom in and out, and rotate the camera (ctrl-click-and-drag).

The THOR dataset also covers WW1, WW2, and the Korean War. The Vietnam War data is the most exhaustive by far. The datasets have pretty limitless potential, a lot of incredibly interesting data is in there. This visualization is just scratching the surface. I hope you find this interesting enough to dig into the data yourself.

Thanks to Uber's team for deck.gl and h3, and Dave Liepmann for tufte.css, and my employer data.world for introducing me to this data and making it free to download and use.