Stylized line drawing of mark playing the flute

Uniqueness of US City Names

I've traveled around the USA a lot in my life and I've always noticed how many city names are re-used from state-to-state (not to mention US cities that use names from other countries)!

On a trip to Bowling Green, KY a couple of years ago, we accidentally booked a hotel for Bowling Green, OH and I started to wonder: How many unique city names are there?!?

So, today, I decided to try and roughly quantify it. I found this great free database of US cities from SimpleMaps which has 30,844 US cities listed in a convenient CSV format.

I threw together a simple PHP script to process the data and here's what I found.

Out of 30,844 US cities, there were 16,782 cities with unique names (about 54%).

And the most common city names (which I defined as cities that were duplicated 15 times or more) are:

  • Franklin (29)
  • Fairview (24)
  • Marion (23)
  • Clinton (23)
  • Greenville (22)
  • Springfield (21)
  • Madison (21)
  • Georgetown (20)
  • Centerville (20)
  • Kingston (20)
  • Arlington (19)
  • Salem (19)
  • Clayton (19)
  • Washington (18)
  • Dayton (18)
  • Oakland (18)
  • Lexington (18)
  • Princeton (18)
  • Waverly (18)
  • Cleveland (17)
  • Jackson (17)
  • Manchester (17)
  • Fairfield (17)
  • Burlington (17)
  • Hillsboro (17)
  • Auburn (17)
  • Ashland (17)
  • Riverside (16)
  • Buffalo (16)
  • Trenton (16)
  • Mount Vernon (16)
  • Farmington (16)
  • Milton (16)
  • Chester (16)
  • Jamestown (16)
  • Jefferson (16)
  • Fulton (16)
  • Columbus (15)
  • Bridgeport (15)
  • Columbia (15)
  • Albany (15)
  • Aurora (15)
  • Henderson (15)
  • Monroe (15)
  • Utica (15)
  • Middletown (15)
  • Florence (15)
  • Troy (15)
  • Lebanon (15)
  • Weston (15)
  • Hamilton (15)
  • Danville (15)
  • Liberty (15)
  • Oxford (15)
  • Newport (15)
  • Hudson (15)
  • Midway (15)
  • Oak Grove (15)
  • Union (15)

Here's a Github project which has a basic composer file and docker setup for running locally.

And here's the code for parsing the data:

<?php

require_once __DIR__ . '/vendor/autoload.php';

use League\Csv\Reader;

$csv = Reader::createFromPath(__DIR__ . '/uscities.csv', 'r');
$csv->setHeaderOffset(0);
$header_offset = $csv->getHeaderOffset();
$header = $csv->getHeader(); 

$results = [];

$records = $csv->getRecords();
$cities = [];
foreach ($records as $offset => $record) {
	$cities[] = $record['city_ascii'];
	$clean_name = preg_replace('/[^A-Za-z0-9]/', '', $record['city_ascii']);

	if ( ! isset($results[$clean_name])) {
		$results[$clean_name] = [
			'count' => 0,
			'name' => $record['city_ascii'],
		];
	}

	$results[$clean_name]['count']++;
}

uasort($results, function($a, $b) {
	if ($a['count'] == $b['count']) {
		return 0;
	}

	return $a['count'] > $b['count'] ? -1 : 1;
});

$common_cities = array_filter($results, function($city) {
	return $city['count'] >= 15;
});

$unique_cities = array_filter($results, function($city) {
	return $city['count'] === 1;
});

$unique_city_count = count($unique_cities);
$total_city_count = count($cities);

echo "Total cities: $total_city_count<br />";
echo "Unique cities: $unique_city_count<br />";
echo "Percent unique: " . round($unique_city_count / $total_city_count * 100, 2) . "%<br />";

echo "<br />Common cities:<br />";
foreach ($common_cities as $city) {
	echo $city['name'] . ' (' . $city['count'] . ')<br />';
}