Segmenting and Clustering Neighborhoods in Toronto

A peer-graded assignment on Coursera made by Anh-Thi DINH.

1. Assignment description

In this assignment, you will be required to explore, segment, and cluster the neighborhoods in the city of Toronto.

For the Toronto neighborhood data, a Wikipedia page exists that has all the information we need to explore and cluster the neighborhoods in Toronto. You will be required to scrape the Wikipedia page and wrangle the data, clean it, and then read it into a pandas dataframe so that it is in a structured format like the New York dataset.

Once the data is in a structured format, you can replicate the analysis that we did on the New York City dataset to explore and cluster the neighborhoods in the city of Toronto.

2. Scrape content from the Wikipedia page

Import necessary packages.

In [1]:
import pandas as pd
from bs4 import BeautifulSoup
import requests
import numpy as np
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from pandas import json_normalize  # transform JSON into a pandas dataframe (pandas >= 1.0; older versions import it from pandas.io.json)

import folium # map rendering library

# import k-means from clustering stage
from sklearn.cluster import KMeans

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

Scrape the "raw" table.

In [2]:
source = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").text
soup = BeautifulSoup(source, 'lxml')

table = soup.find("table")
table_rows = table.tbody.find_all("tr")

res = []
for tr in table_rows:
    td = tr.find_all("td")
    row = [cell.text for cell in td]  # text of each cell; avoids shadowing the loop variable tr
    
    # Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
    if row != [] and row[1] != "Not assigned":
        # If a cell has a borough but a "Not assigned" neighborhood, then the neighborhood will be the same as the borough.
        if "Not assigned" in row[2]: 
            row[2] = row[1]
        res.append(row)

# Dataframe with 3 columns
df = pd.DataFrame(res, columns = ["PostalCode", "Borough", "Neighborhood"])
df.head()
Out[2]:
PostalCode Borough Neighborhood
0 M3A North York Parkwoods\n
1 M4A North York Victoria Village\n
2 M5A Downtown Toronto Harbourfront\n
3 M5A Downtown Toronto Regent Park\n
4 M6A North York Lawrence Heights\n
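
The filtering rules used in the loop above can also be written as a small standalone function, which makes them easy to check on a few hand-written rows. This is only a sketch: `clean_rows` is a hypothetical helper name, the sample rows are made up, and the function also strips the trailing newline in the same pass.

```python
def clean_rows(rows):
    """Apply the assignment's rules to raw [postal_code, borough, neighborhood] rows."""
    cleaned = []
    for row in rows:
        # Strip surrounding whitespace such as the trailing newline left by the HTML cells.
        row = [cell.strip() for cell in row]
        # Skip header rows (no cells) and rows without an assigned borough.
        if not row or row[1] == "Not assigned":
            continue
        # A row with a borough but no neighborhood takes the borough's name.
        if row[2] == "Not assigned":
            row[2] = row[1]
        cleaned.append(row)
    return cleaned


# Made-up sample rows, for illustration only.
sample = [
    [],
    ["M1A", "Not assigned", "Not assigned\n"],
    ["M3A", "North York", "Parkwoods\n"],
    ["M7A", "Queen's Park", "Not assigned\n"],
]
cleaned = clean_rows(sample)
print(cleaned)
```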

Remove "\n" at the end of each string in the Neighborhood column

In [3]:
df["Neighborhood"] = df["Neighborhood"].str.replace("\n","")
df.head()
Out[3]:
PostalCode Borough Neighborhood
0 M3A North York Parkwoods
1 M4A North York Victoria Village
2 M5A Downtown Toronto Harbourfront
3 M5A Downtown Toronto Regent Park
4 M6A North York Lawrence Heights

Group all neighborhoods with the same postal code

In [4]:
df = df.groupby(["PostalCode", "Borough"])["Neighborhood"].apply(", ".join).reset_index()
df.head()
Out[4]:
PostalCode Borough Neighborhood
0 M1B Scarborough Rouge, Malvern
1 M1C Scarborough Highland Creek, Rouge Hill, Port Union
2 M1E Scarborough Guildwood, Morningside, West Hill
3 M1G Scarborough Woburn
4 M1H Scarborough Cedarbrae
In [5]:
print("Shape: ", df.shape)
Shape:  (103, 3)
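
To see what the `groupby` and `", ".join` combination does, here is a minimal sketch on three toy rows (the rows are illustrative and not the full scraped table). Neighborhoods that share a postal code and borough are collapsed into one comma-separated string.

```python
import pandas as pd

# Toy rows, for illustration only: one postal code shared by two neighborhoods.
toy = pd.DataFrame({
    "PostalCode": ["M5A", "M5A", "M3A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighborhood": ["Harbourfront", "Regent Park", "Parkwoods"],
})

# Join the neighborhood names of each (PostalCode, Borough) pair with ", ".
grouped = (
    toy.groupby(["PostalCode", "Borough"])["Neighborhood"]
    .apply(", ".join)
    .reset_index()
)
print(grouped)
```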

3. Get the latitude and the longitude coordinates of each neighborhood.

We were not able to get the geographical coordinates of the neighborhoods using the Geocoder package, so we use the provided CSV file instead.

In [6]:
df_geo_coor = pd.read_csv("./Geospatial_Coordinates.csv")
df_geo_coor.head()
Out[6]:
Postal Code Latitude Longitude
0 M1B 43.806686 -79.194353
1 M1C 43.784535 -79.160497
2 M1E 43.763573 -79.188711
3 M1G 43.770992 -79.216917
4 M1H 43.773136 -79.239476

We need to merge the two dataframes "df" and "df_geo_coor" into one dataframe.

In [7]:
df_toronto = pd.merge(df, df_geo_coor, how='left', left_on = 'PostalCode', right_on = 'Postal Code')
# remove the "Postal Code" column
df_toronto.drop("Postal Code", axis=1, inplace=True)
df_toronto.head()
Out[7]:
PostalCode Borough Neighborhood Latitude Longitude
0 M1B Scarborough Rouge, Malvern 43.806686 -79.194353
1 M1C Scarborough Highland Creek, Rouge Hill, Port Union 43.784535 -79.160497
2 M1E Scarborough Guildwood, Morningside, West Hill 43.763573 -79.188711
3 M1G Scarborough Woburn 43.770992 -79.216917
4 M1H Scarborough Cedarbrae 43.773136 -79.239476
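
A left merge keeps every row of the left dataframe, so a postal code missing from the CSV would silently end up with NaN coordinates. The sketch below shows one way to check for that with the `indicator` option of `pd.merge`. The toy frames and the postal code "M9Z" are made up for illustration.

```python
import pandas as pd

left = pd.DataFrame({"PostalCode": ["M1B", "M1C", "M9Z"]})
right = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

merged = pd.merge(left, right, how="left",
                  left_on="PostalCode", right_on="Postal Code",
                  indicator=True)

# Rows marked "left_only" found no coordinates in the right-hand frame.
unmatched = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print(unmatched)
```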

4. Explore and cluster the neighborhoods in Toronto

4.1. Get the latitude and longitude values of Toronto.

In [8]:
address = "Toronto, ON"

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))
The geographical coordinates of Toronto are 43.653963, -79.387207.
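
`geolocator.geocode` returns `None` when it finds no match, in which case `location.latitude` would raise an `AttributeError`. A minimal guard might look like the sketch below. `locate` is a hypothetical helper, the lookup function is passed in so the example runs without network access, and the fallback values are simply the coordinates printed above. Network errors and timeouts are not handled here.

```python
def locate(geocode, address, fallback):
    """Return (latitude, longitude) for address, or fallback if nothing is found."""
    location = geocode(address)
    if location is None:
        return fallback
    return location.latitude, location.longitude


# Stand-in for geolocator.geocode that finds nothing (no network needed).
def no_result(address):
    return None


coords = locate(no_result, "Toronto, ON", fallback=(43.653963, -79.387207))
print(coords)
```

In the notebook, the call would be `locate(geolocator.geocode, address, fallback=...)`.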

4.2. Create a map of the whole Toronto City with neighborhoods superimposed on top.

In [9]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)
map_toronto
Out[9]:

Add markers to the map.

In [10]:
for lat, lng, borough, neighborhood in zip(
        df_toronto['Latitude'], 
        df_toronto['Longitude'], 
        df_toronto['Borough'], 
        df_toronto['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)  # parse_html belongs to Popup, not CircleMarker

map_toronto
Out[10]:

4.3. Map of a part of Toronto City

We are going to work with only the boroughs that contain the word "Toronto".

In [12]:
# "denc" = [D]owntown Toronto, [E]ast Toronto, [N]orth Toronto, [C]entral Toronto
df_toronto_denc = df_toronto[df_toronto['Borough'].str.contains("Toronto")].reset_index(drop=True)
df_toronto_denc.head()
Out[12]:
PostalCode Borough Neighborhood Latitude Longitude
0 M4E East Toronto The Beaches 43.676357 -79.293031
1 M4K East Toronto The Danforth West, Riverdale 43.679557 -79.352188
2 M4L East Toronto The Beaches West, India Bazaar 43.668999 -79.315572
3 M4M East Toronto Studio District 43.659526 -79.340923
4 M4N Central Toronto Lawrence Park 43.728020 -79.388790

Plot the map and the markers again for this region.

In [13]:
map_toronto_denc = folium.Map(location=[latitude, longitude], zoom_start=12)
for lat, lng, borough, neighborhood in zip(
        df_toronto_denc['Latitude'], 
        df_toronto_denc['Longitude'], 
        df_toronto_denc['Borough'], 
        df_toronto_denc['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto_denc)  # parse_html belongs to Popup, not CircleMarker

map_toronto_denc
Out[13]:

4.4. Define Foursquare Credentials and Version

In the public GitHub repository, I have removed these credentials for privacy.

In [14]:
CLIENT_ID = ''
CLIENT_SECRET = ''
VERSION = ''

4.5. Explore the first neighborhood in our data frame "df_toronto_denc"

In [15]:
neighborhood_name = df_toronto_denc.loc[0, 'Neighborhood']
print(f"The first neighborhood's name is '{neighborhood_name}'.")
The first neighborhood's name is 'The Beaches'.

Get the neighborhood's latitude and longitude values.

In [16]:
neighborhood_latitude = df_toronto_denc.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df_toronto_denc.loc[0, 'Longitude'] # neighborhood longitude value

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))
Latitude and longitude values of The Beaches are 43.67635739999999, -79.2930312.

Now, let's get the top 100 venues that are in The Beaches within a radius of 500 meters.

In [17]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

# send the request and parse the JSON response into a Python dict
results = requests.get(url).json()
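
The long format string is easy to get wrong when parameters are added or reordered. As an alternative sketch, the same query string can be built from a dict with the standard library's `urlencode`. `build_explore_url` is a hypothetical helper and the credential values are placeholders. Note that `urlencode` percent-encodes the comma in `ll`, which is standard URL encoding but has not been verified against the API here.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the venues/explore URL from a dict of query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # urlencode escapes each value, so the comma in "ll" becomes %2C.
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


url = build_explore_url("ID", "SECRET", "YYYYMMDD", 43.676357, -79.293031, 500, 100)
print(url)
```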

Function that extracts the category of the venue

In [18]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # rows flattened by json_normalize use the dotted column name
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a pandas dataframe.

In [19]:
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues
Out[19]:
name categories lat lng
0 The Big Carrot Natural Food Market Health Food Store 43.678879 -79.297734
1 Grover Pub and Grub Pub 43.679181 -79.297215
2 St-Denis Studios Inc. Music Venue 43.675031 -79.288022
3 Upper Beaches Neighborhood 43.680563 -79.292869
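
The flattening step can be tried without calling the API. The sketch below runs the same sequence (flatten, select columns, keep the first category name, shorten the column names) on two made-up items shaped like the entries of the response's `items` list. It uses `pd.json_normalize`, which assumes pandas 1.0 or later.

```python
import pandas as pd

# Made-up items shaped like the 'items' list in the response.
items = [
    {"venue": {"name": "Cafe A",
               "categories": [{"name": "Café"}],
               "location": {"lat": 43.67, "lng": -79.29}}},
    {"venue": {"name": "Unlabelled Spot",
               "categories": [],
               "location": {"lat": 43.68, "lng": -79.30}}},
]

flat = pd.json_normalize(items)  # nested keys become dotted column names
flat = flat[["venue.name", "venue.categories",
             "venue.location.lat", "venue.location.lng"]].copy()

# Keep the first category name, or None when the list is empty.
flat["venue.categories"] = flat["venue.categories"].apply(
    lambda cats: cats[0]["name"] if cats else None
)

# Keep only the last part of each dotted column name.
flat.columns = [col.split(".")[-1] for col in flat.columns]
print(flat)
```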

4.6. Explore neighborhoods in a part of Toronto City

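We are working with the data frame df_toronto_denc. Recall that this region covers the "DENC" boroughs of Toronto, where

"DENC" = [D]owntown Toronto, [E]ast Toronto, [N]orth Toronto, [C]entral Toronto

Because the filter keeps every borough whose name contains "Toronto", West Toronto is included as well (it appears in the cluster tables in section 4.9).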

First, let's create a function to repeat the same process for all the neighborhoods in DENC of Toronto.

In [20]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list=[]
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        # print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Now write the code to run the above function on each neighborhood and create a new dataframe called toronto_denc_venues

In [21]:
toronto_denc_venues = getNearbyVenues(names=df_toronto_denc['Neighborhood'],
                                   latitudes=df_toronto_denc['Latitude'],
                                   longitudes=df_toronto_denc['Longitude']
                                  )
In [22]:
toronto_denc_venues.head()
Out[22]:
Neighborhood Neighborhood Latitude Neighborhood Longitude Venue Venue Latitude Venue Longitude Venue Category
0 The Beaches 43.676357 -79.293031 The Big Carrot Natural Food Market 43.678879 -79.297734 Health Food Store
1 The Beaches 43.676357 -79.293031 Grover Pub and Grub 43.679181 -79.297215 Pub
2 The Beaches 43.676357 -79.293031 St-Denis Studios Inc. 43.675031 -79.288022 Music Venue
3 The Beaches 43.676357 -79.293031 Upper Beaches 43.680563 -79.292869 Neighborhood
4 The Danforth West, Riverdale 43.679557 -79.352188 Pantheon 43.677621 -79.351434 Greek Restaurant
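
Inside `getNearbyVenues`, the expression `v['venue']['categories'][0]['name']` raises an `IndexError` for any venue that comes back with an empty category list. A more defensive version of the row-building step might look like the sketch below. `venue_rows` is a hypothetical helper and the items are made up.

```python
def venue_rows(name, lat, lng, items):
    """Turn one neighborhood's Foursquare items into flat tuples."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Fall back to None instead of raising IndexError on an empty list.
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows


# Made-up items, for illustration only.
items = [
    {"venue": {"name": "Pub A", "categories": [{"name": "Pub"}],
               "location": {"lat": 43.679, "lng": -79.297}}},
    {"venue": {"name": "Mystery Place", "categories": [],
               "location": {"lat": 43.680, "lng": -79.292}}},
]
rows = venue_rows("The Beaches", 43.676357, -79.293031, items)
print(rows)
```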

Let's check how many venues were returned for each neighborhood.

In [23]:
toronto_denc_venues.groupby('Neighborhood').count()
Out[23]:
Neighborhood Latitude Neighborhood Longitude Venue Venue Latitude Venue Longitude Venue Category
Neighborhood
Adelaide, King, Richmond 100 100 100 100 100 100
Berczy Park 57 57 57 57 57 57
Brockton, Exhibition Place, Parkdale Village 19 19 19 19 19 19
Business Reply Mail Processing Centre 969 Eastern 17 17 17 17 17 17
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara 15 15 15 15 15 15
Cabbagetown, St. James Town 44 44 44 44 44 44
Central Bay Street 88 88 88 88 88 88
Chinatown, Grange Park, Kensington Market 100 100 100 100 100 100
Christie 16 16 16 16 16 16
Church and Wellesley 88 88 88 88 88 88
Commerce Court, Victoria Hotel 100 100 100 100 100 100
Davisville 32 32 32 32 32 32
Davisville North 7 7 7 7 7 7
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West 14 14 14 14 14 14
Design Exchange, Toronto Dominion Centre 100 100 100 100 100 100
Dovercourt Village, Dufferin 20 20 20 20 20 20
First Canadian Place, Underground city 100 100 100 100 100 100
Forest Hill North, Forest Hill West 4 4 4 4 4 4
Harbord, University of Toronto 34 34 34 34 34 34
Harbourfront East, Toronto Islands, Union Station 100 100 100 100 100 100
Harbourfront, Regent Park 46 46 46 46 46 46
High Park, The Junction South 23 23 23 23 23 23
Lawrence Park 4 4 4 4 4 4
Little Portugal, Trinity 63 63 63 63 63 63
Moore Park, Summerhill East 3 3 3 3 3 3
North Toronto West 21 21 21 21 21 21
Parkdale, Roncesvalles 15 15 15 15 15 15
Rosedale 4 4 4 4 4 4
Roselawn 1 1 1 1 1 1
Runnymede, Swansea 40 40 40 40 40 40
Ryerson, Garden District 100 100 100 100 100 100
St. James Town 100 100 100 100 100 100
Stn A PO Boxes 25 The Esplanade 95 95 95 95 95 95
Studio District 37 37 37 37 37 37
The Annex, North Midtown, Yorkville 24 24 24 24 24 24
The Beaches 4 4 4 4 4 4
The Beaches West, India Bazaar 22 22 22 22 22 22
The Danforth West, Riverdale 44 44 44 44 44 44

Let's find out how many unique categories can be curated from all the returned venues

In [24]:
print('There are {} unique categories.'.format(len(toronto_denc_venues['Venue Category'].unique())))
There are 236 unique categories.

4.7. Analyze Each Neighborhood

In [25]:
# one hot encoding
toronto_denc_onehot = pd.get_dummies(toronto_denc_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_denc_onehot['Neighborhood'] = toronto_denc_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_denc_onehot.columns[-1]] + list(toronto_denc_onehot.columns[:-1])
toronto_denc_onehot = toronto_denc_onehot[fixed_columns]

toronto_denc_onehot.head()
Out[25]:
Yoga Studio Adult Boutique Afghan Restaurant Airport Airport Food Court Airport Gate Airport Lounge Airport Service Airport Terminal American Restaurant ... Theme Restaurant Thrift / Vintage Store Toy / Game Store Trail Train Station Vegetarian / Vegan Restaurant Video Game Store Vietnamese Restaurant Wine Bar Wings Joint
0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 236 columns

Next, let's group the rows by neighborhood and take the mean of each one-hot column, which gives the frequency of occurrence of each category in that neighborhood.

In [26]:
toronto_denc_grouped = toronto_denc_onehot.groupby('Neighborhood').mean().reset_index()
toronto_denc_grouped.head()
Out[26]:
Neighborhood Yoga Studio Adult Boutique Afghan Restaurant Airport Airport Food Court Airport Gate Airport Lounge Airport Service Airport Terminal ... Theme Restaurant Thrift / Vintage Store Toy / Game Store Trail Train Station Vegetarian / Vegan Restaurant Video Game Store Vietnamese Restaurant Wine Bar Wings Joint
0 Adelaide, King, Richmond 0.000000 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.010000 0.0 0.0 0.01 0.0
1 Berczy Park 0.000000 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.017544 0.0 0.0 0.00 0.0
2 Brockton, Exhibition Place, Parkdale Village 0.000000 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.00 0.0
3 Business Reply Mail Processing Centre 969 Eastern 0.058824 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 ... 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.00 0.0
4 CN Tower, Bathurst Quay, Island airport, Harbo... 0.000000 0.0 0.0 0.066667 0.066667 0.066667 0.133333 0.2 0.133333 ... 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.00 0.0

5 rows × 236 columns
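
The reason the mean works as a frequency is that each one-hot column holds only 0s and 1s, so its mean within a neighborhood is the share of that neighborhood's venues in the category. Here is a minimal sketch on toy data (the neighborhoods "A" and "B" are made up).

One thing worth checking, as an observation rather than a confirmed bug: Out[22] lists a venue whose category is literally "Neighborhood", so `get_dummies` would already produce a column with that name, and the `toronto_denc_onehot['Neighborhood'] = ...` assignment in In [25] would overwrite it instead of adding a new column. `DataFrame.insert`, used below, raises a `ValueError` on such a name clash, which makes the problem visible.

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B"],
    "Venue Category": ["Café", "Café", "Park", "Park"],
})

# One 0/1 column per category; dtype=float keeps the means simple.
onehot = pd.get_dummies(venues[["Venue Category"]],
                        prefix="", prefix_sep="", dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of a 0/1 column is the share of venues in that category.
freq = onehot.groupby("Neighborhood").mean().reset_index()
print(freq)
```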

Check the 10 most common venues in each neighborhood.

In [27]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks beyond 3rd have no entry in indicators
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_denc_grouped['Neighborhood']

for ind in np.arange(toronto_denc_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_denc_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()
Out[27]:
Neighborhood 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
0 Adelaide, King, Richmond Coffee Shop Café Thai Restaurant Steakhouse American Restaurant Burger Joint Hotel Restaurant Bakery Bar
1 Berczy Park Coffee Shop Cocktail Bar Café Cheese Shop Beer Bar Farmers Market Steakhouse Seafood Restaurant Restaurant Bakery
2 Brockton, Exhibition Place, Parkdale Village Café Breakfast Spot Coffee Shop Grocery Store Italian Restaurant Caribbean Restaurant Stadium Bar Furniture / Home Store Burrito Place
3 Business Reply Mail Processing Centre 969 Eastern Light Rail Station Garden Recording Studio Auto Workshop Skate Park Burrito Place Fast Food Restaurant Farmers Market Restaurant Yoga Studio
4 CN Tower, Bathurst Quay, Island airport, Harbo... Airport Service Airport Lounge Airport Terminal Harbor / Marina Boat or Ferry Sculpture Garden Plane Boutique Airport Gate Airport
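
The column labels above rely on a three-element suffix list plus a fallback to "th", which is correct for 1 through 10 but would produce "21th" if `num_top_venues` were ever raised past 20. A small ordinal helper avoids that. This is a sketch and `ordinal` is a hypothetical name.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
print(columns[1], "|", columns[4], "|", columns[10])
```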

4.8. Cluster neighborhoods

Run k-means to cluster the neighborhoods into 5 clusters.

In [28]:
# set number of clusters
kclusters = 5

toronto_denc_grouped_clustering = toronto_denc_grouped.drop(columns='Neighborhood')  # keep only the numeric feature columns

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_denc_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 
Out[28]:
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
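
The first ten labels are all 1, which hints that one cluster may dominate. A quick way to check the balance is to count the members of each cluster. The sketch below uses made-up labels. In the notebook, `labels` would be `kmeans.labels_` and `k` would be `kclusters`.

```python
import numpy as np

# Made-up labels for illustration; in the notebook this would be kmeans.labels_.
labels = np.array([1, 1, 0, 1, 2, 1, 1, 3, 1, 4])
k = 5

# sizes[i] is the number of rows assigned to cluster i.
sizes = np.bincount(labels, minlength=k)
for cluster, size in enumerate(sizes):
    print("cluster {}: {} neighborhoods".format(cluster, size))
```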

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood

In [29]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_denc_merged = df_toronto_denc

# join the sorted venue table onto df_toronto_denc so each neighborhood keeps its latitude/longitude
toronto_denc_merged = toronto_denc_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_denc_merged.head() # check the last columns!
Out[29]:
PostalCode Borough Neighborhood Latitude Longitude Cluster Labels 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
0 M4E East Toronto The Beaches 43.676357 -79.293031 1 Health Food Store Pub Music Venue Wings Joint Department Store Ethiopian Restaurant Electronics Store Eastern European Restaurant Dumpling Restaurant Donut Shop
1 M4K East Toronto The Danforth West, Riverdale 43.679557 -79.352188 1 Greek Restaurant Coffee Shop Ice Cream Shop Italian Restaurant Bookstore Furniture / Home Store Liquor Store Bakery Sports Bar Spa
2 M4L East Toronto The Beaches West, India Bazaar 43.668999 -79.315572 1 Park Sandwich Place Coffee Shop Food & Drink Shop Liquor Store Light Rail Station Burger Joint Burrito Place Fast Food Restaurant Fish & Chips Shop
3 M4M East Toronto Studio District 43.659526 -79.340923 1 Café Coffee Shop Bakery Italian Restaurant American Restaurant Bank Bar Fish Market Convenience Store Latin American Restaurant
4 M4N Central Toronto Lawrence Park 43.728020 -79.388790 1 Bus Line Park Construction & Landscaping Swim School Wings Joint Ethiopian Restaurant Electronics Store Eastern European Restaurant Dumpling Restaurant Donut Shop
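
A `join` like this one leaves NaN in `Cluster Labels` for any neighborhood that returned no venues, and the column then becomes float, which would break the color lookup used for the cluster map. That does not appear to happen with the data shown here, but a defensive cleanup might look like the sketch below, which uses made-up neighborhood names.

```python
import pandas as pd

areas = pd.DataFrame({"Neighborhood": ["A", "B", "C"]})
labels = pd.DataFrame({"Neighborhood": ["A", "B"], "Cluster Labels": [1, 0]})

merged = areas.join(labels.set_index("Neighborhood"), on="Neighborhood")

# "C" has no match, so its label is NaN and the column is float.
merged = merged.dropna(subset=["Cluster Labels"]).copy()
merged["Cluster Labels"] = merged["Cluster Labels"].astype(int)
print(merged)
```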

Finally, let's visualize the resulting clusters

In [30]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(
        toronto_denc_merged['Latitude'], 
        toronto_denc_merged['Longitude'], 
        toronto_denc_merged['Neighborhood'], 
        toronto_denc_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
Out[30]:

4.9. Examine Clusters

Now, you can examine each cluster and determine the discriminating venue categories that distinguish each cluster.

Cluster 1

In [31]:
toronto_denc_merged.loc[toronto_denc_merged['Cluster Labels'] == 0, toronto_denc_merged.columns[[1] + list(range(5, toronto_denc_merged.shape[1]))]]
Out[31]:
Borough Cluster Labels 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
8 Central Toronto 0 Playground Trail Tennis Court Wings Joint Dog Run Dessert Shop Dim Sum Restaurant Diner Discount Store Donut Shop

Cluster 2

In [32]:
toronto_denc_merged.loc[toronto_denc_merged['Cluster Labels'] == 1, toronto_denc_merged.columns[[1] + list(range(5, toronto_denc_merged.shape[1]))]]
Out[32]:
Borough Cluster Labels 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
0 East Toronto 1 Health Food Store Pub Music Venue Wings Joint Department Store Ethiopian Restaurant Electronics Store Eastern European Restaurant Dumpling Restaurant Donut Shop
1 East Toronto 1 Greek Restaurant Coffee Shop Ice Cream Shop Italian Restaurant Bookstore Furniture / Home Store Liquor Store Bakery Sports Bar Spa
2 East Toronto 1 Park Sandwich Place Coffee Shop Food & Drink Shop Liquor Store Light Rail Station Burger Joint Burrito Place Fast Food Restaurant Fish & Chips Shop
3 East Toronto 1 Café Coffee Shop Bakery Italian Restaurant American Restaurant Bank Bar Fish Market Convenience Store Latin American Restaurant
4 Central Toronto 1 Bus Line Park Construction & Landscaping Swim School Wings Joint Ethiopian Restaurant Electronics Store Eastern European Restaurant Dumpling Restaurant Donut Shop
5 Central Toronto 1 Park Gym Sandwich Place Clothing Store Breakfast Spot Food & Drink Shop Hotel Dog Run Diner Discount Store
6 Central Toronto 1 Clothing Store Sporting Goods Shop Coffee Shop Yoga Studio Gym / Fitness Center Furniture / Home Store Fast Food Restaurant Diner Metro Station Mexican Restaurant
7 Central Toronto 1 Pizza Place Dessert Shop Sandwich Place Italian Restaurant Café Coffee Shop Sushi Restaurant Deli / Bodega Dance Studio Indian Restaurant
9 Central Toronto 1 Coffee Shop Pub Light Rail Station Convenience Store Sushi Restaurant Bagel Shop Fried Chicken Joint Sports Bar American Restaurant Pizza Place
11 Downtown Toronto 1 Coffee Shop Restaurant Pub Bakery Market Italian Restaurant Pizza Place Café Butcher Jewelry Store
12 Downtown Toronto 1 Japanese Restaurant Coffee Shop Gay Bar Sushi Restaurant Restaurant Café Pub Nightclub Men's Store Mediterranean Restaurant
13 Downtown Toronto 1 Coffee Shop Park Café Pub Bakery Mexican Restaurant Breakfast Spot Theater Chocolate Shop Beer Store
14 Downtown Toronto 1 Coffee Shop Clothing Store Café Cosmetics Shop Middle Eastern Restaurant Fast Food Restaurant Japanese Restaurant Italian Restaurant Bubble Tea Shop Pizza Place
15 Downtown Toronto 1 Coffee Shop Hotel Café Restaurant Cosmetics Shop Breakfast Spot Bakery Gastropub Italian Restaurant Seafood Restaurant
16 Downtown Toronto 1 Coffee Shop Cocktail Bar Café Cheese Shop Beer Bar Farmers Market Steakhouse Seafood Restaurant Restaurant Bakery
17 Downtown Toronto 1 Coffee Shop Café Italian Restaurant Burger Joint Chinese Restaurant Japanese Restaurant Sandwich Place Bubble Tea Shop Bar Spa
18 Downtown Toronto 1 Coffee Shop Café Thai Restaurant Steakhouse American Restaurant Burger Joint Hotel Restaurant Bakery Bar
19 Downtown Toronto 1 Coffee Shop Aquarium Hotel Italian Restaurant Café Scenic Lookout Fried Chicken Joint Pizza Place Bakery Brewery
20 Downtown Toronto 1 Coffee Shop Café Hotel Restaurant American Restaurant Italian Restaurant Deli / Bodega Gastropub Seafood Restaurant Burger Joint
21 Downtown Toronto 1 Coffee Shop Café Restaurant Hotel American Restaurant Italian Restaurant Bakery Deli / Bodega Seafood Restaurant Gastropub
24 Central Toronto 1 Coffee Shop Sandwich Place Café Pizza Place Cosmetics Shop Liquor Store Burger Joint Jewish Restaurant Flower Shop BBQ Joint
25 Downtown Toronto 1 Café Bookstore Restaurant Bar Japanese Restaurant Bakery Nightclub Chinese Restaurant Beer Store Beer Bar
26 Downtown Toronto 1 Café Vegetarian / Vegan Restaurant Bar Dumpling Restaurant Coffee Shop Bakery Mexican Restaurant Chinese Restaurant Vietnamese Restaurant Cocktail Bar
27 Downtown Toronto 1 Airport Service Airport Lounge Airport Terminal Harbor / Marina Boat or Ferry Sculpture Garden Plane Boutique Airport Gate Airport
28 Downtown Toronto 1 Coffee Shop Restaurant Café Beer Bar Seafood Restaurant Hotel Cocktail Bar Cheese Shop Italian Restaurant Japanese Restaurant
29 Downtown Toronto 1 Coffee Shop Café Restaurant Hotel Deli / Bodega American Restaurant Steakhouse Burger Joint Bakery Gastropub
30 Downtown Toronto 1 Grocery Store Café Park Restaurant Nightclub Athletics & Sports Baby Store Diner Italian Restaurant Convenience Store
31 West Toronto 1 Bakery Supermarket Pharmacy Café Bank Middle Eastern Restaurant Brazilian Restaurant Bar Discount Store Pool
32 West Toronto 1 Bar Coffee Shop Asian Restaurant Café Restaurant Vietnamese Restaurant Bakery New American Restaurant Men's Store Cocktail Bar
33 West Toronto 1 Café Breakfast Spot Coffee Shop Grocery Store Italian Restaurant Caribbean Restaurant Stadium Bar Furniture / Home Store Burrito Place
34 West Toronto 1 Bar Mexican Restaurant Café Grocery Store Bakery Italian Restaurant Diner Speakeasy Fried Chicken Joint Arts & Crafts Store
35 West Toronto 1 Breakfast Spot Gift Shop Bookstore Bar Italian Restaurant Restaurant Dessert Shop Movie Theater Bank Dog Run
36 West Toronto 1 Café Pizza Place Coffee Shop Sushi Restaurant Italian Restaurant Gym Smoothie Shop Ice Cream Shop Indie Movie Theater Bar
37 East Toronto 1 Light Rail Station Garden Recording Studio Auto Workshop Skate Park Burrito Place Fast Food Restaurant Farmers Market Restaurant Yoga Studio

Cluster 3

In [33]:
toronto_denc_merged.loc[toronto_denc_merged['Cluster Labels'] == 2, toronto_denc_merged.columns[[1] + list(range(5, toronto_denc_merged.shape[1]))]]
Out[33]:
Borough Cluster Labels 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
22 Central Toronto 2 Garden Wings Joint Department Store Event Space Ethiopian Restaurant Electronics Store Eastern European Restaurant Dumpling Restaurant Donut Shop Doner Restaurant

Cluster 4

In [34]:
toronto_denc_merged.loc[toronto_denc_merged['Cluster Labels'] == 3, toronto_denc_merged.columns[[1] + list(range(5, toronto_denc_merged.shape[1]))]]
Out[34]:
Borough Cluster Labels 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
23 Central Toronto 3 Bus Line Trail Jewelry Store Sushi Restaurant Wings Joint Event Space Ethiopian Restaurant Electronics Store Eastern European Restaurant Dumpling Restaurant

Cluster 5

In [35]:
toronto_denc_merged.loc[toronto_denc_merged['Cluster Labels'] == 4, toronto_denc_merged.columns[[1] + list(range(5, toronto_denc_merged.shape[1]))]]
Out[35]:
Borough Cluster Labels 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
10 Downtown Toronto 4 Park Playground Trail Wings Joint Department Store Ethiopian Restaurant Electronics Store Eastern European Restaurant Dumpling Restaurant Donut Shop
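
Reading the five tables by eye works for a handful of clusters, but a compact summary can help when comparing them. The sketch below counts the neighborhoods per cluster and reports the most frequent top-ranked category. The toy frame is made up. In the notebook, the same `groupby` could be applied to `toronto_denc_merged`.

```python
import pandas as pd

toy = pd.DataFrame({
    "Cluster Labels": [1, 1, 1, 0, 4],
    "1st Most Common Venue": ["Coffee Shop", "Coffee Shop", "Café",
                              "Playground", "Park"],
})

# For each cluster: its size and the most frequent top-ranked category.
summary = toy.groupby("Cluster Labels")["1st Most Common Venue"].agg(
    n_neighborhoods="count",
    top_category=lambda s: s.value_counts().idxmax(),
)
print(summary)
```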