Interactive Map

Data Preparation

Two Excel sheets are needed before the interactive map can be generated. The first lists the geographic coordinates (latitude and longitude) of each location together with its place type, and is used to plot the locations in ArcGIS. The second holds quotes extracted from the 60 interviews, limited to statements in which interviewees mention specific places at CUHK. Each row records the interviewee’s name, the places mentioned, the quote itself, the video URL, and the source information.

An example of the first sheet (Location Information):

An example of the second sheet (Quotes):

A key question remained: how to extract place names from the transcripts? To address this, I used Python to identify locations and their related quotes. I first used the pycantonese library for word segmentation, then checked whether any of the resulting words ended with a suffix commonly found in Cantonese place names. (My list of location suffixes was: ‘樓’, ‘書院’, ‘學院’, ‘餐廳’, ‘館’, ‘研究所’, ‘宿’, ‘街’, ‘堂’, ‘校’, ‘大學’, ‘hall’, ‘站’, ‘地方’, ‘嗰度’.) Sentences containing a word with one of these suffixes were added to a list of location-related sentences.

The code below demonstrates this approach: a sentence such as ‘嗰時候中文大學仲未成立,但係有個聯合招生廣告,我就參加咗呢個聯招’ is extracted because it contains a word ending in the suffix ‘大學’. In this way, we can obtain the quotes and place names (e.g. 中文大學) needed to prepare the Excel sheets.

!pip install python-docx
!pip install pycantonese

from google.colab import drive
drive.mount('/content/drive')

import docx
import pycantonese
import os
import re

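# 1. Split the text into sentences
def segment_text(text):
    # Split on Chinese and Western sentence-ending punctuation.
    # A character class is used because a bare ? inside an alternation is a regex error.
    sentences = re.split(r'[。！？!?.]', text)
    sentences = [sentence.strip() for sentence in sentences if sentence.strip()]
    return sentences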

# 2. Extract location sentences
def extract_locations_sentences(sentences):
    location_suffixes = ['樓', '書院', '學院', '餐廳', '館', '研究所','宿','街','堂','校','大學','hall','站','地方','嗰度']
    location_sentences = []
    for sentence in sentences:
        words = pycantonese.segment(sentence)
        if any(word.endswith(suffix) for word in words for suffix in location_suffixes):
            location_sentences.append(sentence)
    return location_sentences

# 3. Get the text from doc
def toText(file_path):
    doc = docx.Document(file_path)
    full_text = []
    for para in doc.paragraphs:
        full_text.append(para.text)
    print(full_text)
    return '\n'.join(full_text)

# 4. Output
file_path = '/content/13 湯保歸 - 崇基的幽谷清音.docx'
text = toText(file_path)
sentences = segment_text(text)
sentences_with_location = extract_locations_sentences(sentences)
print('Length: ', len(sentences_with_location))

print("Extracted Locations: ")
print(sentences_with_location)

In total, this project identified 81 place names and 519 quotes.
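
The script above returns whole sentences. To also list the matching words themselves (the candidate place names), a small helper along the following lines could be used. This is a minimal sketch, not the project's actual code: the function name `find_place_words` is hypothetical, the lowercase comparison is an added assumption so that ‘Hall’ matches ‘hall’, and the word list is segmented by hand for illustration rather than produced by pycantonese.

```python
# Suffix list copied from the script above.
LOCATION_SUFFIXES = ('樓', '書院', '學院', '餐廳', '館', '研究所', '宿', '街',
                     '堂', '校', '大學', 'hall', '站', '地方', '嗰度')


def find_place_words(words, suffixes=LOCATION_SUFFIXES):
    """Return the words ending with a location suffix, in order, without duplicates.

    `find_place_words` is a hypothetical helper, not part of the original script.
    """
    found = []
    for word in words:
        # str.endswith accepts a tuple and is True if any suffix matches.
        # Lowercasing is an assumption added here so 'Hall' also matches 'hall'.
        if word.lower().endswith(suffixes) and word not in found:
            found.append(word)
    return found


# Hand-segmented words for illustration; in the real pipeline the list
# would come from pycantonese.segment(sentence).
words = ['嗰', '時候', '中文大學', '仲', '未', '成立']
print(find_place_words(words))
```

Generic suffixes such as ‘地方’ (place) and ‘嗰度’ (there) will also match words that are not named places, so the resulting list still needs manual review before it goes into the Excel sheet.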

Mapping

Finally, I created the interactive map in ArcGIS Online, using the ‘Join’ and ‘Relate’ functions to integrate the location data with the quote data.
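
The idea behind linking the two sheets can be sketched in plain Python: each location row is matched to every quote row that shares the same place name, giving a one-to-many relationship. This is only an illustration of the matching logic, not how ArcGIS implements it. The column names (`place`, `type`, `lat`, `lon`, `interviewee`, `quote`), the helper `relate_quotes`, and the sample rows are all assumptions, and the coordinates are rough illustrative values rather than figures from the project's sheet.

```python
from collections import defaultdict

# Hypothetical rows standing in for the two Excel sheets.
locations = [
    # Coordinates and place type are illustrative, not taken from the project data.
    {"place": "中文大學", "type": "university", "lat": 22.42, "lon": 114.21},
]
quotes = [
    {"interviewee": "Interviewee A", "place": "中文大學", "quote": "嗰時候中文大學仲未成立"},
    {"interviewee": "Interviewee B", "place": "中文大學", "quote": "(placeholder quote)"},
    {"interviewee": "Interviewee C", "place": "Unlisted place", "quote": "(placeholder quote)"},
]


def relate_quotes(locations, quotes, key="place"):
    """Attach to each location the list of quotes sharing its key value.

    Also returns the quotes whose key matches no location, which are
    worth checking for spelling differences between the two sheets.
    """
    by_place = defaultdict(list)
    for row in quotes:
        by_place[row[key]].append(row)

    related = [dict(loc, quotes=by_place.get(loc[key], [])) for loc in locations]
    known = {loc[key] for loc in locations}
    unmatched = [row for row in quotes if row[key] not in known]
    return related, unmatched


related, unmatched = relate_quotes(locations, quotes)
for loc in related:
    print(loc["place"], len(loc["quotes"]), "quote(s)")
print("Unmatched quotes:", len(unmatched))
```

Checking for unmatched rows before uploading is useful because the link depends on the place name being written identically in both sheets.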

The map also includes separate layers for New Asia College, Chung Chi College, United College, and the Chinese University of Hong Kong. Additional layers group individuals by the period of their affiliation with CUHK: those enrolled or working before 1963, between 1963 and 1976, and between 1976 and 1980.

Interactive Map

Produced using ArcGIS Online