Finding the Hidden Gems in CUHK Library’s Audio Collection: Machine Learning as a Tool for Audio Analysis
Chanting is a traditional Chinese practice of reading, composing and teaching classical poetry and prose in a specific melodic style, with variations across dialects, lineages and personal preferences. The Chinese University of Hong Kong Library (CUHK Library) holds archival audio materials deposited and donated by various scholars. A large portion of these are recorded lectures in which teaching is intermingled with Cantonese chanting, and they have enormous research value. CUHK Library has been converting these materials into online digital collections for preservation and open access, e.g., the Rulan Chao Pian Collection in the CUHK Digital Repository. However, the records for these online recordings lack detail on the content and a breakdown for non-ethnic music, owing to the diverse approaches taken to subject analysis. Consequently, researchers must invest considerable time and effort in manually picking out the chanting from these hour-long recordings. CUHK Library therefore initiated a pilot project to develop a machine-learning-based classifier as a rapid tool for identifying speech and chanting in the digital audio repository through automated analysis.
In this project, an open-source GitHub project for audio analysis was adopted for segmentation, classifier training and prediction. It includes a Python library that supports supervised machine learning models, such as the Support Vector Machine (SVM), for classification. The approach extracts audio features, computes statistics over them to capture the differences between categories, and stores the results in a feature vector. A classifier trained on the feature vectors of a categorized dataset can then predict the category of new data from the patterns it has observed.
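The feature-vector idea can be illustrated with a minimal, standard-library-only sketch. It is not the adopted library's API: the function names, the two features (short-term energy and zero-crossing rate), the frame length and the synthetic signals are all assumptions made for illustration, and a simple nearest-centroid rule stands in for the SVM to keep the example dependency-free.

```python
import math
import statistics

def frame_features(samples, frame_len=400):
    """Return (energy, zero-crossing rate) for each non-overlapping frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        feats.append((energy, crossings / (frame_len - 1)))
    return feats

def feature_vector(samples, frame_len=400):
    """Summarise frame-level features by their mean and standard deviation."""
    vector = []
    for column in zip(*frame_features(samples, frame_len)):
        vector.append(statistics.fmean(column))
        vector.append(statistics.pstdev(column))
    return vector

def train_centroids(labelled_vectors):
    """Average the vectors of each category (a stand-in for SVM training)."""
    grouped = {}
    for label, vec in labelled_vectors:
        grouped.setdefault(label, []).append(vec)
    return {
        label: [statistics.fmean(col) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }

def predict(centroids, vec):
    """Return the category whose centroid is closest to the new vector."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))

def tone(freq_hz, seconds=0.5, rate=8000, amplitude=0.5):
    """Synthetic sine tone used as toy data in place of real recordings."""
    n = int(seconds * rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]

# Toy categorized dataset: digital silence versus a synthetic tone.
training = [
    ("Silence", feature_vector([0.0] * 4000)),
    ("Sound", feature_vector(tone(220))),
]
centroids = train_centroids(training)
print(predict(centroids, feature_vector(tone(300, amplitude=0.4))))
```

In a real pipeline the feature set would be much richer (spectral and cepstral features, for example), and the vectors would be passed to an SVM implementation rather than compared with centroids.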
The training data came mainly from:
- “Archive of the 20th Century Cantonese Chanting in Hong Kong” (「二十世紀香港粵語吟誦典藏」)
- Recordings of “Chinese Poetry Recitation” (「露港秋唱」) held in 2018, 2019 and 2021
- Other online resources on Cantonese chanting
These audio resources were processed to reduce noise and normalize volume, improving the quality of the training data. They were then segmented and categorized as “Chanting”, “Speech” or “Silence” according to their content.
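The source does not say which tools or parameters were used for this preprocessing, so the following is only a rough sketch of two of the steps, under stated assumptions: peak normalization as one possible way to even out volume, and fixed-length cutting as one possible way to segment. The function names, the 0.9 target peak, the one-second segment length and the RMS threshold for flagging silence are all hypothetical. Noise reduction is omitted.

```python
import math

def peak_normalize(samples, target_peak=0.9):
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(x) for x in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # all-zero or empty input: nothing to scale
    gain = target_peak / peak
    return [x * gain for x in samples]

def split_segments(samples, rate, seconds=1.0):
    """Cut a recording into consecutive fixed-length segments.

    The final segment may be shorter than the others.
    """
    size = max(1, int(rate * seconds))
    return [samples[i:i + size] for i in range(0, len(samples), size)]

def is_silence(segment, rms_threshold=0.01):
    """Flag a segment as silence when its RMS level is below the threshold."""
    if not segment:
        return True
    rms = math.sqrt(sum(x * x for x in segment) / len(segment))
    return rms < rms_threshold

# Toy recording at 8 samples per second: one second of sound, one of silence.
recording = [0.2, -0.2, 0.1, -0.1, 0.2, -0.2, 0.1, -0.1] + [0.0] * 8
segments = split_segments(peak_normalize(recording), rate=8)
labels = ["Silence" if is_silence(seg) else "Unlabelled" for seg in segments]
print(labels)
```

Only silence can be flagged by level alone. Telling “Chanting” from “Speech” still needs a person to label the segments, or a trained classifier.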