Results

We found many interesting characteristics of the department networks, and these features reveal intuitive regularities underlying the theses. We present them in the following order: first the theses pooled across all years (the “aggregated graph”), then two important subsets of the data, namely the PhD theses and the Masters theses. Finally, we discuss how the connections among graduate theses relate to the academic-area partition and the faculty-level partition.

Aggregated graph

We use the complete corpus of the electronic theses database to build an aggregated graph. Each node corresponds to a department or its equivalent. A tie between department A and department B indicates that their theses share at least one topic. Each tie is weighted by the number of topics the two departments share; since there are fifteen broad topics in total, the weight can in principle range from one to fifteen.
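A minimal sketch of how such a weighted graph could be assembled, assuming each department has already been mapped to the set of broad topics that appear in its theses. The input dictionary `dept_topics`, the department names, and the topic IDs are hypothetical placeholders, and networkx is our choice of library; the text does not describe the exact construction pipeline.

```python
from itertools import combinations

import networkx as nx


def build_department_graph(dept_topics):
    """Build an undirected graph whose tie weights count shared topics.

    dept_topics maps a department name to the set of broad topics found
    in its theses (a hypothetical input format).
    """
    G = nx.Graph()
    G.add_nodes_from(dept_topics)
    for a, b in combinations(dept_topics, 2):
        shared = dept_topics[a] & dept_topics[b]
        if shared:  # a tie exists only if at least one topic is shared
            G.add_edge(a, b, weight=len(shared))
    return G


# Hypothetical example with three departments and integer topic IDs
dept_topics = {
    "Dept A": {1, 2, 3},
    "Dept B": {2, 3, 7},
    "Dept C": {9},
}
G = build_department_graph(dept_topics)
print(list(G.edges(data="weight")))  # [('Dept A', 'Dept B', 2)]
```

Departments with no shared topic (here "Dept C") remain in the graph as isolated nodes, so the node count always equals the number of departments.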

Figure 1. Aggregated network.
Notes: Only ties with two or more shared topics are shown.

Overall, the network is very dense and compact. Among the 56 departments, about 74% of all possible ties are realized, and any department can reach any other department within two steps.
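Both summary statistics, the share of realized ties (density) and the largest number of steps separating any two departments (diameter), can be computed with networkx as sketched below. For an undirected graph with n nodes and m ties, density is 2m / (n(n - 1)). The five-node toy graph is hypothetical and only illustrates the calculation.

```python
import networkx as nx


def summarize(G):
    """Return (density, diameter) of an undirected, connected graph."""
    # nx.density equals 2m / (n(n - 1)) for undirected graphs
    density = nx.density(G)
    # Diameter: the longest shortest path, i.e. the number of steps the
    # most distant pair of departments needs to reach each other.
    diameter = nx.diameter(G)
    return density, diameter


# Hypothetical 5-node graph: a star (hub 0 tied to 1-4) plus two extra ties
G = nx.star_graph(4)
G.add_edges_from([(1, 2), (3, 4)])
density, diameter = summarize(G)
print(f"density={density:.2f}, diameter={diameter}")  # density=0.60, diameter=2
```

`nx.diameter` raises an error on disconnected graphs, which is not an issue here because every department is reachable from every other.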

The distribution of tie weights is highly uneven, as Figure 2 shows: the number of ties falls as the weight increases, so only a handful of department pairs share many topics. Notably, no pair of departments shares more than eight topics.

Figure 2. The distribution of tie weights.
Notes: The figure contains eight panels; the k-th panel shows all ties with exactly k shared topics, from one shared topic (first panel) to eight (eighth panel).
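The weight histogram in Figure 2 and the "two or more shared topics" display filter used in the network figures can both be derived from the edge weights. A sketch, assuming a networkx graph with a `weight` attribute on each tie; the example edges are hypothetical.

```python
from collections import Counter

import networkx as nx


def weight_distribution(G):
    """Count how many ties carry each weight (number of shared topics)."""
    return Counter(w for _, _, w in G.edges(data="weight"))


def ties_at_least(G, min_weight=2):
    """Copy of G keeping every department but only ties with at least
    `min_weight` shared topics, mirroring the figure display filter."""
    H = nx.Graph()
    H.add_nodes_from(G.nodes)
    H.add_weighted_edges_from(
        (u, v, w) for u, v, w in G.edges(data="weight") if w >= min_weight
    )
    return H


G = nx.Graph()
G.add_weighted_edges_from([("A", "B", 1), ("A", "C", 1), ("B", "C", 3), ("C", "D", 2)])
print(sorted(weight_distribution(G).items()))  # [(1, 2), (2, 1), (3, 1)]
```

Keeping all nodes in the filtered copy means a department with only single-topic ties still appears in the figure, just without visible ties.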

Comparison between Masters Theses and PhD Theses

Figure 3 shows the network of Masters theses and Figure 4 the network of PhD theses. Although it has fewer nodes, the Masters graph has more ties and a higher density, meaning that a larger share of the possible ties between departments is realized. It is also less specialized in terms of topics than the PhD graph, as reflected in its lower modularity scores (Table 1).

Type	No. nodes	No. ties	Density	Girvan-Newman modularity	LinkRank modularity
Master	51	897	0.70	0.15	0.14
PhD	55	853	0.57	0.22	0.19

Table 1. Network features of the Master and PhD networks.
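As a consistency check, the Density column follows directly from the node and tie counts, assuming ties are undirected and density is taken over all pairs of departments present in each network:

```python
def density(n_nodes, n_ties):
    """Share of possible undirected ties that are realized: 2m / (n(n - 1))."""
    return 2 * n_ties / (n_nodes * (n_nodes - 1))


# Node and tie counts from Table 1
table1 = {"Master": (51, 897), "PhD": (55, 853)}
for network, (n, m) in table1.items():
    print(f"{network}: {density(n, m):.2f}")
# Master: 0.70
# PhD: 0.57
```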

Figure 3. Masters theses. Notes: Only ties with two or more shared topics are shown.

Figure 4. PhD theses. Notes: Only ties with two or more shared topics are shown.

We interpret topical specialization using two measures of network structure: Girvan-Newman (GN) modularity and LinkRank modularity. Given the clusters of departments detected by the algorithm, both measure how strongly nodes are connected within the same cluster relative to their connections across clusters (formally, relative to a random baseline). A higher value therefore indicates weaker integration across departments in terms of research topics. In our setting, the main difference between the two is that GN modularity captures only direct (one-step) connections, whereas LinkRank modularity, which is based on a random walk over the network, also accounts for multi-step connections.
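To make the two measures concrete, the sketch below computes both for a given partition. GN modularity uses networkx's `community.modularity`. For LinkRank modularity we follow the standard formulation Q = Σ_ij (L_ij - π_i π_j) δ(c_i, c_j), where G is the Google matrix of a random walk with damping factor α, π is its stationary distribution, and L_ij = π_i G_ij. The damping value of 0.85 is our assumption, since the text does not report the settings used, and the toy graph of two triangles joined by one tie is hypothetical.

```python
import networkx as nx
import numpy as np


def gn_modularity(G, communities):
    """Newman-Girvan modularity of a partition (list of node sets)."""
    return nx.community.modularity(G, communities, weight="weight")


def linkrank_modularity(G, communities, alpha=0.85):
    """LinkRank modularity: sum over same-community pairs of L_ij - pi_i * pi_j."""
    nodes = list(G.nodes())
    n = len(nodes)
    A = nx.to_numpy_array(G, nodelist=nodes, weight="weight")
    strength = A.sum(axis=1)
    # Random-walk transition matrix; nodes without ties jump uniformly.
    P = np.full((n, n), 1.0 / n)
    has_ties = strength > 0
    P[has_ties] = A[has_ties] / strength[has_ties][:, None]
    # Google matrix: follow a tie with prob. alpha, teleport otherwise.
    Gm = alpha * P + (1 - alpha) / n
    # Stationary distribution: left eigenvector of Gm for eigenvalue 1.
    vals, vecs = np.linalg.eig(Gm.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    pi = pi / pi.sum()
    L = pi[:, None] * Gm  # LinkRank of each tie: L_ij = pi_i * G_ij
    label = {v: c for c, members in enumerate(communities) for v in members}
    same = np.array([[label[u] == label[v] for v in nodes] for u in nodes])
    return float(((L - np.outer(pi, pi)) * same).sum())


# Hypothetical toy graph: two triangles joined by the tie (2, 3)
G = nx.Graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
parts = [{0, 1, 2}, {3, 4, 5}]
print(round(gn_modularity(G, parts), 4))                   # 0.3571
print(round(linkrank_modularity(G, parts, alpha=1.0), 4))  # 0.3571
print(round(linkrank_modularity(G, parts), 4))
```

With α = 1 on an undirected graph, the stationary distribution is proportional to node strength and LinkRank modularity coincides with GN modularity, which makes a useful sanity check; the two measures diverge once teleportation is introduced or ties are directed.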

Comparison between the Categorizations by Academic Areas and by Algorithm

Academic areas

Figure 5 colors all 56 departments by academic area: STEM departments are shown in blue, social science departments in grey, and humanities departments in violet.

We then applied the Louvain community detection algorithm to partition the departments into three clusters, controlling the number of clusters by adjusting the resolution parameter. Departments in the same cluster share the same color. As Figure 6 shows, the social science and humanities departments are grouped together, while the STEM departments split into two communities. For simplicity, we call them STEM 1 (violet) and STEM 2 (blue). STEM 1 consists of departments related to medicine and life science, while STEM 2 comprises the remaining STEM departments, which span different affiliations.
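A sketch of fixing the number of clusters through the resolution parameter, assuming networkx's Louvain implementation (`nx.community.louvain_communities`, available since networkx 2.8). The text does not say which implementation, random seed, or resolution values were used, so the scan grid and seed below are hypothetical. Applied to the aggregated graph one would call it with `k=3`; the toy graph here is only for illustration.

```python
import networkx as nx
import numpy as np


def louvain_with_k(G, k, resolutions=None, seed=42):
    """Scan resolution values and return (resolution, partition) for the
    first Louvain partition with exactly k communities.

    Higher resolution values tend to produce more, smaller communities.
    The default grid and seed are hypothetical choices.
    """
    if resolutions is None:
        resolutions = np.linspace(0.1, 3.0, 30)
    for r in resolutions:
        parts = nx.community.louvain_communities(
            G, weight="weight", resolution=float(r), seed=seed
        )
        if len(parts) == k:
            return float(r), parts
    return None, None  # no resolution in the grid produced k communities


# Hypothetical toy graph: two triangles joined by one tie
G = nx.Graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
r, parts = louvain_with_k(G, k=2)
print(r, parts)
```

Because the number of communities is not guaranteed to grow strictly with the resolution, the scan returns the first match on the grid rather than a unique resolution value; fixing the seed keeps the result reproducible.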

Figure 6. Algorithm detected three communities.

We kept the positions and ordering of the nodes identical in Figures 5 and 6 to allow direct comparison. Differences in coloring reflect the differences between the academic-area partition (humanities, social science, and STEM) and the network-based communities (humanities and social science, STEM 1, and STEM 2). The adjusted mutual information (AMI) between the two partitions is 0.58, indicating moderate agreement.
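AMI compares two groupings of the same departments while ignoring how the groups are named or colored. A sketch using scikit-learn's `adjusted_mutual_info_score`, assuming each partition is stored as a `{department: label}` dictionary covering the same departments; the six departments and their labels are hypothetical.

```python
from sklearn.metrics import adjusted_mutual_info_score


def partition_ami(labels_a, labels_b):
    """AMI between two partitions given as {department: label} dicts."""
    depts = sorted(labels_a)  # one fixed node order for both label vectors
    return adjusted_mutual_info_score(
        [labels_a[d] for d in depts], [labels_b[d] for d in depts]
    )


# Hypothetical labels for six departments
area = {"D1": "Humanities", "D2": "Humanities", "D3": "Social science",
        "D4": "Social science", "D5": "STEM", "D6": "STEM"}
louvain = {"D1": 0, "D2": 0, "D3": 0, "D4": 0, "D5": 1, "D6": 2}
print(round(partition_ami(area, area), 2))  # 1.0
print(round(partition_ami(area, louvain), 2))
```

AMI equals 1 for identical partitions and is close to 0 when the agreement is no better than chance, which is why values such as 0.58 and 0.54 are read as moderate agreement.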

Faculty-level categorization

Figure 7 shows the departments grouped by faculty; departments belonging to the same faculty share the same color.

Figure 7. Eight faculty-level institutions in CUHK.

As before, we fix the positions of the nodes and compare the faculty-level partition with the eight network-based communities detected by the algorithm (Figure 8). The AMI between these two partitions is 0.54.

Figure 8. Algorithm detected eight communities.

Conclusion

Using the titles and metadata of CUHK graduate theses from 1967 to 2021, we constructed a series of department networks in which the nodes are departments and the ties are topics shared by their students’ theses. Analyzing these networks tells us more about how departments are connected through one of the key components of a research university: the content of the research done by graduate students.

The bottom-up procedure yields only 15 broad topics, so some overlap between departments is to be expected. Even so, the density of the aggregated network is somewhat surprising: among the 56 departments, about 74% of all possible ties are realized, and any department can reach any other department within two steps.

Our comparison between Masters and PhD theses also says something about how the academic system works. The PhD network has a noticeably lower density (0.57 versus 0.70) and somewhat higher modularity. We read this as a sign that PhD theses are more specialized and cover a narrower range of topics than Masters theses.

Finally, comparing the algorithm-detected partition of departments with a commonsense partition based on academic areas yields two observations. On the one hand, humanities disciplines share many research topics with social science disciplines, despite their different research approaches. On the other hand, STEM theses show a relatively clear boundary between life-science-related topics and other topics. When we compared the algorithm-detected partition with the faculty-level partition, the agreement was relatively low (AMI = 0.54), indicating substantive connections that cross faculty boundaries.