Data Analytics Practice Opportunity
Organizing Party: Research Support and Digital Initiatives, CUHK Library
Introduction
The Data Analytics Practice Opportunity, organised by The Chinese University of Hong Kong Library, aims to:
- encourage reuse of data
- support the exploration of data and metadata from library collections
- promote data mining and visualization
- provide students with opportunities to develop data analytics projects with real data
- encourage inter-disciplinary exploration
Details
Participants will develop code to analyse data and tell stories with CUHK Library resources. Where possible, the code should be adaptable to other datasets in the future. The code will be installed on our Digital Scholarship Tools platform (http://dstools.lib.cuhk.edu.hk) at the end of the Practice Opportunity. A project webpage and a poster about the project will be created. Participants will also present the project at a sharing session.
Students participating in the Practice Opportunity will develop:
- code
- project webpage
- project poster
- presentation of the project
Resources
- Data in the CUHK Digital Repository
- Metadata in the CUHK Library Archival Collections
- Data in the CUHK Research Data Repository
Sample data:
Name of collection/data | Sample research questions |
1. Biographical Relationship of Literati in Hong Kong Literature and Hong Kong Poets | The interpersonal relationship among literati in Hong Kong Literature with stylometry analysis |
2. The Hongkong News | Storytelling on Hong Kong between 1942 and 1945 using an intertextuality detection algorithm |
3. 七十年代、 九十年代 | Tracing the patterns of social life from 1970s to 1990s in Hong Kong using topic modeling |
4. Cantonese Chanting in Hong Kong | Exploring the pitch-text relationship in poems using natural language processing |
5. 走馬樓三國吳簡.嘉禾吏民田家莂資料庫 | Reviewing economic development in Eastern Wu during the Three Kingdoms period using digital ethnography |
6. Metadata from the CUHK Library Archival Collections | Interaction between collections in the subject: Chinese literature – China – Hong Kong using network analysis |
7. The Chinese Student Weekly (中國學生周報) | OCR error recognition using n-grams and named-entity recognition |
8. Scientific data available on CUHK Research Data Repository | Exploration of characteristics in Millipede genomes |
9. Medicine- and health-related data available on CUHK Research Data Repository | Acceptance of vaccine and containment measures in Hong Kong during the COVID-19 period |
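To give a feel for the kind of code a project might involve, sample 7 (OCR error recognition using n-grams) could start from something like the sketch below. It is only an illustration under stated assumptions, not the method of any actual project: the function names, the short English reference text, and the 0.5 threshold are all hypothetical, and a real project would build its model from a much larger, hand-corrected text in the language of the collection.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_model(clean_text, n=3):
    """Count n-grams in a trusted (hand-corrected) reference text."""
    return Counter(char_ngrams(clean_text.lower(), n))


def suspicious_words(ocr_text, model, n=3, min_known=0.5):
    """Flag words where fewer than `min_known` of the n-grams occur in the model."""
    flagged = []
    for word in ocr_text.lower().split():
        grams = char_ngrams(word, n)
        if not grams:  # word shorter than n: nothing to score
            continue
        known = sum(1 for g in grams if g in model)
        if known / len(grams) < min_known:
            flagged.append(word)
    return flagged


# Toy data (assumption): a tiny reference text and an OCR line with two errors.
reference = "the students read the weekly news in the library"
model = build_model(reference)
flagged = suspicious_words("the stvdcnts read the wcckly news", model)
print(flagged)
```

Words made mostly of character sequences never seen in the reference text are flagged for review; the threshold controls how strict the check is.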
Sample project webpages
https://dsprojects.lib.cuhk.edu.hk/projects/#DA
Eligibility
All full-time undergraduate and postgraduate* students at CUHK
Number of members in each team: 1–3
Number of successful teams to participate in the Practice Opportunity: 3
* Postgraduate students under a Postgraduate Studentship must seek approval from their own department when given an offer.
Selection criteria
- Knowledge of data analytics
- Sound skills and experience in developing code
- Subject knowledge
- Ideas on reusing library resources to develop a project
Salary
Estimated HKD5000 per team
Application process
Complete the application form at https://cloud.itsc.cuhk.edu.hk/webform/view.php?id=13637956 (Library Job Application Reference: SH20211224RS) and upload the following documents:
- CV
- Copy of transcripts
- 100-word statement of purpose (for the team*)
*For team applications, each team member has to submit the online application form separately. Please state the name(s) of your teammate(s) under “supplementary information”.
Timeline
24 Dec 2021–17 Jan 2022 | Application Period |
21/24 Jan 2022 | Interview of candidates |
26 Jan 2022 | Announcement of successful applicants |
28 Jan 2022 | Briefing session |
February 2022–May 2022 | Development of projects |
May 2022 | Completion of projects, code, website, and poster |
TBC | Presentation of projects |
Project Outcomes in Data Analytics Practice Opportunity 2021/22
Projects Completed:
- Text Analysis on Collection Exegesis of Recipes 《醫方集解》
- OCR & Data Analysis: The Hongkong News
- De Bruijn Graph in Genome Assembly with Millipede Genomes Dataset
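For readers unfamiliar with the technique named in the third project, a De Bruijn graph can be built from sequencing reads roughly as in the sketch below. This is a minimal, generic illustration and not the participants' code: the function name, the two toy reads, and the choice of k = 3 are assumptions made for the example.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some k-mer."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].append(kmer[1:])
    return graph


# Toy reads (assumption): two overlapping fragments of the sequence ACGTCA.
reads = ["ACGTC", "CGTCA"]
graph = de_bruijn_graph(reads, k=3)
for node, successors in sorted(graph.items()):
    print(node, "->", successors)
```

Each k-mer contributes one edge, so overlapping reads produce repeated edges; walking the edges from AC to CA spells out the underlying sequence, which is the basic idea behind graph-based genome assembly.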
Intellectual Property
According to University policy, participants are required to assign the intellectual property of the project outcome to the University. Participants are authorised to use the original source files provided by the organiser only for the Data Analytics Practice Opportunity. They are not allowed to use them for other purposes without the authorisation of the organiser.
Participants must ensure that the code developed and the project output in the Data Analytics Practice Opportunity are their own original work or are legally authorised by the owner of the intellectual property rights. If any third party raises allegations of infringement of intellectual property rights or other legal irregularities, the participants assume all legal responsibility.
Enquiries
For any enquiries, please email the organiser at data@cuhk.edu.hk.