Data Analysis and Visualization
In this analysis, we explored several aspects of the poems, including:
1. Word Count
2. Sentiment Analysis
3. Linguistic Complexity
4. Correlation
5. Word Cloud
6. Author Information
1. Word Count


We used a word count distribution plot and a box-and-whisker diagram to examine the length of the poems. Most poems contain around 100 words, but some are much longer. These longer poems appear as small circles beyond the whiskers of the diagram and are treated as outliers.
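The outlier rule behind such a diagram can be sketched as follows. This is a minimal illustration that assumes the common 1.5 x IQR convention for drawing the circles; the source does not state which plotting tool or rule was used. The function name and the sample counts are hypothetical and are not taken from the dataset.

```python
from statistics import quantiles

def find_length_outliers(word_counts):
    """Return (lower_fence, upper_fence, outliers) under the 1.5 * IQR rule."""
    # "inclusive" treats the data as the full population and interpolates linearly.
    q1, _, q3 = quantiles(word_counts, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [c for c in word_counts if c < lower or c > upper]
    return lower, upper, outliers

# Hypothetical word counts for eight poems (illustration only).
sample_counts = [80, 90, 95, 100, 105, 110, 120, 400]
lower, upper, outliers = find_length_outliers(sample_counts)
print(outliers)  # [400]
```

Poems whose counts fall outside the two fences are the ones a box-and-whisker diagram would draw as separate points.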
2. Sentiment Analysis


Next, we studied the sentiment of the poems. Each poem was given a sentiment score between 0 and 1, where 0 means extremely negative and 1 means extremely positive.
Most poems scored between 0 and 0.2, which suggests that negative emotion dominates the collection. The scores also cluster toward the two ends of the scale: poems tend to express strong emotion, whether negative or positive, and only a few are neutral.
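One way to check where the scores concentrate is to count them in equal-width bins over the 0 to 1 range. The sketch below assumes the scores are already computed (the source does not name the sentiment tool); the function name, the choice of five bins, and the sample scores are hypothetical.

```python
def bin_sentiment_scores(scores, n_bins=5):
    """Count scores in n_bins equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        # A score of exactly 1.0 is placed in the last bin.
        index = min(int(score * n_bins), n_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical scores (illustration only, not from the dataset).
sample_scores = [0.05, 0.10, 0.15, 0.18, 0.50, 0.92, 0.97]
bin_counts = bin_sentiment_scores(sample_scores)
print(bin_counts)  # [4, 0, 1, 0, 2]
```

In this made-up sample the first bin (0 to 0.2) is the largest, the last bin is the second largest, and the middle bins are nearly empty, which is the shape described above.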
3. Linguistic Complexity

We measured how complex each poem is in its choice of words. A higher complexity score means the poem uses more difficult or uncommon words. We then selected the 20 most complex poems and showed them in a bar chart.
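The source does not specify how the complexity score was computed. As one possible sketch, the code below scores a poem by the average rarity of its words, where rarity is the negative log of a word's relative frequency in the whole collection, and then picks the top-scoring poems. The metric, the function names, and the sample data are all assumptions made for illustration, and the poems are assumed to be already split into words.

```python
import math
from collections import Counter

def rarity_scores(tokenized_poems):
    """Map each poem title to the mean -log(relative corpus frequency) of its words."""
    corpus_counts = Counter(
        token for tokens in tokenized_poems.values() for token in tokens
    )
    total = sum(corpus_counts.values())
    scores = {}
    for title, tokens in tokenized_poems.items():
        if not tokens:
            scores[title] = 0.0  # an empty poem has no words to score
            continue
        rarity = sum(-math.log(corpus_counts[t] / total) for t in tokens)
        scores[title] = rarity / len(tokens)
    return scores

def top_n(scores, n=20):
    """Return the n highest-scoring (title, score) pairs, highest first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]

# Hypothetical tokenized poems (illustration only).
sample_poems = {
    "a": ["sky", "sky", "time"],
    "b": ["sky", "world"],
    "c": ["sky", "sky"],
}
sample_scores = rarity_scores(sample_poems)
ranking = [title for title, _ in top_n(sample_scores, n=2)]
print(ranking)  # ['b', 'a']
```

Under this assumed metric, a poem made mostly of words that are rare in the collection ranks higher than one made of very common words.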
4. Correlation

So far, we have collected data on sentiment, word count, and linguistic complexity. Next, we wanted to see whether these variables are related to each other.
We created a correlation matrix that visualizes the correlations between the variables. The correlations are all very close to zero, meaning that there is no strong relationship among word count, sentiment, and linguistic complexity.
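A correlation matrix of this kind can be sketched as below. The example assumes the Pearson coefficient, which the source does not confirm, and uses hypothetical column names and values rather than the real data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant column has zero spread, so its correlation is undefined.
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (spread_x * spread_y)

def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical values for five poems (illustration only).
sample_columns = {
    "word_count": [90, 120, 100, 300, 80],
    "sentiment": [0.10, 0.90, 0.15, 0.05, 0.95],
    "complexity": [1.2, 0.8, 1.5, 1.1, 0.9],
}
matrix = correlation_matrix(sample_columns)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Each diagonal entry is 1 by construction, and the matrix is symmetric, so only the off-diagonal values on one side carry information about relationships between variables.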
5. Word Cloud

We also created a word cloud to see which words appear most often. After filtering out common words, we noticed that words such as 天空 (sky), 時間 (time), and 世界 (world) appear very frequently in the poems. These words help us understand the themes that many of the poems focus on.
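The counting step behind a word cloud can be sketched as follows. The example assumes the poems have already been segmented into words (Chinese text needs a word segmenter for this, which is not shown) and uses a hypothetical stop-word list and sample poems.

```python
from collections import Counter

def top_words(tokenized_poems, stop_words, n=3):
    """Return the n most frequent words after removing stop words."""
    counts = Counter(
        token
        for tokens in tokenized_poems
        for token in tokens
        if token not in stop_words
    )
    return counts.most_common(n)

# Hypothetical segmented poems and stop words (illustration only).
sample_poems = [
    ["天空", "的", "時間"],
    ["世界", "的", "天空"],
    ["天空", "時間", "我"],
]
sample_stop_words = {"的", "我"}
frequent = top_words(sample_poems, sample_stop_words)
print(frequent)  # [('天空', 3), ('時間', 2), ('世界', 1)]
```

A word cloud then draws each word at a size proportional to its count, so the stop-word list largely determines which words stand out.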
6. Author Information

Finally, we looked at the authors. The dataset contains 387 unique authors, 18 of whom wrote at least 4 poems.

We created a bar chart to show how many poems each of these authors wrote. The top two are:
1. Mr. 魏鵬展 (the editor), with 44 poems
2. 水盈, with 22 poems

We also calculated the average sentiment score for each author. For example:
1. Mr. 魏鵬展 has an average score of 0.26, which indicates a fairly negative tone
2. 水盈 has an average score of 0.49, which is close to neutral
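The per-author figures can be produced by grouping the poems by author, keeping the authors who meet a minimum poem count, and averaging their sentiment scores. The sketch below assumes each record is an (author, sentiment score) pair; the function name, the author labels, and the scores are hypothetical.

```python
from collections import defaultdict

def author_summary(records, min_poems=4):
    """Map each author with at least min_poems poems to (poem_count, mean_score)."""
    scores_by_author = defaultdict(list)
    for author, score in records:
        scores_by_author[author].append(score)
    return {
        author: (len(scores), sum(scores) / len(scores))
        for author, scores in scores_by_author.items()
        if len(scores) >= min_poems
    }

# Hypothetical (author, sentiment score) records (illustration only).
sample_records = [
    ("author_a", 0.25), ("author_a", 0.25), ("author_a", 0.125), ("author_a", 0.375),
    ("author_b", 0.5), ("author_b", 0.75),
]
summary = author_summary(sample_records)
print(summary)  # {'author_a': (4, 0.25)}
```

Here author_b is dropped because two poems fall below the threshold of four, which mirrors the filter applied before the bar chart was drawn.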