Exploratory data analysis

Figure 1 User Gender Distribution

The chart shows a marked difference between the numbers of male and female users. The user base is majority female, which may influence which types of content become popular and how information spreads on the platform.

This imbalance may shape content preferences and engagement. It may be especially relevant for content related to this topic, which might resonate differently with male and female users and therefore spread differently among them.
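A distribution like the one in Figure 1 reduces to counting records per category and dividing by the total. The sketch below is a minimal illustration under stated assumptions: the records are dictionaries with a hypothetical `gender` field, the toy values are invented for illustration and do not come from the dataset, and records with a missing value are simply skipped.

```python
from collections import Counter


def category_shares(records, field):
    """Return {value: (count, share)} for one field, most common value first.

    Assumes `records` is an iterable of dicts; `field` is a hypothetical
    column name such as "gender". Records missing the field are skipped.
    """
    counts = Counter(r[field] for r in records if r.get(field) is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders by count, and dicts preserve insertion order.
    return {value: (n, n / total) for value, n in counts.most_common()}


# Hypothetical toy records, not real platform data.
users = [
    {"gender": "f"}, {"gender": "f"}, {"gender": "m"},
    {"gender": "f"}, {"gender": None},
]

for value, (n, share) in category_shares(users, "gender").items():
    print(f"{value}: {n} ({share:.1%})")
```

The same function applies unchanged to a verification flag or a post-category field, so the distributions in Figures 2 and 3 could be computed the same way under the same assumptions.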

Figure 2 Verified User Distribution

Most users on the platform are regular (unverified) users. This dominance of unverified accounts has potential implications for the credibility and trustworthiness of the content they share.

The low percentage of verified users may also affect the spread of misinformation, since relatively few posts come from accounts whose identity has been confirmed by the platform.

Figure 3 Post Category Distribution

The “other” category contains considerably more posts than any other category. Although the “COVID-19” category has a lower count, it is still substantial in absolute terms because of the large sample size, indicating that the pandemic is a major topic of discussion on the platform.

Figure 4 Posts by Hour

There are two distinct peaks, in the early morning (midnight to 1 a.m.) and in the evening (8 to 11 p.m.); these are key times for content sharing and engagement. Positive posts peak during the daytime (roughly 10 a.m. to 3 p.m.), reaching nearly 5,000 posts. At night (midnight to 5 a.m.), the number of posts drops sharply, to a minimum of fewer than 1,000.

User activity cycle: the high level of interaction during the day may be related to fragmented free time, such as work and lunch breaks, reflecting how users' emotional expression depends on the time of day.

The “golden window” for emotional transmission: positive content spreads more easily during peak hours, which suggests time-sensitive strategies for anti-rumor campaigns, such as pushing scientific information during active hours to counterbalance negative emotions.
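An hourly pattern like the one in Figure 4 comes from binning post timestamps by hour of day, optionally for a single sentiment label. The sketch below assumes hypothetical `created_at` (ISO 8601 string, already in local time) and `sentiment` fields and uses invented toy records; it counts posts per hour and reports the busiest and quietest hours, which is one simple way to locate peaks and troughs, not necessarily how the figure was produced.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(posts, sentiment=None):
    """Count posts per hour of day (0-23), optionally for one sentiment label.

    Assumes each post is a dict with hypothetical `created_at` (ISO 8601
    string in local time) and `sentiment` fields.
    """
    counts = Counter(
        datetime.fromisoformat(p["created_at"]).hour
        for p in posts
        if sentiment is None or p["sentiment"] == sentiment
    )
    # Report every hour, including empty ones, so troughs stay visible.
    return [counts.get(h, 0) for h in range(24)]


def peak_and_trough(hourly):
    """Return (peak_hour, trough_hour); ties resolve to the earliest hour."""
    peak = max(range(24), key=lambda h: hourly[h])
    trough = min(range(24), key=lambda h: hourly[h])
    return peak, trough


# Hypothetical toy records, not real platform data.
posts = [
    {"created_at": "2020-02-01T10:15:00", "sentiment": "positive"},
    {"created_at": "2020-02-01T10:40:00", "sentiment": "positive"},
    {"created_at": "2020-02-01T14:05:00", "sentiment": "positive"},
    {"created_at": "2020-02-01T23:30:00", "sentiment": "negative"},
    {"created_at": "2020-02-02T00:20:00", "sentiment": "negative"},
]

positive = hourly_counts(posts, "positive")
print(peak_and_trough(positive))
```

Counting all 24 hours, including empty ones, keeps quiet overnight periods visible. If the raw timestamps are in UTC rather than the users' local time, they should be converted before binning, or the apparent peaks will shift.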