Algorithms Comparison
In this section, we cluster Twitter users to identify distinct user types.
We tested three clustering algorithms: K-means, CLARA, and DBSCAN. We chose K-means because it was the most stable and reproducible method (ARI = 0.946), produced six interpretable and reasonably sized personas, and fit our goal of actionable user segmentation better than DBSCAN’s highly uneven density clusters or CLARA’s weaker stability.
| Criterion | K-means | CLARA | DBSCAN | Why it matters |
| --- | --- | --- | --- | --- |
| Stability (ARI) | 0.946 | 0.664 | 0.548 | Persona system must be reproducible across resamples |
| Cluster structure | 6 interpretable clusters | 6 clusters, but less distinct | 2 clusters + noise; highly uneven | We need usable, explainable segments |
| Cluster balance | Reasonably distributed | Moderate | One huge cluster + tiny niche cluster | Very uneven clusters are hard to operationalize |
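The stability row reports ARI (adjusted Rand index) values. As a minimal sketch of how such a score can be computed, the block below implements ARI between two labelings of the same users and averages it over all pairs of runs. The helper names and the pairwise-averaging protocol are assumptions for illustration; the exact resampling procedure behind the reported values is not specified here.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """ARI between two labelings of the same items (1.0 = identical partitions)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Count items per (cluster_a, cluster_b) cell and per cluster in each labeling.
    cell_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    # Number of item pairs that share a cluster, in both / in a / in b.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())

    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (sum_cells - expected) / (max_index - expected)


def mean_pairwise_ari(runs):
    """Average ARI over all pairs of runs (assumed stability protocol)."""
    pairs = list(combinations(runs, 2))
    return sum(adjusted_rand_index(a, b) for a, b in pairs) / len(pairs)
```

ARI ignores cluster numbering, so two runs that produce the same groups under different labels still score 1.0. When runs come from different resamples, the labelings would need to be compared on the users the two resamples share.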
Clustering Results
We found that users are not a single homogeneous population. Instead, they fall into six stable and interpretable personas with different diffusion mechanisms and scale profiles. These personas play different roles in the diffusion process: some are better suited to immediate amplification (A), some are better bridge-like candidates (B), and some are more associated with deep cascade potential (C). This means audience selection should not rely on follower size alone; it should be based on the combination of network mechanism, exposure pattern, and strategic campaign goal.

Findings
- Users are not homogeneous; they form six meaningful personas.
- Most users are in broad middle segments, not extreme influencer groups.
- A (immediate amplification): K-means Cluster 1
- B (bridge / breakout proxy): K-means Cluster 4
- C (deep cascade tail): K-means Cluster 1
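The finding that most users sit in broad middle segments can be checked directly from the cluster assignments. The sketch below computes each cluster's share of users; `cluster_shares` is a hypothetical helper, and it assumes the labels are available as a plain list with one entry per user.

```python
from collections import Counter


def cluster_shares(labels):
    """Return each cluster's fraction of all users, keyed by cluster label."""
    counts = Counter(labels)
    total = len(labels)
    # Sort by label so the output order is stable across runs.
    return {cluster: count / total for cluster, count in sorted(counts.items())}
```

A result dominated by one very large share alongside tiny ones would indicate the kind of imbalance described for DBSCAN in the comparison table.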
Persona Analysis Based on Scale Features

This graph shows the scale profile of each user cluster. We include it to check whether the personas differ in account size and activity as well as in diffusion mechanism. Some clusters are clearly high-scale and highly active, while others are low-scale or middle-layer groups, so the clustering captures meaningful differences in both mechanism and scale.
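A scale profile of this kind can be sketched as a per-cluster summary of the scale features. The block below takes the median of each feature within each cluster. The feature names (`followers`, `tweets`) and the choice of the median are assumptions for illustration, not the features or statistic used in the graph.

```python
from collections import defaultdict
from statistics import median


def scale_profile(rows, labels, features):
    """Median of each scale feature within each cluster.

    rows: list of dicts, one per user, mapping feature name to a number.
    labels: cluster label for each row, in the same order.
    features: feature names to summarize (hypothetical names in the example).
    """
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: {name: median(r[name] for r in members) for name in features}
        for label, members in grouped.items()
    }


# Toy data with made-up values, only to show the expected input shape.
toy_rows = [
    {"followers": 10, "tweets": 5},
    {"followers": 30, "tweets": 7},
    {"followers": 1000, "tweets": 200},
]
toy_profile = scale_profile(toy_rows, [0, 0, 1], ["followers", "tweets"])
```

The median is used here because follower and activity counts are typically heavy-tailed, so a few very large accounts would otherwise dominate a per-cluster mean.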