Findings

Biomarker Estimation

Figure 1. Correlation Matrix of All Variables.

From the correlation matrix, BMI, MVPA, and steps each show significant negative correlations with both the p16 and p21 biomarkers. There is high intragroup correlation within each of the cellular, anthropometric, and behavioural groups, which may cause multicollinearity if variables from the same group are used together in a regression model. Additionally, chronological age appears to be uncorrelated with any of the senescence biomarkers, suggesting that behavioural and anthropometric factors play a larger role in determining biological aging.
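One way to screen for the intragroup correlation described above is to list variable pairs whose absolute correlation exceeds a cut-off before combining them in a regression. The following is a minimal sketch on synthetic data; the function name `high_correlation_pairs`, the 0.8 threshold, and the generated values are assumptions for illustration and are not taken from the study.

```python
import numpy as np

def high_correlation_pairs(data, names, threshold=0.8):
    """Return (name_a, name_b, r) for every pair of columns with |r| >= threshold."""
    corr = np.corrcoef(data, rowvar=False)  # each column is one variable
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Synthetic data for illustration only; not the study data.
rng = np.random.default_rng(0)
n = 200
mvpa = rng.normal(size=n)
steps = mvpa + 0.1 * rng.normal(size=n)  # built to be nearly collinear with mvpa
bmi = rng.normal(size=n)                 # independent of the other two

data = np.column_stack([mvpa, steps, bmi])
pairs = high_correlation_pairs(data, ["MVPA", "steps", "BMI"])
for a, b, r in pairs:
    print(f"{a} and {b}: r = {r:.2f}")
```

Pairs flagged this way are candidates for keeping only one member in a regression model.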

Figure 2. Individual Anthropometric and Behavioural Variable Significance in Predicting p16 and p21.
Figure 3. Individual Cellular Variable Significance in Predicting p16 and p21.

Consistent with the correlation analysis, BMI, MVPA, steps, and waist size are significant predictors of both p16 and p21 at the 0.05 level. Among the cellular variables, TNF-α appears to be a highly effective predictor of p16, and the remaining cellular variables also predict p16 better than p21. A similar split appears across the other variable groups: the behavioural variables fare better at estimating p16, while the anthropometric variables are more effective at predicting p21. The data remain too sparse to draw firm conclusions, but the pattern is worth exploring in future work.

Figure 4. Top Anthropometric and Behavioural Variable Combinations in Predicting p16.
Figure 5. Top Anthropometric and Behavioural Variable Combinations in Predicting p21.
Table 1. Regression Results for Final Prediction Model for p16 and p21.

After trialling different variable combinations, the best-performing ones rely on a mix of physical activity, waist measurement, weight classification, and blood pressure. To simplify estimation for individuals conducting this experiment at home, we chose both blood pressure measurements, waist, weight, and MVPA as the estimation model for both p16 and p21. Even with the limited sample size, the explanatory power of this model is surprisingly good, with adjusted R-squared scores of around 0.4 and, most importantly, a highly insignificant age variable for both biomarkers. Consistent with the results of the paper, MVPA shows a significant inverse relationship with both biomarkers.
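For reference, adjusted R-squared penalises R-squared for the number of predictors p given n samples: adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1). The sketch below fits an ordinary least squares model with NumPy and reports both scores. The helper `fit_ols`, the two predictors, and the synthetic coefficients are assumptions for illustration only; they are not the fitted values from Table 1.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns coefficients, R^2, adjusted R^2."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return coef, r2, adj_r2

# Synthetic data for illustration only: p16 falls with MVPA and rises with waist.
rng = np.random.default_rng(1)
n = 100
mvpa = rng.normal(size=n)
waist = rng.normal(size=n)
p16 = 2.0 - 0.5 * mvpa + 0.3 * waist + rng.normal(scale=0.5, size=n)

coef, r2, adj_r2 = fit_ols(np.column_stack([mvpa, waist]), p16)
print(f"MVPA coefficient: {coef[1]:.2f}, R^2: {r2:.2f}, adjusted R^2: {adj_r2:.2f}")
```

With a small sample and several predictors, the gap between the two scores widens, which is why the adjusted figure is the one worth reporting.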

We believe this process can be developed further into an accurate model that lets individuals estimate their own biological aging level from the comfort of their own homes.

For the clustering analysis, we first obtained the best k for each set of features using silhouette scores.

Figure 7. Optimal Number of Clusters for a Given Set of Features.

For example, the “MVPA” bar indicates that when we cluster the samples on their p16 and MVPA features only, the best k is 9.
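Silhouette-based selection of k can be sketched as follows. This is a minimal illustration that assumes k-means (Lloyd's algorithm) as the clustering method and uses synthetic two-dimensional data standing in for a p16 and feature pair; the choice of algorithm, the helper names, and the data are all assumptions rather than the pipeline used for Figure 7.

```python
import numpy as np

def kmeans_labels(X, k, rng, n_init=20, n_iter=100):
    """Lloyd's k-means with random restarts; returns labels of the lowest-inertia run."""
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centres = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Move each centre to the mean of its points; keep it in place if it has none.
            new_centres = np.array([
                X[labels == c].mean(axis=0) if np.any(labels == c) else centres[c]
                for c in range(k)
            ])
            if np.allclose(new_centres, centres):
                break
            centres = new_centres
        inertia = float((dists.min(axis=1) ** 2).sum())
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels

def silhouette_mean(X, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = int(own.sum())
        if n_own == 1:
            continue
        a = dists[i, own].sum() / (n_own - 1)  # mean distance to own cluster
        b = min(dists[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())

# Synthetic data for illustration only: three well-separated groups in two dimensions.
rng = np.random.default_rng(42)
true_centres = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(30, 2)) for c in true_centres])

scores = {k: silhouette_mean(X, kmeans_labels(X, k, rng)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(best_k, {k: round(s, 2) for k, s in scores.items()})
```

On real data with features on different scales, standardising each feature before clustering is usually advisable, since Euclidean distance is otherwise dominated by the feature with the largest range.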

We then examined whether the resulting clusters showed any trends or patterns that distinguish them. Some findings are shown below.

All features & p16 clusters

Table 2. Mean Values of Different Features for Clusters Derived from p16 and Other Variables.  

Observations:

  • MVPA has a significant impact on the p16 biomarker compared to other controllable variables.
  • The youngest cluster [average age: 22] is the unhealthiest group in our dataset.

MVPA & p16 clusters

Table 3. Mean Values of Different Features for Clusters Derived from p16 and Moderate to Vigorous Physical Activity.

Observations:

  • Sleep duration differs between clusters 0 and 3, with cluster 3 having a higher p16 value despite higher MVPA [highlighted in green].
  • Waist differentiates clusters 1 and 4, with cluster 1 having a higher p16 value despite higher MVPA [highlighted in red].

Steps & p16 clusters

Table 4. Mean Values of Different Features for Clusters Derived from p16 and Steps.

Observations:

  • Sleep duration and blood pressure are related to each other, which might indicate a stressful lifestyle.
  • A stressful lifestyle may lead to an increase in the p16 biomarker.

Anomaly Detection

Our pipeline yielded the following results:

Figure 8. R² Scores across the Regression Models for p16 Prediction. 

Surprisingly, linear regression achieved the highest R² score among all candidate models. Therefore, it was selected as the final model for p16 biomarker prediction. 

To visualise which features were most consistently selected across model iterations, we created a word cloud.

Figure 9. Word Cloud of Variables Selected by the Pipeline. 

Waist, TNF-α, MVPA, and BMI were the features most frequently chosen as the best predictors of p16 and were therefore included in the final model.

The interactive platform is available on a separate page of this project. The user needs to input their true p16 value, waist, TNF-α, BMI, and MVPA levels (the units of measurement are indicated in the input boxes).

The result shows the linear regression model fit with the residual highlighted in red, together with a message stating the anomaly status. Below is an example of what the result might look like.
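The residual-based check can be sketched as below. The decision rule is an assumption: this sketch flags an input as anomalous when its residual exceeds two residual standard deviations, whereas the rule used by the platform is not stated here. The helper names, the threshold, and the synthetic training data are likewise hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept; returns coefficients and residual std."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    dof = len(y) - design.shape[1]
    resid_std = float(np.sqrt(residuals @ residuals / dof))
    return coef, resid_std

def anomaly_status(coef, resid_std, features, measured_p16, z_threshold=2.0):
    """Compare a measured p16 with the model prediction (assumed 2-sigma rule)."""
    predicted = float(coef[0] + np.dot(features, coef[1:]))
    residual = measured_p16 - predicted
    return predicted, residual, abs(residual) > z_threshold * resid_std

# Synthetic training data for illustration only.
# Columns stand in for waist, TNF-alpha, BMI, and MVPA.
rng = np.random.default_rng(2)
X = rng.normal(size=(150, 4))
y = 1.0 + X @ np.array([0.4, 0.6, 0.2, -0.5]) + rng.normal(scale=0.3, size=150)
coef, resid_std = fit_linear(X, y)

new_features = np.array([0.5, -0.2, 0.1, 1.0])
predicted, _, _ = anomaly_status(coef, resid_std, new_features, 0.0)
_, _, flag_typical = anomaly_status(coef, resid_std, new_features, predicted + 0.5 * resid_std)
_, _, flag_unusual = anomaly_status(coef, resid_std, new_features, predicted + 3.0 * resid_std)
print(flag_typical, flag_unusual)
```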

Figure 10. Example of Anomaly Detection on the Interactive Platform.

Aging Intervention

Figure 11. Change in Behavioural Variables for Control and MVPA Intervention Groups.

With the 12-week MVPA intervention programme, the participants' physical activity behaviour changed drastically, with MVPA showing the largest change. While VPA appears to increase more than MVPA after standardisation, conclusions should not be drawn from VPA because all participants had near-zero values before the intervention.

Figure 12. Correlation Matrix for all Delta Variables.

Due to the limitations of the dataset, the data collected from participants after the intervention period are not matched to their initial profiles, and individual changes in physical activity are not included in the post-intervention dataset. However, we can use membership in the MVPA intervention group as a dummy variable to stand in for the effects of increased physical activity. For body-conscious individuals, the increased MVPA shows a significant inverse correlation with the changes in waist and fat percentage. To better visualise the individual changes from this trial, we can use box-and-whisker plots:

Figure 13. Difference in Delta p21 Between Control and Intervention Groups.
Figure 14. Difference in Delta p16 Between Control and Intervention Groups.
Figure 15. Difference in Delta VO2Max Between Control and Intervention Groups.
Figure 16. Difference in Delta Waist Between Control and Intervention Groups.

Of all variables, p16, p21, VO2 max, and waist measurements showed a significant change between the control and intervention groups. This is encouraging for anyone on a weight loss journey or hoping to reverse the effects of biological aging, and it further demonstrates how lifestyle plays a large role in our overall health and wellbeing.
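A group difference of this kind can be checked with a two-sided permutation test on the difference in mean deltas. The test behind the reported significance is not specified here, so the permutation test is a stand-in chosen because it needs few distributional assumptions with small samples. The function name and the synthetic deltas below are hypothetical.

```python
import numpy as np

def permutation_test(control, intervention, n_perm=5000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = np.random.default_rng(seed)
    observed = intervention.mean() - control.mean()
    pooled = np.concatenate([control, intervention])
    n_control = len(control)
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)  # reassign group labels at random
        diff = shuffled[n_control:].mean() - shuffled[:n_control].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return float(observed), (count + 1) / (n_perm + 1)

# Synthetic deltas for illustration only; not the study data.
rng = np.random.default_rng(5)
delta_control = rng.normal(loc=0.0, scale=1.0, size=30)
delta_intervention = rng.normal(loc=-1.5, scale=1.0, size=30)

observed, p_value = permutation_test(delta_control, delta_intervention)
print(f"difference in means: {observed:.2f}, p = {p_value:.4f}")
```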

Figure 17. Individual Anthropometric Variable Significance in Predicting Delta p16 and Delta p21.

When conducting individual regression analysis with the variables, using both the standalone change and its interaction with the intervention-group dummy variable, we observe that the change in waist size best explains the change in the p16 biomarker, but none of the variables can account for the change in the p21 biomarker. This suggests that for obese individuals attempting their own biological age reversal, waist measurements can act as a proxy for tracking progress as they increase their physical activity.
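A regression with a standalone change and its interaction with the group dummy has the form delta_p16 = b0 + b1·delta_waist + b2·group + b3·(delta_waist × group), where b3 measures how much the waist slope differs in the intervention group. The sketch below fits this form on synthetic data; the variable names, the effect size of 0.8, and the noise level are assumptions for illustration only.

```python
import numpy as np

# Synthetic data for illustration only: the waist effect exists only in the
# intervention group (group == 1), with an assumed slope of 0.8.
rng = np.random.default_rng(3)
n = 200
group = np.repeat([0.0, 1.0], n // 2)  # 0 = control, 1 = intervention
delta_waist = rng.normal(size=n)
delta_p16 = 0.8 * delta_waist * group + rng.normal(scale=0.1, size=n)

# Design matrix: intercept, standalone change, group dummy, interaction term.
design = np.column_stack([np.ones(n), delta_waist, group, delta_waist * group])
coef, *_ = np.linalg.lstsq(design, delta_p16, rcond=None)
intercept, b_waist, b_group, b_interaction = coef

print(f"waist: {b_waist:.2f}, group: {b_group:.2f}, interaction: {b_interaction:.2f}")
```

In this construction the standalone waist coefficient stays near zero while the interaction coefficient recovers the assumed slope, which is the pattern an interaction term is designed to separate.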

Again, given the severe limitations of this dataset, the results should serve as a reference only; future studies with more comprehensive data can explore this topic in greater depth.

To see the full version of our code, please follow this link: https://github.com/ExtraAsseto/DAPO_lambda