Biomarker Estimation

The correlation matrix shows that BMI, MVPA, and steps are each significantly negatively correlated with both the p16 and p21 biomarkers. Variables within each of the cellular, anthropometric, and behavioural groups are highly correlated with one another, which may cause multicollinearity if they are used together in a regression model. Additionally, chronological age appears to be uncorrelated with any of the senescence biomarkers, hinting that behavioural and anthropometric factors play a larger role than age in determining biological aging.
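A minimal sketch of how a correlation of this kind can be checked for significance. It assumes Pearson correlation via SciPy; the report does not state which coefficient was used, and the variable names and synthetic numbers below are illustrative assumptions, not the study data.

```python
import numpy as np
from scipy import stats


def correlations_with_target(features, target):
    """Return {feature name: (Pearson r, two-sided p-value)} against the target."""
    out = {}
    for name, values in features.items():
        r, p = stats.pearsonr(values, target)
        out[name] = (float(r), float(p))
    return out


# Synthetic illustration only; these are NOT the study's measurements.
rng = np.random.default_rng(0)
n = 40
mvpa = rng.normal(30.0, 10.0, n)   # hypothetical MVPA values
age = rng.normal(45.0, 12.0, n)    # hypothetical chronological age
p16 = 5.0 - 0.05 * mvpa + rng.normal(0.0, 0.3, n)  # constructed to fall as MVPA rises

corr_p16 = correlations_with_target({"MVPA": mvpa, "age": age}, p16)
for name, (r, p) in corr_p16.items():
    flag = "significant" if p < 0.05 else "not significant"
    print(f"{name:>5} vs p16: r = {r:+.2f}, p = {p:.3g} ({flag} at 0.05)")
```

The same helper can be applied to pairs of predictors within one group to look for the intragroup correlation that signals possible multicollinearity.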


Consistent with the correlation analysis, BMI, MVPA, steps, and waist size are significant predictors of both p16 and p21 at the 0.05 level. Among the cellular variables, TNF-α appears to be a highly effective predictor of p16, and the remaining cellular variables also predict p16 better than p21. A similar pattern appears across all variable groups: the behavioural variables fare better when estimating p16, while the anthropometric variables are more effective in predicting p21. The data remain too sparse to draw any conclusions, but the pattern is worth exploring in future work.



After trialling different variable combinations, we found that the best-performing models rely on a mix of physical activity, waist measurement, weight classification, and blood pressure. To simplify the estimation process for individuals conducting this experiment at home, we chose both blood pressure measurements, waist, weight, and MVPA as the predictors in our estimation model for both p16 and p21. Even with the limited sample size, the explanatory power of this model is surprisingly good, with adjusted R-squared values of around 0.4 and, most importantly, a highly insignificant age variable for both biomarkers. Consistent with the results of the paper, MVPA demonstrates a significant inverse relationship with both biomarkers.
We believe this approach can be developed further into an accurate model that lets individuals estimate their own biological aging level from the comfort of their own homes.
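The estimation model described above can be sketched as an ordinary least squares fit with an adjusted R-squared. This is a minimal illustration using NumPy only; the column names, units, and synthetic data are assumptions made for the example and do not reproduce the study's dataset or its reported scores.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns coefficients, R^2, adjusted R^2."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R^2 penalises the fit for the number of predictors p.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return beta, r2, adj_r2


# Synthetic illustration only; these are NOT the study's measurements.
rng = np.random.default_rng(1)
n = 60
names = ["systolic_bp", "diastolic_bp", "waist", "weight", "mvpa"]  # hypothetical names
X = np.column_stack([
    rng.normal(120, 12, n),
    rng.normal(80, 8, n),
    rng.normal(85, 10, n),
    rng.normal(72, 12, n),
    rng.normal(30, 10, n),
])
# Constructed so that p16 rises with waist and falls with MVPA.
p16 = 3.0 + 0.02 * X[:, 2] - 0.04 * X[:, 4] + rng.normal(0, 0.5, n)

beta, r2, adj_r2 = fit_ols(X, p16)
for name, coef in zip(["intercept"] + names, beta):
    print(f"{name:>12}: {coef:+.4f}")
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```

Adjusted R-squared is the more honest figure when the sample is small relative to the number of predictors, which is why it is the one quoted in the text.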
For KNN clustering, we first determined the best k for each set of features using silhouette scores.

For example, the “MVPA” bar indicates that when we cluster our samples based on their p16 and MVPA features only, the best k is 9.
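Choosing k by silhouette score can be sketched as follows. The report does not show which clustering implementation was used, so this example assumes k-means from scikit-learn with standardised features; the function name, the range of k, and the synthetic two-feature data (p16 and MVPA) are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def best_k_by_silhouette(X, k_values, seed=0):
    """Cluster standardised features for each k and return the k with the highest silhouette."""
    X_scaled = StandardScaler().fit_transform(X)
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X_scaled)
        scores[k] = float(silhouette_score(X_scaled, labels))
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic illustration only: three well-separated groups in (p16, MVPA) space.
rng = np.random.default_rng(2)
centres = np.array([[1.0, 10.0], [3.0, 40.0], [5.0, 10.0]])
X = np.vstack([c + rng.normal(0, [0.15, 2.0], size=(30, 2)) for c in centres])

best_k, scores = best_k_by_silhouette(X, k_values=range(2, 7))
for k, s in scores.items():
    print(f"k = {k}: silhouette = {s:.3f}")
print("best k:", best_k)
```

Running the same function on each feature set (for example p16 with MVPA, or p16 with steps) gives one best k per set, which is what each bar in the chart represents.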
We then examined whether the resulting clusters showed any trends or patterns that distinguish them. Some findings are shown below.
All features & p16 clusters

Observations:
- MVPA has a greater impact on the p16 biomarker than the other controllable variables.
- The youngest cluster group [average age: 22] is the unhealthiest group in our dataset.
MVPA & p16 clusters

Observations:
- Sleep duration differs between clusters 0 and 3, with cluster 3 having a higher p16 value despite higher MVPA [highlighted in green].
- Waist size differentiates clusters 1 and 4, with cluster 1 having a higher p16 value despite higher MVPA [highlighted in red].
Steps & p16 clusters

Observations:
- Sleep duration and blood pressure are related to each other, which might indicate a stressful lifestyle.
- A stressful lifestyle may lead to an increase in the p16 biomarker.
Anomaly Detection
Our pipeline yielded the following results:

Surprisingly, linear regression achieved the highest R² score among all candidate models. Therefore, it was selected as the final model for p16 biomarker prediction.
To visualise which features were most consistently selected across model iterations, we created a word cloud.

Waist, TNF-α, MVPA, and BMI were the features most frequently chosen as the best predictors of p16 and were therefore included in the final model.
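One way a linear model on these four features could be used to flag an anomalous p16 reading is to compare the residual (true value minus prediction) with the spread of the training residuals. The sketch below is an assumption about how such a check might work: the 2-standard-deviation threshold, the function names, and the synthetic data are hypothetical and are not taken from the project's pipeline.

```python
import numpy as np


def fit_residual_model(X, y):
    """Fit OLS with an intercept and record the spread of the training residuals."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    dof = len(y) - design.shape[1]
    sigma = float(np.sqrt(residuals @ residuals / dof))  # residual standard error
    return beta, sigma


def anomaly_status(beta, sigma, x_new, p16_true, z_threshold=2.0):
    """Flag a reading whose residual exceeds z_threshold residual standard errors."""
    predicted = float(beta[0] + np.dot(beta[1:], x_new))
    residual = p16_true - predicted
    return predicted, residual, bool(abs(residual) > z_threshold * sigma)


# Synthetic illustration only; columns are waist, TNF-alpha, BMI, MVPA (hypothetical units).
rng = np.random.default_rng(3)
n = 50
X = np.column_stack([
    rng.normal(85, 10, n),
    rng.normal(2.5, 0.6, n),
    rng.normal(26, 4, n),
    rng.normal(30, 10, n),
])
p16 = 1.0 + 0.02 * X[:, 0] + 0.5 * X[:, 1] - 0.03 * X[:, 3] + rng.normal(0, 0.4, n)
beta, sigma = fit_residual_model(X, p16)

x_user = np.array([90.0, 2.5, 27.0, 25.0])  # hypothetical user input
expected, _, _ = anomaly_status(beta, sigma, x_user, 0.0)
_, res_ok, flag_ok = anomaly_status(beta, sigma, x_user, expected + 0.5 * sigma)
_, res_bad, flag_bad = anomaly_status(beta, sigma, x_user, expected + 3.0 * sigma)
print(f"small residual {res_ok:+.2f} -> anomaly: {flag_ok}")
print(f"large residual {res_bad:+.2f} -> anomaly: {flag_bad}")
```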
The interactive platform is available on a separate page of this project. The user inputs their true p16 value along with their waist, TNF-α, BMI, and MVPA levels (the units of measurement are indicated in the input boxes).
The result shows the linear regression fit with the residual highlighted in red, together with a message stating the anomaly status. Below is an example of what the result might look like.

Aging Intervention

Over the 12-week MVPA intervention programme, the participants' physical activity behaviour changed drastically, with MVPA showing the largest change. While VPA appears to increase more than MVPA after standardisation, no conclusions should be drawn from VPA because all participants had near-zero values before the intervention.

Due to the limitations of the dataset, the data collected from participants after the intervention period are not matched to their initial profiles, and individual changes in physical activity are not included in the post-intervention dataset. However, we can use membership of the MVPA intervention group as a dummy variable to stand in for the effects of increased physical activity. For body-conscious individuals, it is worth noting that increased MVPA is significantly inversely correlated with the change in waist and fat percentage. To better visualise the individual changes from this trial, we can utilise box-and-whisker plots:




Of all variables, p16, p21, VO2 max, and waist measurements showed a significant difference between the control and intervention groups. This is encouraging for anyone on a weight loss journey or hoping to reverse the effects of biological aging, and it further demonstrates how lifestyle plays a large role in our overall health and wellbeing.
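A group comparison of this kind can be sketched with a two-sample test. The report does not say which test was used, so this example assumes Welch's t-test (unequal variances) from SciPy; the function name and the synthetic waist changes are illustrative assumptions only.

```python
import numpy as np
from scipy import stats


def compare_groups(control, intervention, alpha=0.05):
    """Welch's t-test for a difference in means between intervention and control."""
    t_stat, p_value = stats.ttest_ind(intervention, control, equal_var=False)
    return {
        "mean_difference": float(np.mean(intervention) - np.mean(control)),
        "t": float(t_stat),
        "p": float(p_value),
        "significant": bool(p_value < alpha),
    }


# Synthetic illustration only; these are NOT the study's measurements.
rng = np.random.default_rng(4)
control_waist_change = rng.normal(0.0, 1.5, 25)        # hypothetical change, control group
intervention_waist_change = rng.normal(-3.0, 1.5, 25)  # hypothetical change, intervention group

result = compare_groups(control_waist_change, intervention_waist_change)
print(f"mean difference = {result['mean_difference']:+.2f}, "
      f"t = {result['t']:.2f}, p = {result['p']:.3g}, "
      f"significant at 0.05: {result['significant']}")
```

With samples this small, a rank-based alternative such as the Mann-Whitney U test may be preferable if the changes are clearly non-normal.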

When running an individual regression for each variable, using both its standalone change and its interaction with the intervention-group dummy variable, we observe that the change in waist size best explains the change in the p16 biomarker, but none of the variables can account for the change in the p21 biomarker. Therefore, we tentatively conclude that for obese individuals attempting their own biological age reversal, waist measurements can act as a proxy for measuring their progress as they increase their physical activity.
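The per-variable regression with an interaction term can be sketched as below: the change in a biomarker is regressed on the change in one variable, the intervention-group dummy, and their product. This is a minimal NumPy illustration under stated assumptions; the function name, coefficient labels, and synthetic data are hypothetical and do not reproduce the study's results.

```python
import numpy as np


def fit_interaction_model(delta_x, group, delta_y):
    """OLS of delta_y on delta_x, a 0/1 group dummy, and their interaction."""
    design = np.column_stack([
        np.ones(len(delta_y)),  # intercept
        delta_x,                # standalone change in the predictor
        group,                  # 1 = intervention group, 0 = control
        delta_x * group,        # interaction: extra slope in the intervention group
    ])
    beta, *_ = np.linalg.lstsq(design, delta_y, rcond=None)
    fitted = design @ beta
    ss_res = float(np.sum((delta_y - fitted) ** 2))
    ss_tot = float(np.sum((delta_y - delta_y.mean()) ** 2))
    names = ["intercept", "delta_x", "group", "delta_x:group"]
    return dict(zip(names, (float(b) for b in beta))), 1.0 - ss_res / ss_tot


# Synthetic illustration only; these are NOT the study's measurements.
rng = np.random.default_rng(5)
n = 30
group = np.repeat([0.0, 1.0], n)
delta_waist = np.concatenate([rng.normal(0.0, 1.0, n), rng.normal(-3.0, 1.5, n)])
# Constructed so that p16 falls as waist shrinks, with the same slope in both groups.
delta_p16 = 0.10 * delta_waist + rng.normal(0.0, 0.05, 2 * n)

coefs, r2 = fit_interaction_model(delta_waist, group, delta_p16)
for name, value in coefs.items():
    print(f"{name:>14}: {value:+.3f}")
print(f"R^2 = {r2:.3f}")
```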
Again, given the severe limitations of this dataset, the results should serve as a reference only; future studies with more comprehensive data can explore the intricacies of this topic in greater depth.
The full version of our code is available at https://github.com/ExtraAsseto/DAPO_lambda