The Urban Big Data Knowledge Lab aims to promote knowledge about and development of urban big data policies. To this end, it awards an annual prize for the best thesis on the subject by Master's students from Erasmus University Rotterdam and Bachelor's students from Rotterdam University of Applied Sciences. The winner receives €2,000 and a published interview about his or her thesis.
Based on criteria such as originality, practical relevance and methodological rigor, the thesis by Cas de Weerd, on predicting youth help usage from pre-usage risk factors, was selected as the winner of our 2018 Thesis Award. Congratulations, Cas!
You can find the full thesis here. The abstract of the thesis can be found below.
Municipalities in the Netherlands are responsible for the provision of youth
help services. Youth help services are support services to youngsters and
families with child or parenting problems. The budget for youth help services
has been declining, while the number of youth help trajectories is growing. This
underlines how important it is for municipalities to accurately predict future youth
help demand. Accurately predicting which children and families are expected
to enter a youth help trajectory in the future could improve prevention
policies and the allocation of resources. In the past, research on youth help
usage was aimed at identifying risk factors. However, such models could also
be used for prediction. Moreover, these studies were mainly based on survey
data, while linked-administrative data seems more suitable for the prediction of
youth help usage. In this study a predictive risk modeling approach is applied
to linked-administrative data, in order to predict first-time youth help usage in
the following year. Six classification techniques are applied. The best model of
these six is compared with a prior model of the city of Rotterdam. The dataset
includes fifteen risk factors, derived from thirteen databases from the Central
Bureau of Statistics, containing information on youngsters up to the age of 23
who lived in Rotterdam between 2015 and 2017 (n = 166,654). Results showed
that the best performing model was a random forest model, trained on
balanced training data. When compared to a previous model, the increase in
predictive performance is: 51.2% in sensitivity, 36.7% in precision and 0.427 in
F1-score at a risk threshold of 0.5. In addition, in terms of AUC, model
performance increases from 0.62 to 0.80. In this model, the most important
predictors are a combination of life events and child, parental and household factors.
Moreover, when previous youth help usage is added to the model, performance
increases to a sensitivity of 76.2%, a precision of 58.7% and an F1-score of
0.66 at a risk threshold of 0.5, and the AUC increases to 0.91.
However, this model does not predict the first-time usage, but total demand for
youth help services in the following year. This study showed that predictive
risk modeling applied to administrative data improves the prediction of
first-time youth help usage in the following year. In theory this model could be
used for individual prevention measures. However, although model performance
has improved, precision is still low. In future research this model could be
improved by extending the number of risk factors and improving data quality.
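
For readers less familiar with the metrics quoted in the abstract, the following is a minimal sketch of how sensitivity, precision and F1-score follow from risk scores at a threshold of 0.5. It is not the thesis code: the function names and the toy data are hypothetical and only illustrate the definitions. The one check on real figures uses the sensitivity (76.2%) and precision (58.7%) reported above, which are consistent with the reported F1-score of 0.66.

```python
def f1_from(precision: float, sensitivity: float) -> float:
    """F1-score: harmonic mean of precision and sensitivity (recall)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


def metrics_at_threshold(risk_scores, labels, threshold=0.5):
    """Return (sensitivity, precision, F1) when scores >= threshold are flagged."""
    flagged = [score >= threshold for score in risk_scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, precision, f1_from(precision, sensitivity)


# Consistency check on the figures reported in the abstract
# (model with previous youth help usage added).
reported_f1 = f1_from(precision=0.587, sensitivity=0.762)
print(round(reported_f1, 2))

# Hypothetical toy data, not taken from the thesis:
# predicted risk scores and whether youth help was actually used (1) or not (0).
toy_scores = [0.9, 0.7, 0.6, 0.4, 0.2, 0.1]
toy_labels = [1, 0, 1, 1, 0, 0]
print(metrics_at_threshold(toy_scores, toy_labels))
```

In the toy data, one of the three flagged youngsters did not use youth help, so precision is 2/3. This is the sense in which precision limits the use of such a model for individual prevention measures.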