Into the Wild - Avoiding Pitfalls in the Evaluation of Travel Activity Classifiers

Peter Widhalm, Maximilian Leodolter, Norbert Brändle

Research output: Chapter in Book or Conference Proceedings (Book chapter)

Abstract

Most submissions to the 2018 Sussex-Huawei Locomotion-Transportation (SHL) recognition challenge strongly overestimated the performance of their algorithms relative to the performance achieved on the challenge evaluation data. Similarly, recent studies on smartphone-based trip data collection promise accurate and detailed recognition of various modes of transportation, but in field tests the available techniques appear unable to live up to these expectations. In this chapter we experimentally demonstrate potential sources of upward scoring bias in the evaluation of travel activity classifiers. Our results show that (1) performance measures such as accuracy and the average class-wise F1 score are sensitive to class prevalence, which can vary strongly across sub-populations, (2) cross-validation with random train/test splits or a large number of folds can easily introduce dependencies between training and test data and is therefore not suitable for revealing overfitting, and (3) splitting the data into disjoint subsets for training and testing does not always reveal model overfitting caused by a lack of variation in the data.
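Findings (1) and (2) can be illustrated with a minimal sketch. It is not the chapter's experimental code: the function names, the per-class recall values (0.9 and 0.6), and the toy data set of 20 trips with 50 windows each are all assumptions chosen for illustration. The first function computes the expected accuracy and macro-averaged F1 score of a two-class classifier whose per-class recall is held fixed while class prevalence changes. The remaining functions compare a random window-level split with a split that keeps all windows of a trip on the same side.

```python
import random


def expected_scores(prevalence_a, recall_a=0.9, recall_b=0.6):
    """Expected accuracy and macro-F1 of a two-class classifier with fixed
    per-class recall (illustrative values), evaluated on data in which
    class A has the given prevalence (assumed strictly between 0 and 1)."""
    p_a, p_b = prevalence_a, 1.0 - prevalence_a
    tp_a, fn_a = recall_a * p_a, (1.0 - recall_a) * p_a
    tp_b, fn_b = recall_b * p_b, (1.0 - recall_b) * p_b
    fp_a, fp_b = fn_b, fn_a  # a missed B counts as a false A, and vice versa
    accuracy = tp_a + tp_b
    f1_a = 2 * tp_a / (2 * tp_a + fp_a + fn_a)
    f1_b = 2 * tp_b / (2 * tp_b + fp_b + fn_b)
    return accuracy, (f1_a + f1_b) / 2


def make_windows(n_trips=20, windows_per_trip=50):
    """Hypothetical data set: each sample is a (trip_id, window_index) pair."""
    return [(trip, w) for trip in range(n_trips) for w in range(windows_per_trip)]


def random_split(windows, test_fraction=0.2, seed=0):
    """Shuffle individual windows, ignoring which trip they came from."""
    shuffled = windows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def trip_wise_split(windows, test_fraction=0.2, seed=0):
    """Assign whole trips to either the training or the test set."""
    trips = sorted({trip for trip, _ in windows})
    random.Random(seed).shuffle(trips)
    test_trips = set(trips[: int(len(trips) * test_fraction)])
    train = [w for w in windows if w[0] not in test_trips]
    test = [w for w in windows if w[0] in test_trips]
    return train, test


def shared_trips(train, test):
    """Trips that contribute windows to both the training and the test set."""
    return {trip for trip, _ in train} & {trip for trip, _ in test}


if __name__ == "__main__":
    for prevalence in (0.9, 0.5, 0.1):
        acc, macro_f1 = expected_scores(prevalence)
        print(f"prevalence of A = {prevalence:.1f}: "
              f"accuracy = {acc:.3f}, macro-F1 = {macro_f1:.3f}")

    windows = make_windows()
    print("trips shared by a random split:", len(shared_trips(*random_split(windows))))
    print("trips shared by a trip-wise split:", len(shared_trips(*trip_wise_split(windows))))
```

Under these assumed recall values, the same classifier has an expected accuracy of 0.87 when class A makes up 90% of the data and 0.63 when it makes up 10%, although nothing about the classifier has changed. In the second part, the trip-wise split shares no trips between training and test data, whereas the random split places windows from the same trip on both sides, which is one way the dependencies described in (2) can arise.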
Original language: English
Title of host publication: Human Activity Sensing. Corpus and Applications
Publisher: Springer
Pages: 197-211
Number of pages: 15
ISBN (Print): 978-3-030-13000-8
Publication status: Published - 2019

Research Field

  • Former Research Field - Mobility Systems
