Text-based entity matching for entity resolution and data fusion applied to person descriptions

Research output: Chapter in Book or Conference Proceedings › Conference Proceedings with Oral Presentation › peer-review

Abstract

Text-based entity matching facilitates interoperability between heterogeneous systems by aligning textual person descriptions. We propose an entity matching methodology that integrates rule-based feature extraction, similarity measures, and supervised machine learning classifiers, and we evaluate it on a person matching problem. We constructed a feature space by extracting domain-specific person attributes from text and comparing them via a combination of string similarity scores and similarities of term frequency-inverse document frequency (TF-IDF) embeddings. Next, we evaluated multiple supervised classification models, including Multi-Layer Perceptron, Random Forest, and XGBoost, to determine their effectiveness. For evaluation, we created a new domain-specific entity matching dataset named Real Scenario Text-based Person Matching (RSTPM) and assessed the person matching performance of all models in terms of classification metrics and computational cost. In addition, we studied the impact of the various features on classification. The proposed approach achieved an increase of 27.47 percentage points in F1-score over the baseline (from 55.41% to 82.88%) and an overall accuracy of 92.14%, demonstrating significant improvements in textual person matching with only a moderate increase in computational demand.
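As an illustration of the kind of pairwise features the abstract describes, the following is a minimal sketch, not the authors' implementation. It computes one string similarity score and one TF-IDF cosine similarity for a pair of person descriptions. The choice of `difflib.SequenceMatcher` as the string measure, the smoothed IDF formula, the function names, and the sample descriptions are all assumptions made for the example. The sketch also compares whole descriptions rather than extracted person attributes, and it omits the classifier stage.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Relative term frequency times a smoothed IDF (an assumed variant).
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_features(descriptions, i, j):
    """Feature vector [string similarity, TF-IDF cosine] for descriptions i and j."""
    vectors = tfidf_vectors(descriptions)
    string_sim = SequenceMatcher(
        None, descriptions[i].lower(), descriptions[j].lower()
    ).ratio()
    return [string_sim, cosine(vectors[i], vectors[j])]


# Hypothetical person descriptions, invented for this example.
descriptions = [
    "tall man with grey beard wearing a red jacket",
    "man with a grey beard and red jacket, tall",
    "young woman with blonde hair carrying a blue backpack",
]

print(pair_features(descriptions, 0, 1))  # likely the same person
print(pair_features(descriptions, 0, 2))  # likely different persons
```

In a pipeline like the one outlined in the abstract, feature vectors of this kind, computed for labelled pairs, would serve as input to a supervised classifier such as a Random Forest.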
Original language: English
Title of host publication: 2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC)
Pages: 7080 - 7085
ISBN (Electronic): 979-8-3315-3358-8
Publication status: Published - 28 Jan 2026
Event: 2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC) - Austria Center Vienna, Vienna, Austria
Duration: 5 Oct 2025 - 8 Oct 2025
https://www.ieeesmc2025.org/

Conference

Conference: 2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC)
Abbreviated title: IEEE SMC 2025
Country/Territory: Austria
City: Vienna
Period: 5/10/25 - 8/10/25
Internet address

Research Field

  • Responsive Sensing & Analytics
