FinTech Firm Explores Named Entity Extraction

Founded in 2018, San Francisco-based Digits Financial combines machine learning and analytics to give businesses insight into their transactions, automatically identifying patterns, classifying data, and flagging anomalies as each new transaction is added to the database. Now, in a blog post, Hannes Hapke – a machine learning engineer at Digits – has revealed how the company uses natural language processing (NLP) to extract information for its clients and what the team learned from developing its own model.

Digits leverages named entity recognition (NER) to extract information from unstructured text and turn it into categories like dates, identities, and locations. “We had seen outstanding results from NER implementations applied to other industries and we were eager to implement our own banking-related NER model,” Hapke wrote. “Rather than adopting a pre-trained NER model, we envisioned a model built with a minimal number of dependencies. That avenue would allow us to continuously update the model while remaining in control of ‘all moving parts.’”

In the end, Digits decided that no preexisting model would suffice and instead built its own internal NER model on TensorFlow 2.x and the accompanying ecosystem library, TensorFlow Text. The team also handled its own data annotation, using doccano to label banking text with entities such as companies, URLs, and locations.
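For readers unfamiliar with this kind of annotation workflow, the sketch below shows roughly what reading a doccano-style JSONL export might look like. The field names ("text", "label") and the example record are illustrative assumptions, not details published by Digits.

```python
# Minimal sketch of loading span annotations from a doccano-style JSONL export.
# Field names and labels are assumptions for illustration only.
import json

def load_annotations(path):
    """Yield (text, spans) pairs, where spans are (start, end, label) tuples."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            spans = [tuple(span) for span in record.get("label", [])]
            yield record["text"], spans

# A hypothetical exported record might look like:
#   {"text": "AMZN Mktp US*2K47 Seattle WA",
#    "label": [[0, 12, "Company"], [18, 28, "Location"]]}
```

From annotations like these, the character-level spans can then be mapped onto token labels before training.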

Hapke also explained Digits’ decision to go with a Transformer architecture – specifically, Bidirectional Encoder Representations from Transformers (BERT) – for its initial NER model.

“Transformers provide a major improvement in NLP when it comes to language understanding,” he said. “Instead of evaluating a sentence token-by-token, the way recurrent networks would perform this task, transformers use an attention mechanism to evaluate the connections between the tokens.” Further, he explained, BERT could evaluate up to 512 tokens simultaneously.
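In broad strokes, a BERT-based NER model attaches a per-token classification head to the encoder. The sketch below shows one way to assemble such a model in TensorFlow using the publicly available BERT encoder and preprocessing models from TensorFlow Hub; the Hub handles, label count, and learning rate are illustrative choices, and the Digits post does not publish its training code.

```python
# Hedged sketch of a BERT token-classification (NER) model in TF 2.x.
# The Hub handles, NUM_LABELS, and hyperparameters are illustrative only.
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401  # registers ops the preprocessor model needs

NUM_LABELS = 9  # e.g. BIO tags for company/URL/location/date plus "O" (assumed)

preprocessor = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
encoder = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4",
    trainable=True)

# Raw transaction strings go in; the preprocessor tokenizes and packs them
# (this default preprocessor pads/truncates to 128 tokens; BERT itself
# supports sequences of up to 512 tokens).
text_in = tf.keras.Input(shape=(), dtype=tf.string, name="transaction_text")
encoder_inputs = preprocessor(text_in)
sequence_output = encoder(encoder_inputs)["sequence_output"]  # (batch, seq_len, 768)

# One label prediction per token.
logits = tf.keras.layers.Dense(NUM_LABELS, name="ner_logits")(sequence_output)

model = tf.keras.Model(text_in, logits)
model.compile(
    optimizer=tf.keras.optimizers.Adam(3e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```

Training then only requires integer tag labels aligned to the preprocessor’s tokenization, one label per token position.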

After prototyping the model, the team converted it for production and began an initial deployment, optimizing the architecture for high throughput and low latency.
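The conversion step typically amounts to exporting the trained model in a servable format. The snippet below is a generic sketch of that hand-off using TensorFlow’s SavedModel format; the export path is hypothetical, and `model` refers to the Keras model from the previous sketch rather than Digits’ actual code.

```python
# Hedged sketch of exporting a trained Keras NER model for serving.
# `model` is the model built in the previous sketch; the path is hypothetical.
import tensorflow as tf

EXPORT_DIR = "exported_models/digits_ner/1"  # versioned layout a serving stack expects

# Write the model (graph + weights) as a SavedModel, dropping optimizer state.
model.save(EXPORT_DIR, include_optimizer=False, save_format="tf")

# Reload the export and grab the default serving signature, the entry point
# a production serving system such as TensorFlow Serving would call.
reloaded = tf.saved_model.load(EXPORT_DIR)
serve_fn = reloaded.signatures["serving_default"]
```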

The resulting product provided, at its core, a deceptively simple capability: allowing users to search their transaction records for vendors, websites, locations, and so forth. Digits has also expanded the model to include automatic insights and optimized it further for latency.

An example of how Digits’ model parses financial data into categories. Image courtesy of Digits.
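To make entities searchable, the model’s token-level predictions have to be collapsed back into entity strings. The sketch below shows one common way to do this with BIO-style tags; the tag names and the example tokens are illustrative and do not come from Digits’ code.

```python
# Hedged sketch: grouping per-token BIO predictions into (label, text) entities.
# Tag names and tokens are illustrative, not Digits' actual label scheme.
def decode_entities(tokens, tags):
    """Collapse B-/I- tagged tokens into (label, entity_text) pairs."""
    entities, current_label, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_tokens:
                entities.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            if current_tokens:
                entities.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_tokens:
        entities.append((current_label, " ".join(current_tokens)))
    return entities

# Example (hypothetical transaction description):
# decode_entities(["AMZN", "Mktp", "US", "Seattle", "WA"],
#                 ["B-Company", "I-Company", "I-Company", "B-Location", "I-Location"])
# -> [("Company", "AMZN Mktp US"), ("Location", "Seattle WA")]
```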

“A more recent pre-trained model (e.g. BART or T5) could have provided higher model accuracy, but it would have also increased the model latency substantially,” Hapke said. “Since we are processing millions of transactions daily, it became clear that model latency is critical for us.”

Given that it handles financial data, Digits is sensitive to concerns over false positives and other errors. As a result, Hapke explained, the company makes a point of communicating which results were ML-predicted and allows users to easily overwrite those suggestions.