FinSim-3: An Overview
The FinSim-3 shared task is aimed at communities specialized in Natural Language Processing (NLP), Machine Learning (ML), Artificial Intelligence (AI), Knowledge Engineering and financial document processing.
Representing words alone is not sufficient, so going beyond word-level representation is a crucial step for industrial applications using NLP.
To do so, industrial applications frequently use either:
- Unsupervised corpus-derived representations such as word embeddings, which can be hard to interpret but remain highly valuable in NLP applications; or
- Manually tagged and elaborated resources such as corpora, lexica, taxonomies and ontologies, which have low coverage and occasional inconsistencies, but provide a deeper understanding of the target field.
These two approaches form the ends of a spectrum which many methods have tried to combine, particularly in tasks that aim to expand the coverage of manual resources using automatic methods.
- The SemEval community has organised several evaluation campaigns to stimulate the development of methods which extract semantic/lexical relations between concepts/words (Bordea et al. 2015, Bordea et al. 2016, Jurgens et al. 2016, Camacho-Collados et al. 2018).
- A significant number of datasets and challenges look specifically at how to automatically populate knowledge bases such as DBpedia or Wikidata (e.g. KBP challenges).
- As far as we know, FinSim 2020 was the first-ever task attempting to combine these methods for the financial domain.
How is the FinSim-3 edition different from the previous tasks?
FinSim-3 focuses on the evaluation of semantic representations. It does so by assessing the quality of the automatic classification of a list of carefully selected terms from the financial domain against a domain ontology.
Participants will be given a list of selected terms from the financial domain, such as “European depositary receipt” or “Interest rate swaps”, and will be asked to design a system which automatically classifies each term under its most relevant hypernym concept in an external ontology.
For example, given the set of concepts “Bonds”, “Unclassified”, “Share” and “Loan”, the most relevant hypernym of “European depositary receipt” is “Share”.
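To make the task format concrete, here is a minimal Python sketch of a nearest-neighbour baseline based on token overlap. It is only an illustration and not the organisers' baseline: the function names `tokenize`, `jaccard` and `classify`, the fallback to "Unclassified", and the labelled training pairs are all hypothetical assumptions. Only the candidate concepts and the "European depositary receipt" example come from the description above.

```python
import re

# Candidate concepts taken from the example in the task description.
CONCEPTS = ["Bonds", "Unclassified", "Share", "Loan"]

# Hypothetical labelled terms, invented for illustration only;
# they are NOT part of the official FinSim-3 training data.
TRAIN = {
    "American depositary receipt": "Share",
    "Convertible bond": "Bonds",
    "Syndicated loan": "Loan",
}


def tokenize(term):
    """Lowercase a term and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", term.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(term, train=TRAIN, fallback="Unclassified"):
    """Return the label of the most similar training term.

    Falls back to `fallback` when no training term shares a token
    with the input term.
    """
    tokens = tokenize(term)
    best_label, best_score = fallback, 0.0
    for known_term, label in train.items():
        score = jaccard(tokens, tokenize(known_term))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


prediction = classify("European depositary receipt")
```

With this toy training set, "European depositary receipt" shares two tokens with "American depositary receipt" and is therefore assigned "Share". A competitive system would presumably replace the token-overlap similarity with embeddings or ontology-derived features.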
This new edition offers an extended dataset with more diversified financial concepts. We are particularly interested in systems which make creative use of relevant resources such as ontologies and lexica, as well as systems which use contextual word embeddings such as BERT (Devlin et al. 2018).
For each given term, participating systems are expected to provide the most relevant concept (its hypernym) in an external ontology: the Financial Industry Business Ontology (FIBO). Performance will be evaluated based on the accuracy with which financial terms have been classified and on recall, i.e. the total of predictions.
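The accuracy criterion can be sketched as follows. This is a minimal illustration under stated assumptions, not the official scoring script: the function name `accuracy`, the dictionary-based input format, and the placeholder terms are hypothetical. Only accuracy is shown, since the exact definition of the second measure is not detailed here.

```python
def accuracy(gold, predicted):
    """Fraction of gold terms whose predicted concept equals the gold concept.

    Both arguments map a term to a concept label. Terms missing from
    `predicted` are counted as incorrect.
    """
    if not gold:
        raise ValueError("gold must contain at least one term")
    correct = sum(
        1 for term, concept in gold.items() if predicted.get(term) == concept
    )
    return correct / len(gold)


# Toy data: the first pair comes from the task description,
# the remaining terms are placeholders invented for illustration.
gold = {
    "European depositary receipt": "Share",
    "hypothetical term A": "Bonds",
    "hypothetical term B": "Loan",
    "hypothetical term C": "Bonds",
}
predicted = {
    "European depositary receipt": "Share",
    "hypothetical term A": "Bonds",
    "hypothetical term B": "Unclassified",  # wrong prediction
    "hypothetical term C": "Bonds",
}

score = accuracy(gold, predicted)  # 3 correct out of 4
```

Counting a missing prediction as an error is a design choice of this sketch; the organisers' script may handle missing outputs differently.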
This task is open to everyone except the co-chairs of the organising team. The co-chairs cannot submit a system, and will instead serve as an authority to resolve any disputes regarding ethical issues or the entirety of system descriptions.
- Georgeta Bordea, Paul Buitelaar, Stefano Faralli and Roberto Navigli (2015). “SemEval-2015 Task 17: Taxonomy Extraction Evaluation (TExEval)”. In Proceedings of SemEval 2015, co-located with NAACL HLT 2015, Denver, CO, USA.
- Georgeta Bordea, Els Lefever, and Paul Buitelaar (2016). “Semeval-2016 task 13: Taxonomy extraction evaluation (TExEval-2)”. In Proceedings of the 10th International Workshop on Semantic Evaluation, San Diego, CA, USA.
- Jose Camacho-Collados, Claudio Delli Bovi, Luis Espinosa-Anke, Sergio Oramas, Tommaso Pasini, Enrico Santus, Vered Shwartz, Roberto Navigli, and Horacio Saggion (2018). “SemEval-2018 Task 9: Hypernym Discovery”. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018), New Orleans, LA, United States. Association for Computational Linguistics.
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018). “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”. https://arxiv.org/abs/1810.04805v2.
- David Jurgens and Mohammad Taher Pilehvar (2016). “SemEval-2016 Task 14: Semantic Taxonomy Enrichment”. In Proceedings of SemEval-2016, NAACL-HLT.
- The Financial Industry Business Ontology (FIBO)
How to register
To register for the FinSim shared task, please use the following Google Form: https://forms.gle/a9cRtYX5X94QCoYYA.
A USD 1,000 prize will be awarded to the best-performing teams.
Paper submission: https://easychair.org/conferences/?conf=finnlp2021
- June 28, 2021: Release of test set.
- June 28, 2021: Registration deadline.
- July 02, 2021: System output submission deadline.
- July 05, 2021: Release of results.
- July 07, 2021: Shared task title and abstract due.
- July 12, 2021: Shared task paper submissions due.
- July 16, 2021: Camera-ready version of shared task paper due.
If you have any questions regarding the shared task, please contact us at email@example.com
Shared Task Co-organizers – Fortia Financial Solutions
- Dr Juyeon KANG
- Dr Ismail EL MAAROUF
- Sandra BELLATO
- Mei GAN