Hello,
I would like to invite you to the virtual NLP Hackathon of the
Uni Bern this Wednesday and Thursday, 24 and 25 March 2021. The
4 exciting challenges submitted so far are listed on this
website and further down in this email:
https://www.cnd.philnat.unibe.ch/ueber_uns/aktivitaeten/nlp_hackathon/
The hackathon schedule is as follows:
Kickoff on Wednesday, 24 March 2021, 9:00 - 10:00
- Welcome and introduction
- Presentation of the challenges
- Team building
Presentation of the results on Thursday, 25 March 2021,
15:00 - 16:00
- Presentation of the results
- from 16:00: virtual closing beer
Meeting on BigBlueButton: https://bbb.ch-open.ch/b/mat-f4n-qtn
Communication via Slack: https://nlphackathon.slack.com
Around 25 people are currently registered. Anyone else who
would like to take part can register by email at
dh@wbkolleg.unibe.ch.
Thank you also for forwarding this message to other
interested people!
We look forward to two exciting days of NLP hacking!
Kind regards,
Matthias Stürmer
Challenges
The following four challenges have been submitted so far:
- Forschungsstelle Digitale Nachhaltigkeit, Uni Bern: competitive
challenge "Klassifikation von Schweizer Gerichtsurteilen"
(classification of Swiss court decisions)
Legal language differs from everyday natural language in many
respects. It is highly structured, rather complicated, contains
its own specialized terms, and uses certain words differently
than they are used in regular text. Text classification is
simple to define but has a myriad of possible applications, and
good systems can provide immense value. Common general
applications of text classification include spam filtering,
email priority rating, and topic classification. In the legal
domain, text classification includes legal judgment prediction
(predicting the outcome of a case from a description of the
case's facts) and legal area prediction. In this challenge, you
will predict the chamber based on the text of a court decision.
The chamber label has the form
{federal level}_{court}_{chamber number} (e.g. SG_KG_002 => St.
Gallen, Kantonsgericht, 002).
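To make the label format above concrete, here is a minimal Python
sketch that splits a chamber label into its three parts. The names
ChamberLabel and parse_chamber_label are hypothetical, and the sketch
assumes that every label consists of exactly three parts separated by
underscores, as in the SG_KG_002 example; the real dataset may contain
labels that deviate from this.

```python
from typing import NamedTuple


class ChamberLabel(NamedTuple):
    """Hypothetical container for the three parts of a chamber label."""
    federal_level: str
    court: str
    chamber_number: str


def parse_chamber_label(label: str) -> ChamberLabel:
    """Split a label such as 'SG_KG_002' into its three parts.

    Assumption: exactly three non-empty parts separated by underscores.
    The chamber number stays a string so leading zeros are preserved.
    """
    parts = label.split("_")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected chamber label format: {label!r}")
    return ChamberLabel(*parts)


if __name__ == "__main__":
    print(parse_chamber_label("SG_KG_002"))
```

Keeping the chamber number as a string is a deliberate choice here:
converting "002" to an integer would lose the zero padding used in the
label.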
- Statistisches Amt Kanton Zürich: creative
challenge "STATBOT.CH" (English
documentation on GitHub)
If you are searching for some form of statistical information,
it is not always easy to find it quickly. Particularly in
Switzerland, data and information are not only spread
vertically over different federal levels. They are also spread
horizontally within these federal levels over different
offices, and even there sometimes over different sites and
channels with different formats. Looking for a needle in a
haystack seems comparatively easy next to that. Further, even
search engines are only of limited help, as they follow an
indexing logic that excludes information stored in databases or
files. Making facts harder to find is also a risk for
democratic processes: the harder it is for the average citizen
to find truthful information, the easier it is to spread fake
news. Therefore, the Statistical Office of the Canton of
Zurich, together with other organizations, would like to
develop a Swiss Statistical Bot (STATBOT), which would provide
data and statistical information directly and quickly across
all organizations.
- Digital Humanities Uni Bern: creative
challenge "NER for Historical Documents"
NER solutions have made significant progress in the past few
years. Nevertheless, applications for sparse language data are
still a challenge, especially when dealing with data from
pre-modern times. In this challenge, we focus on language data
from the 16th to the 18th century from the Bernese Turmbücher
(legal documents recorded in the Tower of Bern, Switzerland).
These documents are currently held in the State Archives of
Bern. Language models are not provided.
- Digital Humanities Uni Bern: Visualization of Language Models
Language models (e.g. character embeddings) are essential to
succeed in NLP tasks. Especially for part-of-speech tagging and
named entity recognition, training yields more precise models
when it is supported by adequate language models. Since the
advent of word2vec and large transformer-based language models
(such as BERT or GPT-3), a variety of specialized and
fine-tuned language models has become available. Despite their
widespread use and their necessity for specific model training
(e.g. for language entities with only sparse data), our
understanding of the models themselves is limited at best. In
order to strengthen our understanding of language models and to
start the process of reflecting on them, this challenge asks
for creative ways of visualizing language models. We envision
3D visualizations based on dimension reduction to identify the
positioning of e.g. synonyms/homonyms in vector spaces, or
listings of semantic fields (neighboring vector values). For
context-insensitive approaches (e.g. word2vec or GloVe) we
imagine using the fixed vectors and representing calculations
in grids.
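As a starting point for the "semantic fields" idea above, the
following Python sketch lists the nearest neighbors of a word by
cosine similarity over fixed vectors. It is only an illustration under
stated assumptions: TOY_VECTORS contains made-up three-dimensional
values, not vectors from word2vec, GloVe, or any other trained model,
and the function names are hypothetical.

```python
import math

# Hypothetical toy vectors for illustration only; real embeddings would
# come from a trained model and have far more dimensions.
TOY_VECTORS = {
    "court": [0.9, 0.1, 0.0],
    "judge": [0.8, 0.2, 0.1],
    "tower": [0.1, 0.9, 0.2],
    "prison": [0.2, 0.8, 0.3],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(word, vectors, k=2):
    """Return the k words most similar to `word`, most similar first."""
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    for neighbor, score in nearest_neighbors("court", TOY_VECTORS):
        print(f"{neighbor}: {score:.3f}")
```

A list like this could feed a grid or table view of a word's
neighborhood; a 3D view would additionally need a dimension reduction
step, which is not shown here.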
__________________________________
Universität Bern
Institut für Informatik
Forschungsstelle Digitale Nachhaltigkeit
PD Dr. Matthias Stürmer
Head of the Forschungsstelle Digitale Nachhaltigkeit,
Lecturer in Digital Transformation at INF and
Lecturer in Digital Sustainability at IWI
Office 204 (2nd floor)
Schützenmattstrasse 14
CH-3012 Bern
Phone +41 31 631 38 09 (direct)
Phone +41 31 631 47 71 (secretariat)
Mobile +41 76 368 81 65
matthias.stuermer@inf.unibe.ch
www.digitale-nachhaltigkeit.unibe.ch