
Mehwish Fatima
Graduate student @ Heidelberg Institute for Theoretical Studies
simplification
summarization
cross-lingual science journalism
SHORT BIO
I am a Guest NLP Scientist at the Heidelberg Institute for Theoretical Studies (HITS), Germany, working under Prof. Dr. Michael Strube, and I am enrolled as a Ph.D. student in Computational Linguistics at Universität Heidelberg. My work is part of an R&D-based industry project for SPEKTRUM der Wissenschaft. The working title of the project and thesis is "Single Document Cross-lingual Abstractive Summarization for Scientific Texts".
During my Ph.D. project, I have focused on:
1- Data Collection and Analysis
- collection of data from online resources with Python libraries such as Wiki API, Beautiful Soup, Tika, NLTK, and pandas.
- verification and analysis of the curated datasets using linguistic and statistical features, implemented in Python with spaCy, NLTK, pandas, Matplotlib, and Seaborn.
2- Abstractive Summarization Model Development
- traditional recurrent neural networks - PyTorch-CUDA - server deployment
- vanilla transformer summarizer - PyTorch-CUDA - server deployment
- Hugging Face library for pre-trained language models such as BERT, mBART, mT5, PEGASUS, Longformer Encoder-Decoder, XLSum, BigBird, etc. - PyTorch-CUDA - server deployment
- simplification model based on reinforcement learning - PyTorch-CUDA, Apex
- multi-task learning model - PyTorch-CUDA-DeepSpeed - server deployment
3- Evaluation
- Automatic evaluation using ROUGE, BERTScore, and Flesch-Kincaid Reading Ease - Python - PyTorch
- Statistical testing of automatic results with the Mann-Whitney U test - Python
- Human judgments and their verification with Fleiss' kappa
- In-depth linguistic analysis with Python
Tools and Technologies: Python, PyTorch, CUDA, DeepSpeed for model parallelization on GPU servers, Weights & Biases (WandB) for computation analysis, Google Colab, Amazon Web Services (AWS), TensorFlow
Presentations

Cross-lingual Science Journalism: Select, Simplify and Rewrite Summaries for Non-expert Readers
Mehwish Fatima and 1 other author

A Novel Wikipedia based Dataset for Monolingual and Cross-Lingual Summarization
Mehwish Fatima and 1 other author