Mehwish Fatima

Graduate student @ Heidelberg Institute for Theoretical Studies

simplification

summarization

cross-lingual science journalism

SHORT BIO

I am a Guest NLP Scientist at the Heidelberg Institute for Theoretical Studies (HITS), Germany, under Prof. Dr. Michael Strube, and am enrolled as a Ph.D. scholar in Computational Linguistics at Universität Heidelberg. I am working on an R&D-based industry project for SPEKTRUM der Wissenschaft. The working title of the project/thesis is "Single Document Cross-lingual Abstractive Summarization for Scientific Texts".

In my Ph.D. project, I have focused on:


1- Data Collection and Analysis

  • Data collection from online resources with Python libraries such as Wiki API, Beautiful Soup, Tika, NLTK and pandas.
  • Verification and analysis of the curated datasets using linguistic and statistical features, implemented in Python with spaCy, NLTK, pandas, Matplotlib and Seaborn.

2- Abstractive Summarization Model Development

  • Traditional recurrent neural networks (PyTorch, CUDA; server deployment)
  • Vanilla transformer summarizer (PyTorch, CUDA; server deployment)
  • Pre-trained language models from the Hugging Face library, such as BERT, mBART, mT5, Pegasus, Longformer Encoder-Decoder, XLSum and BigBird (PyTorch, CUDA; server deployment)
  • Simplification model based on reinforcement learning (PyTorch, CUDA, Apex)
  • Multi-task learning model (PyTorch, CUDA, DeepSpeed; server deployment)


3- Evaluation

  • Automatic evaluation using ROUGE, BERTScore and Flesch-Kincaid Reading Ease (Python, PyTorch)
  • Statistical testing of automatic results with the Mann-Whitney U test (Python)
  • Human judgments and their verification with Fleiss' kappa
  • In-depth linguistic analysis with Python

Tools and Technologies: Python, PyTorch, CUDA, DeepSpeed for model parallelization on GPU servers, WandB for computation analysis, Google Colab, Amazon Web Services (AWS), TensorFlow

Presentations

Cross-lingual Science Journalism: Select, Simplify and Rewrite Summaries for Non-expert Readers

Mehwish Fatima and 1 other author

A Novel Wikipedia based Dataset for Monolingual and Cross-Lingual Summarization

Mehwish Fatima and 1 other author
