Saijal Shahania

Research Area: Research Infrastructure and Methods
Researcher
  • +49 511 450670-0
  • +49 511 450670-960

I completed my bachelor's degree in Punjab, India, and then worked for three years as a decision consultant at MuSigma. After that, I studied Data and Knowledge Engineering at the Otto-von-Guericke University (OVGU) Magdeburg, where I finished my degree in 2022. As a HiWi (student research assistant), I did research on Natural Language Processing and Time Series Prediction. In addition, I worked as a tutor for Machine Learning, Data Mining and Deep Learning for several terms. Currently, I am pursuing my PhD at the DZHW in cooperation with the OVGU.

Academic research fields

Natural Language Processing, especially Topic Modeling, Feature Engineering and Similarity Metrics, Explainable AI and Machine Learning, Deep Learning

Projects

List of projects

The Appointment of Professors at Private and State Universities of Applied Sciences
Publications

List of publications

WISHFUL - Website extraction of Institutional Sources with Heterogeneous Factors and User-Driven Linkage.

Shahania, S., Spiliopoulou, M., & Broneske, D. (2023).
WISHFUL - Website extraction of Institutional Sources with Heterogeneous Factors and User-Driven Linkage. In Delir Haghighi, P. et al. (Eds.), Information Integration and Web Intelligence (iiWAS 2023) (pp. 20-26). Cham: Springer. https://doi.org/10.1007/978-3-031-48316-5_3
Abstract

Extracting information from diverse websites is increasingly important, especially for analyzing vast data sets to detect trends and gain insights. By studying job ads, researchers can monitor employer demand shifts, assisting policymakers in aiding affected workers and industries. However, extraction faces challenges like varied website formats, dynamic content, and duplicate data. This study introduces a method for extracting data from diverse private university websites involving keyword identification, website categorization, and extraction pipelines.

FACADE: Fake articles classification and decision explanation.

Shahania, S., Purificato, E., Thiel, M., & William De Luca, E. (2023).
FACADE: Fake articles classification and decision explanation. In J. Kamps et al. (Eds.), Advances in Information Retrieval (pp. 294-299). Cham: Springer. https://doi.org/10.1007/978-3-031-28241-6_29

Tell me why it’s fake: Developing an explainable user interface for a fake news detection system.

Shahania, S., Purificato, E., & William De Luca, E. (2022).
Tell me why it’s fake: Developing an explainable user interface for a fake news detection system. In CEUR Workshop Proceedings (Ed.), Proceedings of the 3rd Italian Workshop on Explainable Artificial Intelligence (XAI.it 2022). Udine, Italy: CEUR.
Abstract

In this paper, we present the design and development of an explainable user interface for a fake news detection system. The problem of distinguishing real from fake articles gained a lot of popularity in the last few years, mainly due to the soaring diffusion of social networks and internet bots as means for propaganda and disinformation sharing. By leveraging various explainability methods, i.e. feature importance, partial dependence plots and SHAP values, we aim to show how the combination of different techniques embedded in an interactive user interface can lead to enhance trust in a detection system for a non-expert user, such as a fact-checker or a content manager. Through several examples, we describe all the explainability component

Predicting ecological momentary assessments in an app for Tinnitus by learning from each user's stream with a contextual multi-armed bandit.

Shahania, S., Unnikrishnan, V., Pryss, R., Kraft, R., Schobel, J., ... & Spiliopoulou, M. (2022).
Predicting ecological momentary assessments in an app for Tinnitus by learning from each user's stream with a contextual multi-armed bandit. Frontiers in Neuroscience Sec. Auditory Cognitive Neuroscience, 2022(16), 1-17. https://doi.org/10.3389/fnins.2022.836834

Legal norm retrieval with variations of the BERT model combined with TF-IDF vectorization.

Wehnert, S., Sudhi, V., Dureja, S., Kutty, L., Shahania, S., & W. De Luca, E. (2021).
Legal norm retrieval with variations of the BERT model combined with TF-IDF vectorization. In Association for Computing Machinery (Ed.), ICAIL '21: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law, São Paulo, Brazil (pp. 285-294). New York, NY, United States: Association for Computing Machinery. https://doi.org/10.1145/3462757.3466104
Abstract

In this work, we examine variations of the BERT model on the statute law retrieval task of the COLIEE competition. This includes approaches to leverage BERT's contextual word embeddings, fine-tuning the model, combining it with TF-IDF vectorization, adding external knowledge to the statutes and data augmentation. Our ensemble of Sentence-BERT with two different TF-IDF representations and document enrichment exhibits the best performance on this task regarding the F2 score. This is followed by a fine-tuned LEGAL-BERT with TF-IDF and data augmentation and our third approach with the BERTScore. We show that there are significant differences between the chosen BERT approaches and discuss several design decisions in the context of statute law.

User-centric vs whole-stream learning for EMA prediction.

Shahania, S., Unnikrishnan, V., Pryss, R., Kraft, R., Schobel, J., ... & Spiliopoulou, M. (2021).
User-centric vs whole-stream learning for EMA prediction. In IEEE (Ed.), 2021 IEEE 34th International Symposium on Computer-Based Medical Systems (CBMS). Aveiro, Portugal: IEEE. https://doi.org/10.1109/CBMS52027.2021.00033
Presentations

List of presentations & conferences

Bot behavior in web surveys: A showcase.

Shahania, S., Claaßen, J., Höhne, J. K., & Broneske, D. (2024, June).
Bot behavior in web surveys: A showcase. Talk at the conference Data collection, data quality and data ethics in the age of artificial intelligence, Wiesbaden.

Mitigating the risk of bots in web surveys recruited via social media.

Shahania, S., Claaßen, J., Höhne, J. K., & Broneske, D. (2024, March).
Mitigating the risk of bots in web surveys recruited via social media. Talk at the Department of Methodology and Statistics, Utrecht University (The Netherlands), Utrecht.

WISHFUL - Website Extraction of Institutional Sources with Heterogeneous Factors and User-Driven Linkage.

Shahania, S., Spiliopoulou, M., & Broneske, D. (2023, December).
WISHFUL - Website Extraction of Institutional Sources with Heterogeneous Factors and User-Driven Linkage. Talk at the conference The 25th International Conference on Information Integration and Web Intelligence (iiWAS 2023) and The 21st International Conference on Advances in Mobile Computing & Multimedia Intelligence (MoMM2023), Denpasar, Bali, Indonesia.

TRIWIZARD: Trend recognition in the WissZeitVG - Discussion "#IchBinHanna" zooming in on aspects of recent debates.

Shahania, S. (2023, July).
TRIWIZARD: Trend recognition in the WissZeitVG - Discussion "#IchBinHanna" zooming in on aspects of recent debates. Talk at the Colloquium of the Knowledge Management & Discovery Lab, Otto-von-Guericke-Universität, Magdeburg.

FACADE: Fake articles classification and decision explanation.

Shahania, S. (2023, May).
FACADE: Fake articles classification and decision explanation. Poster presented at the DZHW-Forschungstag 2023, Deutsches Zentrum für Hochschul- und Wissenschaftsforschung (DZHW), Hannover.

FACADE: Fake articles classification and decision explanation.

Shahania, S. (2023, April).
FACADE: Fake articles classification and decision explanation. Poster at the conference The 45th European Conference on Information Retrieval (ECIR 2023), Dublin, Ireland.
Curriculum Vitae
Employment
since 09/2022

Research Assistant, DZHW

  • Similarity of Regulations and Content Extraction, Recognition and Explanation for Researchers
  • Trend Recognition in the WissZeitVG-Discussion 'IchBinHanna' zooming in on aspects of recent debates
  • Web crawling of job postings for automatic evaluation

04/2021 - 12/2021

Student Assistant, Otto-von-Guericke-Universität, Magdeburg, Germany
Project Qualiman

  • Long-term analysis of student data to measure progress, academic success and study duration, and to compare degrees with each other
  • Usage of R and SQL to statistically evaluate the collected data, automate the recording and combination of different sources and simplify derivable actions in a frontend application

04/2020 - 11/2021

Software Developer, LegalHorizon AG, Magdeburg, Germany

  • Development of a tool for retrieval of law documents given legislative preparatory documents (e.g. proposals) from the EUR-Lex portal
  • EUR-Lex is a website providing details about all public documents of the European Union as well as existing laws and pending proposals

10/2019 - 07/2022

Teaching Assistant, Otto-von-Guericke-Universität, Magdeburg, Germany
Conducting exercise classes for Master and Bachelor study programs: Data Mining, Machine Learning, Knowledge Engineering and Digital Humanities, Deep Learning

10/2019 - 03/2020

Student Assistant, Otto-von-Guericke-Universität, Magdeburg, Germany
Data pre-processing, data extraction from the EUR-Lex website, text summarising, topic modelling and clustering the proposals based on law semantics for use in ad hoc requests.

09/2019 - 12/2019

Software Developer, in4s GmbH, Magdeburg, Germany
Camera calibration for an ongoing project with Volkswagen to obtain the intrinsic, extrinsic, and distortion parameters of the cameras.

10/2019 - 03/2020

Senior Decision Scientist, Musigma Business Solutions, Bangalore, India

  • Development of an analytical data warehouse together with stakeholders for increasing operational efficiency using SQL and Netezza as well as JIRA and Bitbucket
  • Optimization of legal spend, its efficiency and improvement of the audit process together with an Australian insurance company using SQL, Netezza and Cognos
  • Proactive identification of fraudulent claims in the initial phases of the claim life-cycle with an estimated savings of around 4M for an Australian insurance company using Machine Learning (esp. Text Mining) methods in R and Python

Education
09/2022 - ongoing

PhD, DZHW, Hannover, Germany

10/2018 - 07/2022

Master of Science, Otto-von-Guericke-Universität, Magdeburg, Germany

  • Otto-von-Guericke Scholarship 2020 for exceptional academic achievement and social engagement
  • Focus on the fields of Machine Learning (especially Natural Language Processing, Text Mining) and Data Mining
  • Master's thesis in the detection of fake news and explainability of the results

08/2011 - 05/2015

Bachelor in Information Technology, UIET, Panjab University, Chandigarh, Punjab, India