Human Rights Violations Analysis 2024

Project Overview

This project provides an in-depth analysis of human rights violations in 2024, based on news data from Human Rights Watch. I've processed 401 articles to extract key information on accusations, risks, and affected victims, using Large Language Models (LLMs) to convert qualitative data into quantitative insights.

Methodology

1. Data Collection

Using the GDELT API, I collected links to 401 Human Rights Watch articles published between January 1 and August 28, 2024. I then scraped each link to extract the article's content and metadata.
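The collection step can be sketched as follows. This is an illustrative reconstruction, not the project's actual code: the function names are invented, and the query parameters assume the GDELT DOC 2.0 API's `artlist` mode restricted to the hrw.org domain.

```python
# Sketch of the data-collection step: build a GDELT DOC 2.0 API query for
# Human Rights Watch articles in the study window, then pull article URLs
# out of the JSON response. Names and parameters here are illustrative.
import urllib.parse

GDELT_DOC_API = "https://api.gdeltproject.org/api/v2/doc/doc"

def build_gdelt_query(start: str, end: str, max_records: int = 250) -> str:
    """Build a GDELT DOC API URL restricted to hrw.org articles."""
    params = {
        "query": "domain:hrw.org",
        "mode": "artlist",
        "format": "json",
        "startdatetime": start,   # YYYYMMDDHHMMSS
        "enddatetime": end,
        "maxrecords": max_records,
    }
    return GDELT_DOC_API + "?" + urllib.parse.urlencode(params)

def extract_links(api_response: dict) -> list[str]:
    """Pull article URLs out of a GDELT 'artlist' JSON response."""
    return [a["url"] for a in api_response.get("articles", [])]
```

Each returned link would then be fetched and parsed (e.g. with requests and an HTML parser) to obtain the article body and metadata.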

2. Data Processing

I used a Large Language Model (LLM), OpenAI's GPT-4o-mini, to convert qualitative article text into structured quantitative data. This approach allowed me to systematically categorize and quantify complex human rights information.

LLM-Powered Qualitative to Quantitative Conversion:

For each category, I used carefully crafted prompts to guide the LLM in extracting relevant information and formatting it in a consistent, quantifiable manner. This process involved:

  1. Feeding article text to the LLM with specific instructions.
  2. Parsing the LLM's structured output into a format suitable for data analysis.
  3. Aggregating results across all articles to generate quantitative datasets.
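The three steps above can be sketched in miniature. This is a hypothetical sketch, assuming the model is instructed to reply with JSON; the prompt wording, the `accusations` field name, and the helper functions are all invented for illustration, and the actual API call to the model is omitted.

```python
# Illustrative extraction loop: build a prompt per article, parse the model's
# structured JSON output, and aggregate counts across articles. The prompt
# text and the "accusations" schema are assumptions, not the project's own.
import json
from collections import Counter

def build_prompt(article_text: str) -> str:
    """Instruction fed to the LLM along with one article's text."""
    return (
        "From the article below, list the human rights accusations it reports. "
        'Respond only with JSON of the form {"accusations": ["..."]}.'
        "\n\nArticle:\n" + article_text
    )

def parse_llm_output(raw: str) -> list[str]:
    """Parse the model's structured output, tolerating malformed replies."""
    try:
        return json.loads(raw).get("accusations", [])
    except json.JSONDecodeError:
        return []  # in the real pipeline, flag this article for human review

def aggregate(raw_outputs: list[str]) -> Counter:
    """Count accusation categories across all article-level outputs."""
    counts = Counter()
    for raw in raw_outputs:
        counts.update(parse_llm_output(raw))
    return counts
```

Defensive parsing matters here: LLM output is not guaranteed to be valid JSON, so malformed replies are dropped (or flagged) rather than allowed to crash the aggregation.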

3. Data Analysis and Visualization

I used Python libraries such as pandas, matplotlib, and seaborn to process the quantified data and create visual representations. My analysis focused on identifying patterns, trends, and correlations within and across the three main categories: accusations, risks, and victims.
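A minimal sketch of this step, using the pandas and matplotlib libraries named above: the column names, sample data, and function names are invented for illustration.

```python
# Sketch of the analysis step: turn aggregated category counts into a tidy
# DataFrame and render a bar chart of the most frequent categories.
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen so this runs headless
import matplotlib.pyplot as plt

def top_categories(counts: dict[str, int], n: int = 10) -> pd.DataFrame:
    """Return the n most frequent categories, sorted in descending order."""
    df = pd.DataFrame(list(counts.items()), columns=["category", "articles"])
    return df.sort_values("articles", ascending=False).head(n).reset_index(drop=True)

def plot_top_categories(df: pd.DataFrame, path: str) -> None:
    """Save a horizontal bar chart of category frequencies to `path`."""
    ax = df.set_index("category")["articles"].plot.barh()
    ax.set_xlabel("number of articles")
    ax.invert_yaxis()  # most frequent category at the top
    plt.tight_layout()
    plt.savefig(path)
    plt.close()
```

The same tidy DataFrame shape works for all three categories (accusations, risks, victims), so one plotting helper can serve the whole analysis.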

Visualizations

My analysis is divided into three main categories (accusations, risks, and victims), each represented by a series of data visualizations.

Explore each category in detail using the navigation menu above to view the full set of visualizations and their interpretations.

Limitations and Future Work

While my LLM-based approach allows efficient processing of large volumes of qualitative text, it has potential limitations, including model bias and the need for human verification of extracted data. Future work could involve refining the prompts, cross-validating results with multiple LLMs, and integrating expert human review to further improve the accuracy and reliability of my findings.