Project

Multimodal NSFW Content Detection System

A research project focused on child safety and harmful-content filtering, combining text and image understanding in a single low-latency system for real-time browser-based protection.

Python, YOLOv8, BigBird, PyTorch, FastAPI, AWS

Timeline

January 2025 - September 2025

Outcome

Delivered a multimodal safety system that reached 97.3% text accuracy and 0.899 mAP@50 on images while filtering in real time, with responses under 250 ms.

Problem

Content safety systems that rely on only text or only images leave blind spots. The goal was to detect unsafe content in real time using both modalities while keeping latency low enough for browser-based filtering.

Outcome achieved

Reached 97.3% text accuracy with BigBird-RoBERTa and 0.899 mAP@50 with YOLOv8m for image detection.
Maintained response times below 250 ms in the browser-assisted architecture.
Curated and annotated a 27K+ multimodal dataset to improve model performance and coverage.
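
For readers unfamiliar with the image metric: mAP@50 counts a predicted box as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below shows only that matching criterion, not the project's evaluation code. The function names and the (x1, y1, x2, y2) box layout are assumptions for illustration, and a full mAP computation would also check class labels and average precision over confidence thresholds.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_at_50(pred_box, truth_box, threshold=0.5):
    """True when the prediction overlaps the ground truth enough for mAP@50."""
    return iou(pred_box, truth_box) >= threshold
```

A prediction shifted by 20% of the box width still matches (IoU about 0.67), while one shifted by 50% does not (IoU about 0.33).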

Challenges faced

Combining two different modality pipelines into a single low-latency experience.
Managing deployment tradeoffs between local inference and cloud-hosted services.
Building a dataset with enough quality and balance for both text and image training.

How I solved them

Used a hybrid architecture with on-device visual inference and AWS-hosted text analysis.
Built a multithreaded browser architecture to reduce overhead during live filtering.
Applied data balancing and augmentation to improve robustness during training.

Technical details

BigBird-RoBERTa text model for sequence classification and contextual unsafe-content detection.
YOLOv8m image pipeline for visual content classification and object-aware filtering.
Hybrid deployment strategy across browser-side execution and AWS-hosted APIs.
Custom multimodal dataset curation workflow with annotation, balancing, and augmentation.
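
As one illustration of the balancing step, the sketch below uses random oversampling: minority classes are resampled with replacement until every label has as many examples as the largest class. The write-up above does not say which balancing method was used, so this technique, the `oversample` name, and the (item, label) layout are all assumptions.

```python
import random
from collections import defaultdict


def oversample(samples, seed=0):
    """Balance (item, label) pairs by resampling smaller classes with replacement."""
    if not samples:
        return []

    rng = random.Random(seed)  # seeded so the result is reproducible
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append((item, label))

    target = max(len(group) for group in by_label.values())

    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw extra copies until this class reaches the target size.
        balanced.extend(rng.choices(group, k=target - len(group)))

    rng.shuffle(balanced)
    return balanced
```

Oversampling should be applied to the training split only; duplicating items before splitting would leak copies into validation data.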

Project links

Trained Model Publication DOI