
Project

Multimodal NSFW Content Detection System

A research project focused on child safety and harmful-content filtering, combining text and image understanding in a single low-latency system for real-time browser-based protection.

Python, YOLOv8, BigBird, PyTorch, FastAPI, AWS

Timeline

January 2025 - September 2025

Outcome

Delivered a multimodal safety system that classifies text and detects unsafe images, with filtering responses fast enough (under 250 ms) for real-time use in the browser.

Problem

Content safety systems that rely on only text or only images leave blind spots. The goal was to detect unsafe content in real time using both modalities while keeping latency low enough for browser-based filtering.

Outcome achieved

  • Reached 97.3% text accuracy with BigBird-RoBERTa and 0.899 mAP@50 with YOLOv8m for image detection.
  • Maintained response times below 250 ms in the browser-assisted architecture.
  • Curated and annotated a 27K+ multimodal dataset to improve model performance and coverage.
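
For readers unfamiliar with the image metric: mAP@50 counts a predicted box as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below shows only that overlap test. The names `iou` and `is_true_positive` are illustrative, not taken from the project, and a full mAP computation would also match classes, assign each ground-truth box at most once, and average precision over confidence thresholds.

```python
from typing import Tuple

# A box is (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """Overlap test used at the '@50' operating point (IoU >= 0.5)."""
    return iou(pred, truth) >= threshold
```

For example, a 10x10 prediction over a 10x6 ground-truth box that it fully contains has IoU 0.6 and passes the test, while two 2x2 boxes offset by one pixel in each direction have IoU 1/7 and do not.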

Challenges faced

  • Combining two different modality pipelines into a single low-latency experience.
  • Managing deployment tradeoffs between local inference and cloud-hosted services.
  • Building a dataset with enough quality and balance for both text and image training.

How I solved them

  • Used a hybrid architecture with on-device visual inference and AWS-hosted text analysis.
  • Built a multithreaded browser architecture to reduce overhead during live filtering.
  • Applied data balancing and augmentation to improve robustness during training.
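
The hybrid idea above can be sketched as two scorers that run concurrently under a shared latency budget. This is a minimal Python illustration, not the project's browser code: `score_image_locally` and `score_text_remotely` are hypothetical stand-ins for the on-device detector and the hosted text classifier, and the fusion rule (block if either score crosses a threshold, and block if the budget is exceeded) is an assumption rather than the documented policy.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable


def score_image_locally(image_bytes: bytes) -> float:
    """Hypothetical stand-in for on-device image inference (unsafe score in [0, 1])."""
    return 0.1


def score_text_remotely(text: str) -> float:
    """Hypothetical stand-in for a call to the hosted text classifier."""
    time.sleep(0.01)  # simulated network round trip
    return 0.9


def should_block(
    text: str,
    image_bytes: bytes,
    threshold: float = 0.5,
    budget_s: float = 0.25,
    text_scorer: Callable[[str], float] = score_text_remotely,
    image_scorer: Callable[[bytes], float] = score_image_locally,
) -> bool:
    """Run both scorers in parallel and fuse their verdicts within the budget."""
    pool = ThreadPoolExecutor(max_workers=2)
    try:
        futures = [
            pool.submit(text_scorer, text),
            pool.submit(image_scorer, image_bytes),
        ]
        done, pending = wait(futures, timeout=budget_s)
        if pending:
            # Assumed policy: fail closed when a scorer misses the budget.
            return True
        return any(f.result() >= threshold for f in done)
    finally:
        # Do not wait for a slow scorer before returning the verdict.
        pool.shutdown(wait=False)
```

Running the two scorers in parallel means the total wait is bounded by the slower of the two rather than their sum, which is what makes a budget in the 250 ms range plausible when one of them involves a network call.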

Technical details
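
The page does not say which balancing method was used. As one common option, the sketch below shows random oversampling: minority classes are resampled with replacement until every class matches the size of the largest one. The function name `oversample` and the `(item, label)` sample layout are assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

# A sample is (item, label); the item could be a file path or a text snippet.
Sample = Tuple[str, Hashable]


def oversample(samples: Sequence[Sample], seed: int = 0) -> List[Sample]:
    """Balance classes by resampling minority classes with replacement."""
    rng = random.Random(seed)  # seeded for reproducible splits

    by_label: Dict[Hashable, List[Sample]] = defaultdict(list)
    for item, label in samples:
        by_label[label].append((item, label))
    if not by_label:
        return []

    target = max(len(group) for group in by_label.values())
    balanced: List[Sample] = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw extra copies until this class reaches the target size.
        balanced.extend(rng.choices(group, k=target - len(group)))

    rng.shuffle(balanced)
    return balanced
```

In an image pipeline the duplicated samples would typically be passed through augmentation (for example flips or crops) so the model does not see exact copies. Oversampling should be applied to the training split only, so that duplicates do not leak into validation data.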
