
Developing an ASL App with Kaggle’s Top Model and Customized MediaPipe Gesture Model


Introduction

This app is designed as a learning tool that allows users to check their proficiency in American Sign Language (ASL) through AI-powered quizzes after engaging with ASL content on YouTube. For ASL recognition, I utilized a top-performing model from a Kaggle competition and referenced educational content from YouTube channels such as ASLU and Start ASL. I integrated these resources to create this app as a Proof of Concept (PoC) to demonstrate the technical feasibility of the idea. While the UI/UX is still in its initial stage and has room for improvement, the focus of this project was on showcasing the underlying technical capabilities.

Also, the gesture classification models used in this project reference the models created by @hoyso48 and @ohkawa3. I was truly impressed by their exceptional quality, and I sincerely appreciate their generosity in sharing both the models and the Jupyter Notebooks on Kaggle.

GitHub Repo: https://github.com/yoshan0921/asl-practice-app

About the App

This app offers two main features for learning and practicing American Sign Language (ASL):

  1. Finger Spelling Practice

  2. Basic Vocabulary Practice

This interactive format allows users to actively reinforce their ASL skills through practice and immediate feedback. For a detailed demonstration, please refer to the following video.

App Demo

  1. Finger Spelling Practice

  2. Basic Vocabulary Practice

Technology Stack

The app was developed using the following technologies:

Architecture Overview

The following diagram illustrates the key components and their interactions:

[System architecture diagram]

  1. Finger Spelling Recognition

  2. Basic Vocabulary Recognition

Insights and Challenges

  1. Real-time Performance Achieved

The app successfully achieved sufficient real-time performance for both the Finger Spelling and Basic Vocabulary features, ensuring smooth and responsive interactions for users during gesture recognition.

  2. Technology Choice: REST API vs. Socket.IO

The Kaggle ASL competition model used in this project was designed to run in TFLite format, which could have been executed directly in the browser. However, for this Proof of Concept (PoC), Flask and Python were chosen to implement the gesture recognition functionality as a REST API, prioritizing ease of data processing and development efficiency.

As an aside, when the frame data accumulated on the back end is visualized, it shows the landmark sequence that serves as the input to the gesture recognition classification model.

Challenges with REST API:

Initially, the front end recorded and accumulated frame data, then sent it to the back end in bulk via a single POST request. However, the server-side preprocessing required before running the model introduced a slight delay, resulting in a noticeable lag between performing a gesture and receiving the recognition result.
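As a rough sketch of this initial flow (the /api/recognize endpoint and the payload shape are illustrative assumptions, not the app's actual API), the front end might buffer landmark frames and post them in one request:

```typescript
// Sketch of the initial bulk-POST approach. The endpoint name and payload
// shape are assumptions for illustration, not the app's actual API.
type LandmarkFrame = number[][]; // one frame: [x, y, z] per landmark

const frames: LandmarkFrame[] = [];

// Called once per captured video frame with the extracted landmarks.
function accumulateFrame(landmarks: LandmarkFrame): void {
  frames.push(landmarks);
}

// After recording ends, send the entire buffer in a single POST request.
// The server must then preprocess the whole sequence before inference,
// which is where the noticeable lag came from.
async function recognizeGesture(): Promise<string> {
  const response = await fetch("/api/recognize", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ frames }),
  });
  const { label } = await response.json();
  frames.length = 0; // clear the buffer for the next attempt
  return label;
}
```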

Solution with Socket.IO:

To address this issue, Socket.IO was used instead of the REST API. Frame data is transmitted to the server incrementally in real time, allowing preprocessing to proceed step by step as frames arrive. This successfully eliminated the lag observed in the REST API implementation.
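A minimal sketch of the incremental approach using socket.io-client (the server URL and the "frame"/"result" event names are assumptions for illustration):

```typescript
import { io } from "socket.io-client";

// Connect to the Flask back end (URL is an assumption for illustration).
const socket = io("http://localhost:5000");

// Emit each frame as soon as it is captured, so the server can run its
// preprocessing incrementally instead of all at once after recording ends.
function sendFrame(landmarks: number[][]): void {
  socket.emit("frame", { landmarks, timestamp: Date.now() });
}

// The server pushes the recognition result back as soon as inference finishes.
socket.on("result", ({ label }: { label: string }) => {
  console.log(`Recognized gesture: ${label}`);
});
```

Because preprocessing overlaps with capture, most of the server-side work is already done by the time the last frame arrives.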

Scalability Concern:

While Socket.IO proved effective, it may face performance issues under heavy server load or many concurrent connections, since each connection adds processing demands on the back end. This scalability risk was not fully tested within the scope of this project and remains an area for future investigation.

  3. MediaPipe Model Integration Challenges

The app uses MediaPipe’s Gesture Recognizer model for Finger Spelling and the Holistic model for Basic Vocabulary on the front end.

Error Encountered:

When loading the Gesture Recognizer model after the Holistic model, an error occurred, even though cleanup had been performed through the useEffect lifecycle by calling the model instance's close() method.

```
Failed to load MediaPipe gesture model: RuntimeError: Aborted(Module.arguments has been replaced with plain arguments_ (the initial value can be provided on Module, but after startup the value is only looked for on a local variable of that name))
    at abort (holistic_solution_si…wasm_bin.js:9:17640)
    at Object.get (holistic_solution_si…_wasm_bin.js:9:7759)
    at vision_wasm_internal.js:9:2905
    at async createGestureRecognizer (Quiz.tsx:49:35)
```

Workaround:

The error was eventually resolved by explicitly setting window.Module to undefined to clear the previous state.
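A minimal sketch of the loading, cleanup, and workaround, assuming the @mediapipe/tasks-vision API inside a React component (the CDN URL and model path are illustrative assumptions):

```typescript
import { useEffect } from "react";
import { FilesetResolver, GestureRecognizer } from "@mediapipe/tasks-vision";

// Inside a React component such as Quiz.tsx:
useEffect(() => {
  let recognizer: GestureRecognizer | undefined;

  const createGestureRecognizer = async () => {
    // Workaround: clear the Emscripten Module state left behind by the
    // previously loaded Holistic model before initializing the new WASM module.
    (window as any).Module = undefined;

    const vision = await FilesetResolver.forVisionTasks(
      "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/wasm"
    );
    recognizer = await GestureRecognizer.createFromOptions(vision, {
      baseOptions: { modelAssetPath: "/models/gesture_recognizer.task" },
      runningMode: "VIDEO",
    });
  };

  createGestureRecognizer();

  // Cleanup on unmount: calling close() alone was not enough to avoid the error.
  return () => {
    recognizer?.close();
  };
}, []);
```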

Unresolved Root Cause:

While the workaround fixed the issue, the exact cause of the error, and why the fix works, remain unclear and require further investigation.

Unresolved Issues

  1. Dynamic Gestures (J and Z) for Finger Spelling

Related link: MediaPipe Gesture Recognizer

  2. Playback of Learning Content

Related link: Player API Reference

Planned Features

References
