rahulstech/speech-to-text-java-spring-react

Labmentix STT

Speech-to-Text Web Application

Developed during the Labmentix Java Backend Internship (May 2026)

Live Demo

Log in with the demo credentials:

Email: example@domain.com

Password: 123@ABC

Overview

Labmentix STT is a full-stack Speech-to-Text web application that enables users to upload audio files or record audio directly from their browser and generate accurate text transcriptions.

The application leverages Deepgram's Speech-to-Text API for transcription generation with automatic language detection and multilingual support. Audio files are uploaded directly from the frontend to Firebase Storage, while the backend processes transcription requests and manages user data securely.
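As a rough illustration of the backend-to-Deepgram step described above, the sketch below builds (but does not send) a request to Deepgram's pre-recorded endpoint, passing the stored audio by URL and turning on language detection. It is not taken from this repository: the class name `DeepgramRequestSketch` and its helper methods are hypothetical, and the project may well use a Deepgram SDK or different options.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical sketch of a Deepgram transcription request; not the project's code. */
final class DeepgramRequestSketch {

    // Deepgram's pre-recorded transcription endpoint.
    static final String LISTEN_URL = "https://api.deepgram.com/v1/listen";

    /** Builds, without sending, a transcription request for audio hosted at audioUrl. */
    static HttpRequest build(String apiKey, String audioUrl) {
        // The audio is referenced by URL, so the backend never handles the raw bytes.
        String body = "{\"url\":\"" + escapeJson(audioUrl) + "\"}";
        return HttpRequest.newBuilder(URI.create(LISTEN_URL + "?detect_language=true"))
                .header("Authorization", "Token " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    /** Minimal escaping for backslashes and double quotes inside a JSON string value. */
    static String escapeJson(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

The returned request can be passed to `java.net.http.HttpClient#send`; a real integration would also need timeouts and error handling.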


Screenshots

Log In Register Chat History Recording

Features

Authentication

  • User Registration
  • User Login
  • JWT-Based Authentication
  • Protected API Endpoints

Speech-to-Text

  • Upload Audio Files
  • Record Audio Directly from Browser
  • Automatic Language Detection
  • Multilingual Transcription Support
  • Fast and Accurate Speech Recognition via Deepgram

History Management

  • View Previous Transcriptions
  • Paginated Transcription History
  • Retrieve Individual Transcript Records

Cloud Storage

  • Direct Audio Upload to Firebase Storage
  • Reduced Backend Storage Overhead
  • Secure File Management

Tech Stack

Frontend

  • React
  • TypeScript
  • Vite
  • Tailwind CSS
  • React Router
  • React Context API
  • TanStack Query
  • Axios

File Storage

  • Firebase Storage

Backend

  • Java 21
  • Gradle 9.4.1
  • Spring Boot 4.0.6
  • Spring Security
  • Spring Data JPA
  • JWT Authentication

Database

  • PostgreSQL 16

Speech Processing

  • Deepgram Speech-to-Text API
  • Automatic Language Detection
  • Multi-Language Support

Deployment

  • Frontend: Vercel
  • Backend: Render
  • Database: PostgreSQL (Render)
  • Containerization: Docker

System Architecture

[Figure: system architecture diagram]


Audio Processing Workflow

[Figure: audio processing workflow diagram]


API Endpoints

Authentication

Register User

POST /api/auth/register

Creates a new user account and returns an authentication token.

Response

{
  "tokens": {
    "accessToken": "eyJ..."
  },
  "user": {
    "id": 1,
    "name": "Rahul Bagchi",
    "email": "rahul@example.com"
  }
}

Login User

POST /api/auth/login

Authenticates a user and returns an access token.

Response

{
  "tokens": {
    "accessToken": "eyJ..."
  },
  "user": {
    "id": 1,
    "name": "Rahul Bagchi",
    "email": "rahul@example.com"
  }
}
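A client might call the login endpoint and pull the access token out of the response roughly as follows. This is a minimal sketch using only the JDK: the request body fields `email` and `password` are an assumption (the README documents only the response), the class `AuthClientSketch` is hypothetical, and the regex extraction stands in for a proper JSON library.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical client-side sketch for POST /api/auth/login. */
final class AuthClientSketch {

    // Matches "accessToken": "<value>" anywhere in the response body.
    private static final Pattern ACCESS_TOKEN =
            Pattern.compile("\"accessToken\"\\s*:\\s*\"([^\"]+)\"");

    /** Builds the login request; the body field names are assumed, not documented. */
    static HttpRequest loginRequest(String baseUrl, String email, String password) {
        String body = "{\"email\":\"" + escape(email)
                + "\",\"password\":\"" + escape(password) + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/auth/login"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    /** Returns tokens.accessToken from a response shaped like the example above. */
    static Optional<String> extractAccessToken(String responseJson) {
        Matcher m = ACCESS_TOKEN.matcher(responseJson);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

In production code a JSON library such as Jackson, which ships with Spring Boot, would be a safer way to read the response.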

Speech Transcription

Create Transcript

POST /api/speech

Request

{
  "audioUrl": "https://firebase-storage-url"
}

Response

{
  "id": 1,
  "transcript": "Generated transcript text..."
}
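The request above could be built from a Java client along these lines. The sketch assumes the access token is sent as an `Authorization: Bearer <token>` header, which is the common convention for JWT-protected endpoints but is not stated in this README; `CreateTranscriptSketch` is a hypothetical name.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical client-side sketch for POST /api/speech. */
final class CreateTranscriptSketch {

    /** Builds, without sending, a transcript request for an already-uploaded audio file. */
    static HttpRequest build(String baseUrl, String accessToken, String audioUrl) {
        String body = "{\"audioUrl\":\"" + escape(audioUrl) + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/speech"))
                // Assumption: the JWT from login/register is passed as a Bearer token.
                .header("Authorization", "Bearer " + accessToken)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```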

Get Transcript By ID

GET /api/speech/{id}

Returns a specific transcript.


Get Transcription History

GET /api/speech/history

Returns paginated transcription history for the authenticated user.
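The README does not name the pagination parameters. The sketch below assumes Spring Data's conventional zero-based `page` and `size` query parameters and a Bearer token header; both are assumptions to check against the backend's controller, and `HistoryRequestSketch` is a hypothetical name.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical client-side sketch for GET /api/speech/history. */
final class HistoryRequestSketch {

    /** Builds, without sending, a request for one page of transcription history. */
    static HttpRequest build(String baseUrl, String accessToken, int page, int size) {
        if (page < 0 || size < 1) {
            throw new IllegalArgumentException("page must be >= 0 and size must be >= 1");
        }
        // Assumption: the endpoint accepts Spring Data style page/size parameters.
        URI uri = URI.create(baseUrl + "/api/speech/history?page=" + page + "&size=" + size);
        return HttpRequest.newBuilder(uri)
                .header("Authorization", "Bearer " + accessToken)
                .GET()
                .build();
    }
}
```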


Local Development Setup

Prerequisites

  • Java 21
  • Node.js
  • PostgreSQL 16
  • Firebase Project
  • Deepgram API Key

Backend Setup

Clone the repository:

git clone https://github.com/rahulstech/speech-to-text-java-spring-react
cd speech-to-text-java-spring-react/backend

Create RSA private and public keys for JWT.

# private key
openssl genrsa -out private.pem 2048

# public key from private key
openssl rsa -in private.pem -pubout -out public.pem
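To show what an RSA key pair is for in a JWT setup, the sketch below signs a token with the private key and verifies it with the public key using only the JDK. It assumes the RS256 algorithm (RSA with SHA-256), which the README does not confirm, and it generates an in-memory key pair rather than loading private.pem and public.pem. `Rs256Sketch` is a hypothetical class; the backend most likely relies on Spring Security or a JWT library instead.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;

/** Hypothetical sketch of RS256 JWT signing and verification with JDK classes only. */
final class Rs256Sketch {

    private static final Base64.Encoder B64 = Base64.getUrlEncoder().withoutPadding();

    /** Generates a 2048-bit RSA key pair, the same size the openssl command above produces. */
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns header.payload.signature, signed with the private key. */
    static String sign(String payloadJson, PrivateKey key) {
        String signingInput = b64("{\"alg\":\"RS256\",\"typ\":\"JWT\"}") + "." + b64(payloadJson);
        try {
            Signature sig = Signature.getInstance("SHA256withRSA");
            sig.initSign(key);
            sig.update(signingInput.getBytes(StandardCharsets.US_ASCII));
            return signingInput + "." + B64.encodeToString(sig.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Checks the signature only; a real verifier would also validate claims such as exp. */
    static boolean verify(String token, PublicKey key) {
        String[] parts = token.split("\\.");
        if (parts.length != 3) {
            return false;
        }
        try {
            Signature sig = Signature.getInstance("SHA256withRSA");
            sig.initVerify(key);
            sig.update((parts[0] + "." + parts[1]).getBytes(StandardCharsets.US_ASCII));
            return sig.verify(Base64.getUrlDecoder().decode(parts[2]));
        } catch (GeneralSecurityException | IllegalArgumentException e) {
            return false; // malformed signature or unusable key
        }
    }

    private static String b64(String s) {
        return B64.encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }
}
```

Only the holder of the private key can issue tokens, while any service with the public key can verify them, which is the usual reason to choose an asymmetric algorithm over a shared secret.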

Copy .env.example to .env and fill in the variables.

Run the application:

./gradlew bootRun

Backend runs on:

http://localhost:8080

Frontend Setup

Navigate to frontend:

cd frontend

Install dependencies:

npm install

Copy .env.example to .env and fill in the variables.

Run the application:

npm run dev

Frontend runs on:

http://localhost:5173

Docker

Build the Docker image from the Dockerfile:

docker build -t stt-java-backend:<version> .

Run container:

docker run -d --env-file .env -p 8080:8080 --name stt-backend-local stt-java-backend:<version>

Push the image to Docker Hub.

First create a Docker Hub repository for the image; the commands below assume it is named stt-java-backend. Docker Hub allows public repositories at no charge.

docker tag stt-java-backend:<version> docker-user-name/stt-java-backend:<version>

docker tag stt-java-backend:<version> docker-user-name/stt-java-backend:latest

docker push docker-user-name/stt-java-backend:<version>
docker push docker-user-name/stt-java-backend:latest

Database

PostgreSQL 16 is used to store:

  • User Accounts
  • Authentication Data
  • Transcription Records
  • Transcript Metadata

Future Enhancements

  • AI-Generated Summaries
  • Sentiment Analysis
  • PDF Export
  • Keyword Extraction
  • Transcript Search

Project Information

Project Type: Internship Project

Organization: Labmentix

Internship Track: Java Fullstack Development

Completion Date: May 31, 2026


Author

Rahul Bagchi

  • Android Developer (4+ Years as of May 2026)
  • Flutter Developer (6 months as of May 2026)
  • Java Backend Developer

GitHub: https://github.com/rahulstech

LinkedIn: https://www.linkedin.com/in/iamrahulbagchi

YouTube: http://www.youtube.com/@rahulstech2018

X: https://x.com/bagchirahul24

Email: rahulstech18@gmail.com

About

Speech-to-Text web application built with Spring Boot 4, React, PostgreSQL, Firebase Storage, and Deepgram API. Supports audio upload, browser recording, multilingual transcription, and automatic language detection.
