👋 Welcome to my portfolio

Hi, I'm Jijo James 👋

A passionate Data Engineer specializing in building scalable data pipelines and LLM-powered applications.

About Me

Building robust data infrastructure

I'm a Data Engineer focused on building robust data infrastructure and applying Large Language Models to complex problems. I specialize in designing scalable ETL pipelines, data warehousing solutions, and AI-powered applications.

When I'm not architecting data systems, you'll find me experimenting with the latest LLM frameworks, exploring new technologies in the data space, or writing about data engineering best practices. Outside of work, I enjoy hiking, rock climbing, running, reading fiction, and discovering movies and hidden gems on YouTube.

30+ Data Projects
9+ Years of Experience

My Expertise

Skills & Technologies

Technologies I've been working with recently

Data Engineering

Python, SQL, Apache Spark, Airflow, dbt

LLM & AI

LangChain, OpenAI API, RAG, Vector DBs, Hugging Face

Cloud & Tools

AWS, Databricks, GCP, Docker, Kubernetes
My Work

Featured Projects

Data engineering and AI/LLM projects that showcase my expertise

Real-time Data Pipeline
Kafka, Spark, Airflow

Streaming data pipeline processing 10M+ events/day with Kafka, Spark Streaming, and Delta Lake for real-time analytics.

RAG-powered Knowledge Base
LangChain, OpenAI, Pinecone

Enterprise knowledge assistant using RAG architecture with LangChain, GPT-4, and vector search for intelligent document Q&A.

ML Feature Store
Databricks, MLflow, Delta Lake

Centralized feature platform enabling reusable ML features across teams with real-time serving and feature versioning.

LLM Document Extraction
Python, GPT-4, FastAPI

Automated data extraction from unstructured documents using LLMs with 95%+ accuracy for clinical trial data processing.

Career Path

Work Experience

My professional journey so far

Senior Data Engineer

2025 - Present

Eli Lilly and Company

Leading data migrations for multiple acquired companies. Building LLM-powered data pipelines and RAG systems. Driving data infrastructure modernization initiatives.

Data Engineer

2024

AI Palette

Built and optimized ETL pipelines using Airflow and Spark. Reduced execution time by 90% through Jenkins automation and cut infrastructure costs by migrating to Kubernetes. Implemented an in-house LLM-powered translator.

Co-Founder

2021 - 2024

Dataque

Built a full-stack data platform using FastAPI, PostgreSQL, and Python. Designed ETL pipelines processing 7M+ data points with a 25% improvement in accuracy. Led product development and helped startups scale through data-driven solutions.

Data Analyst

2019 - 2021

Leadlytics

Led automation projects using Python and Selenium, extracting 2M+ user records. Built data pipelines for US clients using SQL and Bash scripting. Managed a team of 3 and drove B2B lead generation initiatives.

Independent Consultant

2016 - 2019

Freelance

Helped local businesses and early-stage startups build their digital presence. Generated leads through outbound campaigns and LinkedIn outreach for D2C companies.

Get In Touch

Let's Work Together

Have a project in mind? I'd love to hear about it. Send me a message and let's create something amazing together.