Oluwaferanmi Wale-Bogunjoko

Data Scientist | Cloud Data Engineer | Full-Stack Developer

+1 240-367-2085 · feranwb@gmail.com · github.com/Maws7140 · State College, PA

Professional Summary

Results-driven Data Scientist and Cloud Engineer with expertise in big data analytics, cloud infrastructure, and full-stack development. Proven track record of architecting complex systems, contributing to open-source projects (78+ merged PRs), and implementing scalable data pipelines. Specializes in Apache Spark, Azure cloud services, and machine learning, with hands-on experience processing millions of records and building production-ready applications.

Education

Pennsylvania State University | State College, PA
Bachelor of Science in Data Science | August 2022 – Present
Relevant Coursework:

DS 410: Big Data Analytics (MapReduce, Spark, RDD, Clustering, PCA)
CMPSC 461: Programming Language Design & Implementation (Lexical Analysis, Parsing, Lambda Calculus, Type Systems, Memory Management)
DS 220: Data Science Foundations
CMPSC 360: Discrete Mathematics for Computer Science
CMPSC 221: Data Structures & Algorithms
STAT 414, 415, 380, 184: Advanced Statistics, Probability & Statistical Inference
MATH 230: Calculus & Analytical Geometry

Certifications

Azure Fundamentals (AZ-900) — March 2025
AWS Certified Cloud Practitioner — July 2025

Professional Experience

Cloud & Data Engineering Intern

Covenant Global Services (CGS) | May 2025 – August 2025 | Maryland
Azure Cloud Infrastructure:

Automated provisioning of Azure Storage Accounts and Azure Data Factory using infrastructure-as-code with Azure DevOps pipelines
Created and configured self-hosted integration runtimes for seamless data orchestration across cloud and on-premises systems
Implemented Azure Policies and managed enterprise-wide user access controls via Active Directory
Managed application secrets and credentials using Azure Key Vault with automated rotation policies
Networking & Security:
Codified network security by deploying Network Security Group (NSG) rules for secure inter-server SSH communication on port 22
Performed network diagnostics using nslookup to troubleshoot connectivity and DNS resolution issues
Secured platform access by integrating Informatica with Azure AD using SAML-based Single Sign-On (SSO)
Extended and managed expiring application credentials across multiple enterprise systems
Data Analytics & Optimization:
Leveraged Kusto Query Language (KQL) for advanced analytics on Azure cost management datasets, identifying opportunities for resource optimization
Used Azure Monitor to produce actionable right-sizing recommendations that reduced infrastructure costs
Analyzed cloud resource usage patterns to improve efficiency and cost-effectiveness
Technologies: Azure (Data Factory, Storage, DevOps, Policies, Key Vault, NSGs, Monitor), Kusto Query Language (KQL), PowerShell, ARM Templates, Terraform, Active Directory, Informatica

Python Developer

Freelance (Upwork) | June 2025 – Present | Remote

Built a multi-broker trading bot integrating Robinhood and Schwab APIs with automated order management, risk controls, and real-time reporting dashboards
Implemented robust error handling and failover mechanisms for high-reliability trading operations
Developed real-time monitoring and alerting systems for trading activities

Technical Projects

UFC Sell-Through Prediction & Card Optimization | Big Data Analytics

Status: Completed | Fall 2025
Technologies: Apache Spark, PySpark, Spark MLlib, Spark SQL, Python, Google Trends API, Reddit API, Slurm
Overview:
Developed a comprehensive big data analytics system to predict UFC ticket sales by analyzing fighter impact on event attendance, processing 2.8 million+ records across 430 MB of raw data.
Data Engineering:

Built end-to-end ETL pipelines using PySpark DataFrames with advanced SQL and window functions for feature engineering
Integrated multiple data sources:
- UFC event schedules and fighter statistics
- Google Trends data (2.3M rows) for market interest metrics
- Reddit sentiment analysis (380K rows) for fan engagement
- Betting odds and venue capacity data
Optimized data storage using Parquet format for efficient distributed querying and processing
Implemented schema harmonization across disparate data sources
Machine Learning:
Engineered 36+ features including:
- Fighter performance metrics (win percentage, finish rate, average fight time)
- Experience factors (days since last bout, career longevity)
- Style contrast indicators (striker vs wrestler matchups)
- Rivalry and rematch indicators
- Market interest and sentiment metrics
Trained GBTRegressor and GLM models with cross-validated hyperparameter tuning using Spark MLlib
Implemented feature importance analysis to identify key drivers of ticket sales
Calibrated predictions by venue capacity for realistic sell-through estimates
Optimization & Deployment:
Developed card optimizer using greedy algorithm with 2-opt swaps to maximize predicted attendance
Conducted scalability analysis by benchmarking with 1, 2, and 4 executors on the Penn State ICDS Roar cluster
Deployed on distributed computing cluster via Slurm with Spark Adaptive Query Execution across 2 nodes
Deliverables:
Predictive model with comprehensive feature importance analysis
Venue capacity calibration system
Card optimization algorithm for fight placement

Obsidian Storyteller Suite | Open-Source Plugin Development

Status: Published | Spring 2025
Technologies: TypeScript, JavaScript, Obsidian Plugin API, vis-timeline, esbuild, Vitest, CI/CD
Repository: GitHub
Overview:
Core contributor to a comprehensive dashboard plugin for writers, published to the Obsidian community plugin store with 6,000+ downloads. Merged 78+ pull requests (PR #38 through #78+), demonstrating sustained open-source contribution and collaborative development.
Feature Development:

Architected modular system for character management, location tracking, event/scene organization, and chapter management
Built reference library system for language creation, prophecies, and creative inspiration with multi-book support
Engineered portable data persistence layer using YAML and Markdown for cross-platform compatibility
Implemented interactive timeline visualization with custom calendar support:
- BCE/CE date parsing with custom calendar systems
- Custom month and day configurations
- Event timeline rendering with drag-and-drop functionality
Technical Achievements:
Developed multilingual support for Chinese, English, and other languages with proper translation display
Resolved critical mobile UI bugs on iOS and Samsung devices, improving cross-platform reliability
Fixed complex issues including:
- Timeline gridlines and rendering bugs
- Timezone handling for global users
- Fuzzy matching for entity searches
- Schema harmonization across plugin versions
Maintained backward compatibility across major version updates
Established modern development workflow with Vitest for unit testing and esbuild for efficient bundling
Impact:
Published to official Obsidian community plugin marketplace
Actively maintained with ongoing bug fixes and feature enhancements
Enhanced visual hierarchy and user experience based on community feedback
Built engaged user community with responsive support

Study Group | AI-Powered Study Hub (Hackathon Project)

Status: Completed | Fall 2024
Technologies: React, FastAPI, Python, Supabase, PostgreSQL, LangChain, LlamaIndex, FAISS, ChromaDB
Overview:
Designed and implemented an intelligent, private study hub for college classes with AI chatbot assistance, enabling secure student collaboration while maintaining data privacy.
Full-Stack Architecture:

Frontend: Built responsive user interface with React for student and professor dashboards
Backend: Developed microservices architecture using FastAPI (Python) with RESTful API endpoints
Database: Implemented Supabase (PostgreSQL) with Row-Level Security (RLS) policies for multi-tenant data isolation
AI/ML Pipeline: Integrated LangChain/LlamaIndex for Retrieval-Augmented Generation (RAG)
Vector Store: Deployed FAISS and ChromaDB for semantic search on class documents
Authentication: Implemented secure authentication and authorization using Supabase Auth
Key Features:
Student signup system with unique class codes and role-based access control
Secure document upload and management with cloud storage integration (Supabase Storage)
AI chatbot indexed exclusively on class-specific documents, preventing cross-contamination between classes
Professor control panel with customizable guardrails for AI responses
Enterprise-grade security with encryption, RLS policies, and access controls
Technical Implementation:
Designed database schemas with normalized tables and RLS policies for data isolation
Built RAG pipeline with document ingestion, chunking, embedding, and retrieval
Implemented real-time updates and notifications for collaborative features
Created comprehensive API documentation and testing suite
Impact:
Production-ready system with finalized architecture and integrated AI services
Demonstrated ability to rapidly prototype and build complex full-stack applications

UFC Fight Outcome Prediction | Full-Stack ML Application

Status: Completed | Spring 2024
Technologies: Python, Scikit-learn, Flask, Azure App Service, GitHub Actions

Developed full-stack web application around a dual-model ensemble classifier that achieved 78% cross-validated accuracy
Designed bias-reduction system that combines two distinct scikit-learn models (Random Forest, Gradient Boosting) through confidence-weighted averaging of their predictions
Exposed prediction engine via RESTful Flask API with JSON endpoints for predictions
Automated deployment to Azure App Service via GitHub Actions CI/CD pipeline
Implemented comprehensive testing suite for model validation and API testing

Java Course Scheduler | Data Management Application

Status: Completed | Fall 2022
Technologies: Java, SQL, Apache Derby, JDBC

Architected data management application applying Object-Oriented Programming (OOP) principles to model students, courses, and schedules
Designed relational SQL schema with normalized tables for efficient data storage and retrieval
Managed data persistence via JDBC with prepared statements and transaction management
Implemented CRUD operations and complex queries for schedule optimization

Flashly | Spaced Repetition Flashcard System

Status: Published | 2024 – 2025
Technologies: TypeScript, React, FSRS Algorithm, OpenAI/Anthropic/Google Gemini APIs, Jest
Repository: GitHub
Overview:
Developed and published a comprehensive spaced repetition flashcard plugin for Obsidian, transforming notes into intelligent study materials using advanced learning algorithms.
Key Features:

Multiple Flashcard Formats: Implemented support for Q::A inline syntax, ?? multi-line format, {cloze} deletion, and header-based cards
FSRS Algorithm Integration: Leveraged Free Spaced Repetition Scheduler (FSRS) algorithm, reducing review volume by ~20% compared to traditional SM-2 methods
Intelligent Deck Organization: Built automatic deck creation from filenames, custom naming via frontmatter, and subtag-based hierarchies
Flashcard Browser: Created comprehensive deck overview with searching, filtering, and direct study sessions with progress tracking
AI-Powered Quiz Generation: Integrated multiple LLM providers (OpenAI, Anthropic, Google Gemini) for automated question generation alongside traditional rule-based quizzes
AnkiConnect Integration: Enabled direct sync to Anki without CSV export workflows
Interactive Tutorial: Designed guided onboarding for card creation, vault scanning, and review workflows
Technical Implementation:
Built with TypeScript/JavaScript and React for UI components
Set up Jest-based unit tests for quality assurance
Created customizable parsing tags, scheduler algorithms, and daily study limits in plugin settings
Developed keyboard shortcuts for fast review navigation
Impact:
Published to Obsidian Community Plugins marketplace for one-click installation
Licensed under 0BSD (a public-domain-equivalent permissive license) for maximum accessibility
77 commits demonstrating iterative development and refinement

Additional Active Projects

Apple Reminders Bridge | Obsidian Plugin (Research Phase)

Designing two-way synchronization between Obsidian and Apple Reminders via AppleScript
Planning features: custom code block rendering, checkbox toggling, filtering by list/due date, reminder creation
Technologies: TypeScript, AppleScript, node-osascript, Obsidian Plugin API

Portfolio Website | Personal Portfolio (Early Stage - 5% Complete)

Designing retro TV-themed portfolio with remote control aesthetic
Planning AI chatbot integration for project Q&A
Technologies: React/Next.js, modern CSS frameworks

Technical Skills

Programming Languages

Proficient: Python, TypeScript/JavaScript, SQL, R
Experienced: Java, Kusto Query Language (KQL)
Familiar: AppleScript, Functional Programming (Lambda Calculus, Currying)

Big Data & Analytics

Frameworks: Apache Spark (RDD, DataFrames, MLlib), PySpark, MapReduce
Data Processing: Spark SQL, Window Functions, Lazy Evaluation, Data Partitioning
Analytics Tools: Pandas, NumPy, Statistical Modeling
Query Languages: SQL, Spark SQL, Kusto Query Language (KQL)

Machine Learning & AI

ML Frameworks: Spark MLlib, Scikit-learn
Models: Gradient Boosting (GBT), Generalized Linear Models (GLM), Random Forest, Regression, Clustering, PCA
AI/RAG Systems: LangChain, LlamaIndex, Retrieval-Augmented Generation
Vector Databases: FAISS, ChromaDB, Weaviate, pgvector
Feature Engineering: Temporal analysis, statistical features, domain-specific metrics (36+ engineered features)

Cloud Platforms & Infrastructure

Azure: Data Factory, Storage, DevOps, Policies, Key Vault, NSGs, Monitor, App Service, Active Directory
AWS: Foundational knowledge (Cloud Practitioner certified)
Infrastructure as Code: Terraform, ARM Templates, PowerShell scripting
DevOps: Azure DevOps, CI/CD pipelines, GitHub Actions, Docker, Git version control
Distributed Computing: Slurm, Penn State ICDS Roar cluster

Databases & Storage

Relational: PostgreSQL, Apache Derby, SQL Server
Cloud Databases: Supabase (PostgreSQL with RLS)
Vector Databases: FAISS, ChromaDB, Weaviate, pgvector
Data Formats: Parquet, JSON, YAML, CSV
Database Design: Schema modeling, normalization, indexing, Row-Level Security policies

Web Development

Frontend: React, Next.js, HTML/CSS, Mobile-responsive design, Progressive Web Apps
Backend: FastAPI, Flask, RESTful API design, Microservices architecture
Authentication & Authorization: Supabase Auth, OAuth, SAML, Single Sign-On (SSO), Row-Level Security (RLS)
API Integration: REST APIs, third-party API integration (Robinhood, Schwab, Google Trends, Reddit)

Development Tools & Practices

Version Control: Git, GitHub (78+ merged PRs on production project)
Testing: Vitest, Jest, unit testing, integration testing, CI/CD testing
Build Tools: esbuild, Webpack, npm/yarn
Plugin Development: Obsidian Plugin API, cross-platform development
Documentation: Technical writing, API documentation, user guides

Data Collection & Processing

Web Scraping: BeautifulSoup, requests, Firecrawl
APIs: REST API integration, rate limiting, authentication
OCR: Document text extraction, field identification
Natural Language Processing: Sentiment analysis, text classification

Security & Compliance

Encryption: End-to-end encryption, KMS, Azure Key Vault
Access Control: Role-based access control (RBAC), Row-Level Security (RLS)
PII Protection: Data anonymization, secure credential management
Network Security: NSG rules, VPN, secure network segmentation
Compliance: Audit trails, SOC 2 considerations

Specializations

Cloud Data Pipelines: ETL design, data engineering, workflow orchestration
Distributed Computing: Spark, MapReduce, cluster computing, scalability optimization
RAG Systems: Document ingestion, chunking, embedding, retrieval, LLM integration
Plugin Development: Obsidian ecosystem, modular architecture, backward compatibility
Multilingual Software: Localization, internationalization (i18n)
Mobile Development: Cross-platform UI, iOS/Android compatibility

Domain Knowledge

Big Data Processing: Distributed systems, lazy evaluation, data partitioning, query optimization
Programming Language Theory: Lexical analysis, parsing, type systems, lambda calculus, memory management, OOP
Database Design: Schema modeling, normalization, RLS policies, transaction management
Network Administration: Security groups, VPN, DNS, network diagnostics
Sports Analytics: Demand forecasting, entertainment industry metrics, betting odds integration
Statistics: Probability theory, statistical inference, hypothesis testing, clustering, dimensionality reduction (PCA)
Cloud Architecture: Microservices, serverless, infrastructure as code, cost optimization

Professional Attributes

Open-Source Contributions

Active maintainer with 78+ merged pull requests on production plugin with 5,000+ users
Published plugin to official Obsidian community marketplace
Responsive community support and issue resolution
Collaborative development with distributed teams

System Design & Architecture

Proven ability to architect full-stack applications from concept to deployment
Experience with microservices, distributed systems, and cloud-native architectures
Balanced approach to technical complexity and practical implementation
Focus on scalability, maintainability, and user experience

Problem Solving

Demonstrated expertise debugging complex issues:
- Mobile UI compatibility (iOS, Samsung)
- Timezone handling for global users
- Distributed computing optimization
- Schema harmonization and data migration
- Performance optimization for large datasets
Systematic approach to root cause analysis and solution implementation

Technical Communication

Extensive documentation of complex topics (lambda calculus, distributed systems, programming language theory)
Clear API documentation and user guides
Ability to explain technical concepts to non-technical stakeholders
Active participation in code reviews and knowledge sharing

Rapid Learning

Quick adoption of diverse technologies across big data, cloud platforms, AI/ML frameworks, and web development
Self-directed learning through coursework, projects, and hands-on experience
Ability to deep-dive into new domains (sports analytics, tax systems, creative writing tools)

User-Centric Development

Balance of deep technical implementation with focus on user experience
Accessibility considerations in UI development
Responsive to user feedback and iterative improvement
Mobile-first and cross-platform thinking

Academic Interests & Continuous Learning

Current Studies:

Big Data Analytics: MapReduce, Spark optimization, distributed computing patterns
Programming Language Design: Type systems, compiler construction, formal verification
Statistics: Advanced probability, statistical inference, machine learning theory
Cloud Architecture: Cost optimization, security best practices, infrastructure design
Study Materials:
Probability and Statistical Inference (10th Edition)
Programming Language Pragmatics (4th Edition)
Comprehensive course notes on lambda calculus, parsing, memory management
Distributed computing and big data processing documentation

Career Focus & Interests

Primary Focus Areas:

Data Science and Engineering
Cloud Data Platforms and Infrastructure
Full-Stack Product Development
Open-Source Software Development
Industries of Interest:
Big Data Analytics and Business Intelligence
Cloud Computing and Infrastructure
AI/ML Applications and Platforms
Developer Tools and Productivity Software
Professional Goals:
Build scalable data pipelines and analytics systems
Contribute to impactful open-source projects
Design and implement production-ready AI/ML solutions
Develop user-facing applications with strong technical foundations

Additional Information

GitHub: github.com/Maws7140
Location: State College, PA (Open to relocation/remote opportunities)
Work Authorization: Authorized to work in the United States
Website: KING MAWS | Retro Portfolio

References and project portfolios available upon request