Oluwaferanmi Wale-Bogunjoko
Data Scientist | Cloud Data Engineer | Full-Stack Developer
+1 240-367-2085 · feranwb@gmail.com · github.com/Maws7140 · State College, PA
Professional Summary
Results-driven Data Scientist and Cloud Engineer with expertise in big data analytics, cloud infrastructure, and full-stack development. Proven track record of architecting complex systems, contributing to open-source projects (78+ merged PRs), and implementing scalable data pipelines. Specializes in Apache Spark, Azure cloud services, and machine learning, with hands-on experience processing millions of records and building production-ready applications.
Education
Pennsylvania State University | State College, PA
Bachelor of Science in Data Science | August 2022 – Present
Relevant Coursework:
- DS 410: Big Data Analytics (MapReduce, Spark, RDD, Clustering, PCA)
- CMPSC 461: Programming Language Design & Implementation (Lexical Analysis, Parsing, Lambda Calculus, Type Systems, Memory Management)
- DS 220: Data Science Foundations
- CMPSC 360: Discrete Mathematics for Computer Science
- CMPSC 221: Data Structures & Algorithms
- STAT 414, 415, 380, 184: Advanced Statistics, Probability & Statistical Inference
- MATH 230: Calculus & Analytical Geometry
Certifications
- Azure Fundamentals (AZ-900) — March 2025
- AWS Certified Cloud Practitioner — July 2025
Professional Experience
Cloud & Data Engineering Intern
Covenant Global Services (CGS) | May 2025 – August 2025 | Maryland
Azure Cloud Infrastructure:
- Automated provisioning of Azure Storage Accounts and Azure Data Factory using infrastructure-as-code with Azure DevOps pipelines
- Created and configured self-hosted integration runtimes for seamless data orchestration across cloud and on-premises systems
- Implemented Azure Policies and managed enterprise-wide user access controls via Active Directory
- Managed application secrets and credentials using Azure Key Vault with automated rotation policies
Networking & Security:
- Codified network security by deploying Network Security Group (NSG) rules for secure inter-server SSH communication on port 22
- Performed network diagnostics using nslookup to troubleshoot connectivity and DNS resolution issues
- Secured platform access by integrating Informatica with Azure AD using SAML-based Single Sign-On (SSO)
- Extended and managed expiring application credentials across multiple enterprise systems
Data Analytics & Optimization:
- Leveraged Kusto Query Language (KQL) for advanced analytics on Azure cost management datasets, identifying opportunities for resource optimization
- Used Azure Monitor metrics to produce actionable right-sizing recommendations that reduced infrastructure costs
- Analyzed cloud resource usage patterns to improve efficiency and cost-effectiveness
Technologies: Azure (Data Factory, Storage, DevOps, Policies, Key Vault, NSGs, Monitor), Kusto Query Language (KQL), PowerShell, ARM Templates, Terraform, Active Directory, Informatica
Python Developer
Freelance (Upwork) | June 2025 – Present | Remote
- Built a multi-exchange trading bot integrating Robinhood and Schwab APIs with automated order management, risk controls, and real-time reporting dashboards
- Implemented robust error handling and failover mechanisms for high-reliability trading operations
- Developed real-time monitoring and alerting systems for trading activities
Technical Projects
UFC Sell-Through Prediction & Card Optimization | Big Data Analytics
Status: Completed | Fall 2025
Technologies: Apache Spark, PySpark, Spark MLlib, Spark SQL, Python, Google Trends API, Reddit API, Slurm
Overview:
Developed a comprehensive big data analytics system to predict UFC ticket sales by analyzing fighter impact on event attendance, processing 2.8 million+ records across 430MB of raw data.
Data Engineering:
- Built end-to-end ETL pipelines using PySpark DataFrames with advanced SQL and window functions for feature engineering
- Integrated multiple data sources:
  - UFC event schedules and fighter statistics
  - Google Trends data (2.3M rows) for market interest metrics
  - Reddit sentiment analysis (380K rows) for fan engagement
  - Betting odds and venue capacity data
- Optimized data storage using Parquet format for efficient distributed querying and processing
- Implemented schema harmonization across disparate data sources
Machine Learning:
- Engineered 36+ features including:
  - Fighter performance metrics (win percentage, finish rate, average fight time)
  - Experience factors (days since last bout, career longevity)
  - Style contrast indicators (striker vs wrestler matchups)
  - Rivalry and rematch indicators
  - Market interest and sentiment metrics
- Trained GBTRegressor and GLM models with cross-validated hyperparameter tuning using Spark MLlib
- Implemented feature importance analysis to identify key drivers of ticket sales
- Calibrated predictions by venue capacity for realistic sell-through estimates
Optimization & Deployment:
- Developed card optimizer using greedy algorithm with 2-opt swaps to maximize predicted attendance
- Conducted scalability analysis with 1/2/4 executor benchmarking on Penn State ICDS Roar cluster
- Deployed on distributed computing cluster via Slurm with Spark Adaptive Query Execution across 2 nodes
Deliverables:
- Predictive model with comprehensive feature importance analysis
- Venue capacity calibration system
- Card optimization algorithm for fight placement
Obsidian Storyteller Suite | Open-Source Plugin Development
Status: Published | Spring 2025
Technologies: TypeScript, JavaScript, Obsidian Plugin API, vis-timeline, esbuild, Vitest, CI/CD
Repository: GitHub
Overview:
Core contributor to comprehensive dashboard plugin for writers, published to Obsidian community plugin store with 10,000+ downloads. Merged 78+ pull requests (PR #38 through #78+), demonstrating sustained open-source contribution and collaborative development.
Feature Development:
- Architected modular system for character management, location tracking, event/scene organization, and chapter management
- Built reference library system for language creation, prophecies, and creative inspiration with multi-book support
- Engineered portable data persistence layer using YAML and Markdown for cross-platform compatibility
- Implemented interactive timeline visualization with custom calendar support:
  - BCE/CE date parsing with custom calendar systems
  - Custom month and day configurations
  - Event timeline rendering with drag-and-drop functionality
Technical Achievements:
- Developed multilingual support for Chinese, English, and other languages with proper translation display
- Resolved critical mobile UI bugs on iOS and Samsung devices, improving cross-platform reliability
- Fixed complex issues including:
  - Timeline gridlines and rendering bugs
  - Timezone handling for global users
  - Fuzzy matching for entity searches
  - Schema harmonization across plugin versions
- Maintained backward compatibility across major version updates
- Established modern development workflow with Vitest for unit testing and esbuild for efficient bundling
Impact:
- Published to official Obsidian community plugin marketplace
- Actively maintained with ongoing bug fixes and feature enhancements
- Enhanced visual hierarchy and user experience based on community feedback
- Built engaged user community with responsive support
Study Group | AI-Powered Study Hub (Hackathon Project)
Status: Completed | Fall 2024
Technologies: React, FastAPI, Python, Supabase, PostgreSQL, LangChain, LlamaIndex, FAISS, ChromaDB
Overview:
Designed and implemented intelligent, private study hub for college classes with AI chatbot assistance, enabling secure student collaboration while maintaining data privacy.
Full-Stack Architecture:
- Frontend: Built responsive user interface with React for student and professor dashboards
- Backend: Developed microservices architecture using FastAPI (Python) with RESTful API endpoints
- Database: Implemented Supabase (PostgreSQL) with Row-Level Security (RLS) policies for multi-tenant data isolation
- AI/ML Pipeline: Integrated LangChain/LlamaIndex for Retrieval-Augmented Generation (RAG)
- Vector Store: Deployed FAISS and ChromaDB for semantic search on class documents
- Authentication: Implemented secure authentication and authorization using Supabase Auth
Key Features:
- Student signup system with unique class codes and role-based access control
- Secure document upload and management with cloud storage integration (Supabase Storage)
- AI chatbot indexed exclusively on class-specific documents, preventing cross-contamination between classes
- Professor control panel with customizable guardrails for AI responses
- Enterprise-grade security with encryption, RLS policies, and access controls
Technical Implementation:
- Designed database schemas with normalized tables and RLS policies for data isolation
- Built RAG pipeline with document ingestion, chunking, embedding, and retrieval
- Implemented real-time updates and notifications for collaborative features
- Created comprehensive API documentation and testing suite
Impact:
- Production-ready system with finalized architecture and integrated AI services
- Demonstrated ability to rapidly prototype and build complex full-stack applications
UFC Fight Outcome Prediction | Full-Stack ML Application
Status: Completed | Spring 2024
Technologies: Python, Scikit-learn, Flask, Azure App Service, GitHub Actions
- Developed full-stack web application achieving 78% cross-validated accuracy using dual-model ensemble classifier
- Designed bias-elimination system using confidence-weighted averages from two distinct scikit-learn models (Random Forest, Gradient Boosting)
- Exposed prediction engine via RESTful Flask API with JSON endpoints for predictions
- Automated deployment to Azure App Service via GitHub Actions CI/CD pipeline
- Implemented comprehensive testing suite for model validation and API testing
Java Course Scheduler | Data Management Application
Status: Completed | Fall 2022
Technologies: Java, SQL, Apache Derby, JDBC
- Architected data management application applying Object-Oriented Programming (OOP) principles to model students, courses, and schedules
- Designed relational SQL schema with normalized tables for efficient data storage and retrieval
- Managed data persistence via JDBC with prepared statements and transaction management
- Implemented CRUD operations and complex queries for schedule optimization
Flashly | Spaced Repetition Flashcard System
Status: Published | 2024 – 2025
Technologies: TypeScript, React, FSRS Algorithm, OpenAI/Anthropic/Google Gemini APIs, Jest
Repository: GitHub
Overview:
Developed and published comprehensive spaced repetition flashcard plugin for Obsidian, transforming notes into intelligent study materials using advanced learning algorithms.
Key Features:
- Multiple Flashcard Formats: Implemented support for Q::A inline syntax, ?? multi-line format, {cloze} deletion, and header-based cards
- FSRS Algorithm Integration: Leveraged Free Spaced Repetition Scheduler (FSRS) algorithm, reducing review volume by ~20% compared to traditional SM-2 methods
- Intelligent Deck Organization: Built automatic deck creation from filenames, custom naming via frontmatter, and subtag-based hierarchies
- Flashcard Browser: Created comprehensive deck overview with searching, filtering, and direct study sessions with progress tracking
- AI-Powered Quiz Generation: Integrated multiple LLM providers (OpenAI, Anthropic, Google Gemini) for automated question generation alongside traditional rule-based quizzes
- AnkiConnect Integration: Enabled direct sync to Anki without CSV export workflows
- Interactive Tutorial: Designed guided onboarding for card creation, vault scanning, and review workflows
Technical Implementation:
- Built with TypeScript/JavaScript and React for UI components
- Set up Jest test suite for quality assurance
- Created customizable parsing tags, scheduler algorithms, and daily study limits in plugin settings
- Developed keyboard shortcuts for fast review navigation
Impact:
- Published to Obsidian Community Plugins marketplace for one-click installation
- Licensed under 0BSD (permissive, public-domain-equivalent) for maximum accessibility
- 77 commits demonstrating iterative development and refinement
Additional Active Projects
Apple Reminders Bridge | Obsidian Plugin (Research Phase)
- Designing two-way synchronization between Obsidian and Apple Reminders via AppleScript
- Planning features: custom code block rendering, checkbox toggling, filtering by list/due date, reminder creation
- Technologies: TypeScript, AppleScript, node-osascript, Obsidian Plugin API
Portfolio Website | Personal Portfolio (Early Stage - 5% Complete)
- Designing retro TV-themed portfolio with remote control aesthetic
- Planning AI chatbot integration for project Q&A
- Technologies: React/Next.js, modern CSS frameworks
Technical Skills
Programming Languages
- Proficient: Python, TypeScript/JavaScript, SQL, R
- Experienced: Java, Kusto Query Language (KQL)
- Familiar: AppleScript, Functional Programming (Lambda Calculus, Currying)
Big Data & Analytics
- Frameworks: Apache Spark (RDD, DataFrames, MLlib), PySpark, MapReduce
- Data Processing: Spark SQL, Window Functions, Lazy Evaluation, Data Partitioning
- Analytics Tools: Pandas, NumPy, Statistical Modeling
- Query Languages: SQL, Spark SQL, Kusto Query Language (KQL)
Machine Learning & AI
- ML Frameworks: Spark MLlib, Scikit-learn
- Models: Gradient Boosting (GBT), Generalized Linear Models (GLM), Random Forest, Regression, Clustering, PCA
- AI/RAG Systems: LangChain, LlamaIndex, Retrieval-Augmented Generation
- Vector Databases: FAISS, ChromaDB, Weaviate, pgvector
- Feature Engineering: Temporal analysis, statistical features, domain-specific metrics (36+ engineered features)
Cloud Platforms & Infrastructure
- Azure: Data Factory, Storage, DevOps, Policies, Key Vault, NSGs, Monitor, App Service, Active Directory
- AWS: Foundational knowledge (Cloud Practitioner certified)
- Infrastructure as Code: Terraform, ARM Templates, PowerShell scripting
- DevOps: Azure DevOps, CI/CD pipelines, GitHub Actions, Docker, Git version control
- Distributed Computing: Slurm, Penn State ICDS Roar cluster
Databases & Storage
- Relational: PostgreSQL, Apache Derby, SQL Server
- Cloud Databases: Supabase (PostgreSQL with RLS)
- Vector Databases: FAISS, ChromaDB, Weaviate, pgvector
- Data Formats: Parquet, JSON, YAML, CSV
- Database Design: Schema modeling, normalization, indexing, Row-Level Security policies
Web Development
- Frontend: React, Next.js, HTML/CSS, Mobile-responsive design, Progressive Web Apps
- Backend: FastAPI, Flask, RESTful API design, Microservices architecture
- Authentication: Supabase Auth, OAuth, SAML, Single Sign-On (SSO), Row-Level Security (RLS)
- API Integration: REST APIs, third-party API integration (Robinhood, Schwab, Google Trends, Reddit)
Development Tools & Practices
- Version Control: Git, GitHub (78+ merged PRs on production project)
- Testing: Vitest, unit testing, integration testing, CI/CD testing
- Build Tools: esbuild, Webpack, npm/yarn
- Plugin Development: Obsidian Plugin API, cross-platform development
- Documentation: Technical writing, API documentation, user guides
Data Collection & Processing
- Web Scraping: BeautifulSoup, requests, Firecrawl
- APIs: REST API integration, rate limiting, authentication
- OCR: Document text extraction, field identification
- Natural Language Processing: Sentiment analysis, text classification
Security & Compliance
- Encryption: End-to-end encryption, KMS, Azure Key Vault
- Access Control: Role-based access control (RBAC), Row-Level Security (RLS)
- PII Protection: Data anonymization, secure credential management
- Network Security: NSG rules, VPN, secure network segmentation
- Compliance: Audit trails, SOC 2 considerations
Specializations
- Cloud Data Pipelines: ETL design, data engineering, workflow orchestration
- Distributed Computing: Spark, MapReduce, cluster computing, scalability optimization
- RAG Systems: Document ingestion, chunking, embedding, retrieval, LLM integration
- Plugin Development: Obsidian ecosystem, modular architecture, backward compatibility
- Multilingual Software: Localization, internationalization (i18n)
- Mobile Development: Cross-platform UI, iOS/Android compatibility
Domain Knowledge
- Big Data Processing: Distributed systems, lazy evaluation, data partitioning, query optimization
- Programming Language Theory: Lexical analysis, parsing, type systems, lambda calculus, memory management, OOP
- Database Design: Schema modeling, normalization, RLS policies, transaction management
- Network Administration: Security groups, VPN, DNS, network diagnostics
- Sports Analytics: Demand forecasting, entertainment industry metrics, betting odds integration
- Statistics: Probability theory, statistical inference, hypothesis testing, clustering, dimensionality reduction (PCA)
- Cloud Architecture: Microservices, serverless, infrastructure as code, cost optimization
Professional Attributes
Open-Source Contributions
- Active maintainer with 78+ merged pull requests on production plugin with 5,000+ users
- Published plugin to official Obsidian community marketplace
- Responsive community support and issue resolution
- Collaborative development with distributed teams
System Design & Architecture
- Proven ability to architect full-stack applications from concept to deployment
- Experience with microservices, distributed systems, and cloud-native architectures
- Balanced approach to technical complexity and practical implementation
- Focus on scalability, maintainability, and user experience
Problem Solving
- Demonstrated expertise debugging complex issues:
  - Mobile UI compatibility (iOS, Samsung)
  - Timezone handling for global users
  - Distributed computing optimization
  - Schema harmonization and data migration
  - Performance optimization for large datasets
- Systematic approach to root cause analysis and solution implementation
Technical Communication
- Extensive documentation of complex topics (lambda calculus, distributed systems, programming language theory)
- Clear API documentation and user guides
- Ability to explain technical concepts to non-technical stakeholders
- Active participation in code reviews and knowledge sharing
Rapid Learning
- Quick adoption of diverse technologies across big data, cloud platforms, AI/ML frameworks, and web development
- Self-directed learning through coursework, projects, and hands-on experience
- Ability to deep-dive into new domains (sports analytics, tax systems, creative writing tools)
User-Centric Development
- Balance of deep technical implementation with focus on user experience
- Accessibility considerations in UI development
- Responsive to user feedback and iterative improvement
- Mobile-first and cross-platform thinking
Academic Interests & Continuous Learning
Current Studies:
- Big Data Analytics: MapReduce, Spark optimization, distributed computing patterns
- Programming Language Design: Type systems, compiler construction, formal verification
- Statistics: Advanced probability, statistical inference, machine learning theory
- Cloud Architecture: Cost optimization, security best practices, infrastructure design
Study Materials:
- Probability and Statistical Inference (10th Edition)
- Programming Language Pragmatics (4th Edition)
- Comprehensive course notes on lambda calculus, parsing, memory management
- Distributed computing and big data processing documentation
Career Focus & Interests
Primary Focus Areas:
- Data Science and Engineering
- Cloud Data Platforms and Infrastructure
- Full-Stack Product Development
- Open-Source Software Development
Industries of Interest:
- Big Data Analytics and Business Intelligence
- Cloud Computing and Infrastructure
- AI/ML Applications and Platforms
- Developer Tools and Productivity Software
Professional Goals:
- Build scalable data pipelines and analytics systems
- Contribute to impactful open-source projects
- Design and implement production-ready AI/ML solutions
- Develop user-facing applications with strong technical foundations
Additional Information
GitHub: github.com/Maws7140
Location: State College, PA (Open to relocation/remote opportunities)
Work Authorization: Authorized to work in the United States
Website: KING MAWS | Retro Portfolio
References and project portfolios available upon request