Welcome to my full-stack data concierge, with tools, resources, and stories. From sourcing → modeling → system design → storage | production.
My Products
Learn how to convert PDF files to images using Python and pdf2image, a crucial preprocessing step for NLP tasks dealing with PDF documents.
A comprehensive guide to setting up and managing data workflows using Apache Airflow for efficient data processing and ETL operations.
Get started with NLP using spaCy, covering text processing, named entity recognition, and building a simple text classification model.
Analyzed lockdown policy effectiveness using GCP, Apache Kafka, and Spark. The system combined real-time and batch processing for advanced analytics.
Analyzed U.S. alternative fuel stations data to uncover trends in EV charging infrastructure and forecasted future growth patterns.
Built a web scraping pipeline using Scrapy and Requests to collect and process air quality data from EPA AirNow for geospatial analysis.
A robust document processing system for handling various file formats, including PDFs, with text extraction and analysis capabilities.
A collection of data engineering exercises and solutions, covering ETL processes, data pipelines, and big data technologies.
Implementation of various time series forecasting models including ARIMA, Prophet, and deep learning approaches for predictive analysis.