Data Arc

Welcome to my full-stack data concierge, with tools, resources, and stories that follow data from sourcing → modeling → system design → storage → production.

Latest Updates

From My Blog

Tutorial 5 min read

PDF to Image Conversion for Enhanced NLP Processing

Learn how to convert PDF files to images using Python and pdf2image, a crucial preprocessing step for NLP tasks dealing with PDF documents.

Apr 10, 2024 Read on Medium
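
The conversion step that tutorial covers can be sketched roughly as follows. This is a minimal sketch, not the code from the post: it assumes pdf2image and the Poppler utilities are installed, and the helper names `page_image_paths` and `pdf_to_images`, the `pages` output directory, and the 200 dpi setting are all illustrative assumptions.

```python
from pathlib import Path


def page_image_paths(pdf_path, page_count, out_dir="pages", ext="png"):
    """Build one output path per page, e.g. pages/report_page_001.png."""
    stem = Path(pdf_path).stem
    return [
        Path(out_dir) / f"{stem}_page_{i:03d}.{ext}"
        for i in range(1, page_count + 1)
    ]


def pdf_to_images(pdf_path, out_dir="pages", dpi=200):
    """Render each PDF page to a PNG file and return the written paths."""
    # Imported lazily so the path helper above works without pdf2image.
    # convert_from_path needs the Poppler utilities available on the system.
    from pdf2image import convert_from_path

    pages = convert_from_path(str(pdf_path), dpi=dpi)  # list of PIL images
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    paths = page_image_paths(pdf_path, len(pages), out_dir)
    for image, path in zip(pages, paths):
        image.save(path, "PNG")
    return paths
```

Calling `pdf_to_images("report.pdf")` (a hypothetical file name) would write one PNG per page, which can then be passed to an OCR or layout-analysis step before NLP processing.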

Guide 7 min read

Building a Data Pipeline with Python and Apache Airflow

A comprehensive guide to setting up and managing data workflows using Apache Airflow for efficient data processing and ETL operations.

Mar 22, 2024

Tutorial 6 min read

Introduction to Natural Language Processing with spaCy

Get started with NLP using spaCy, covering text processing, named entity recognition, and building a simple text classification model.

Mar 5, 2024

Featured Projects

COVID-19 Data Engineering

Analyzed the effectiveness of COVID-19 lockdown policies using GCP, Apache Kafka, and Spark. The system combined real-time and batch processing.

GCP, Kafka, Spark
View on GitHub

EV Charging Trends

Analyzed U.S. alternative fuel station data to uncover trends in EV charging infrastructure and forecast future growth.

Data Analysis, Python, Prophet
View on GitHub

Air Quality Data Pipeline

Built a web scraping pipeline using Scrapy and Requests to collect and process air quality data from EPA AirNow for geospatial analysis.

Scrapy, Python, GIS
View on GitHub

Document Processor

A document processing system that handles multiple file formats, including PDFs, with text extraction and analysis capabilities.

Python, NLP, PDF
View on GitHub

Data Engineering Practice

A collection of data engineering exercises and solutions, covering ETL processes, data pipelines, and big data technologies.

Python, ETL, Big Data
View on GitHub

Time Series Forecasting

Implementations of several time series forecasting models, including ARIMA, Prophet, and deep learning approaches.

Python, Prophet, ARIMA
View on GitHub