AI Logs

  • Logs on all things Machine Learning, Artificial Intelligence and MLOps
  • By Raghava Dhanya

A Tale of a Suicidal Container

One fine day, I sat down to optimize the size of a Docker image. As usual, I opted for distroless images as my base, a choice I had made countless times without a hitch. Distroless images, for the uninitiated, are peak minimalism, containing only the essential libraries and binaries required to run the application. Not only do they trim the fat off the image size, but they also mitigate the risk of CVEs lurking within....

February 21, 2024 · 4 min · Raghava Dhanya
Configurations

Keeping Configurations Sane with Pydantic Settings

Configurations are a crucial aspect of any software project. There are many sources of configurations, such as environment variables, configuration files, and command-line arguments. For file-based configurations in Python, YAML and TOML (or INI) are popular choices. I prefer YAML, though it is not without flaws, some of which, such as the lack of type safety, Pydantic can address anyway. Pydantic is a data validation library for Python. It is built on top of Python type hints and provides runtime validation of data....

November 14, 2023 · 5 min · Raghava Dhanya

Designing Machine Learning Systems for High Velocity Trading

As part of my work at Mu Sigma Labs, I took part in a research project on High Velocity Time Series in early 2019. One of the goals was to create a high velocity trading app using Pair Trading. The Requisite Terms: Long and Short Trades. A long trade is buying a security. A short trade is selling a security even when you don’t own it. It generally means that you borrow someone’s securities and sell them in the hope of buying them back at a lower cost later and returning them, hence making a profit....

June 20, 2023 · 7 min · Raghava Dhanya
A blue and yellow python wrapped around C++ logo

Python with a Dash of C++: Optimizing Recommendation Serving

Serving recommendations to 200+ million users across thousands of candidates in under 100 ms is hard, but doing that in Python is harder. Why not add some compiled spice to make it faster? Using Cython, you can add C++ components to your Python code. Aren’t all machine learning and statistics libraries already written in C and Cython to make them super fast? Yes. But there are still some optimizations left on the table....

June 30, 2022 · 5 min · Raghava Dhanya
Golang's Gopher with tentacles

Go faster with Go: Golang for ML Serving

So the ask is to do 3 million predictions per second with as few resources as possible. Thankfully, it’s one of the simpler models used in recommendation systems, the Multi-Armed Bandit (MAB). A multi-armed bandit usually involves sampling from a distribution such as the Beta distribution. That’s where most of the time is spent. If we can run as many of these sampling operations concurrently as possible, we’ll use the resources well. Maximizing resource utilization is the key to reducing the overall resources needed for the model....

June 20, 2022 · 6 min · Raghava Dhanya
A BPMN pipeline containing tasks 'fetch data', 'load data & train model', 'approval from owner', 'deploy model' and 'email on failure to fetch data'. All script tasks are python

Showcase: BPMN Pipeline Platform

At Mu Sigma Labs, I led a significant project focused on BPMN-based analytics automation and pipeline orchestration. Using the open-source platform Activiti, I owned, developed, tested, and maintained a system serving about 3,000 internal users, handling critical reporting and data pipelines. Technologies Used: The core technologies employed were Java and Spring Boot for the backend, Python and R for scripting analytics tasks, and Angular for the user interface. Understanding BPMN: Business Process Model and Notation (BPMN) provided the foundation for our automation approach....

June 20, 2022 · 2 min · Raghava Dhanya