Identify the types of missing data to build unbiased ML models

Dealing with data where some attributes are missing is a very common challenge when building real-world ML models. It is therefore crucial to study the missing data and learn whether its absence correlates with other features or whether it is simply missing at random. Dropping missing data without proper analysis can often lead to models that are biased in the real world.
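A first check along these lines can be sketched with pandas. The toy data and the column names `age` and `bp` are made up for illustration, and comparing group means is only one simple way to look for a pattern in what is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'bp' tends to be missing for younger patients.
df = pd.DataFrame({
    "age": [25, 31, 38, 44, 52, 59, 63, 70],
    "bp": [np.nan, np.nan, 118.0, np.nan, 130.0, 135.0, 141.0, 150.0],
})

# Fraction of missing values in each column.
missing_rate = df.isna().mean()

# Compare another feature between rows with and without a recorded 'bp'.
bp_missing = df["bp"].isna()
age_when_missing = df.loc[bp_missing, "age"].mean()
age_when_present = df.loc[~bp_missing, "age"].mean()

print(missing_rate)
print(age_when_missing, age_when_present)
```

If the two means differ a lot, as they do in this toy data, the values are probably not missing at random, and dropping those rows would shift the age distribution the model sees.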

This article is adapted from lectures in Course 2 of the AI for Medicine Specialization by deeplearning.ai, where the following case study is used to illustrate the importance of properly analyzing missing data. …


Your first step to get started with data analysis.

Since 2010, when pandas was first open-sourced, it has matured into a beautiful and extensive library for data analysis. It is often used in conjunction with computational and statistical libraries such as NumPy, scikit-learn, and matplotlib.

In this article, I’ll walk you through the most common functions in pandas so you’re ready to do some exploratory data analysis.

pandas has two data structures that are used over and over: Series and DataFrame.

Series

Simply put, a series is a 1-d array of elements, but with an added feature: there’s an explicit index to address each element. To create a series…
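As a minimal sketch of that idea, here is a Series with an explicit index. The values and labels are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical values and index labels.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s["b"])     # address an element by its index label
print(s.iloc[0])  # address an element by its position
```

The same element can be reached by label or by position, which is the main difference from a plain 1-d array.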


Misinformed patients tend to make uninformed health decisions due to statistical illiteracy

There’s a virus that is spreading rapidly, perhaps a pandemic. For peace of mind, you decide to get yourself tested. You have no symptoms and you do not recall being in close proximity to an infected person. The result says that you have tested positive. So now, how much should you trust the result? What is the probability that you actually have the disease?


How often do patients or medical practitioners account for the accuracy of a certain test? Or do they even consider reasoning about the prevalence of a disease?
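Both questions come together in Bayes' rule. The sketch below uses assumed numbers (1% prevalence, 95% sensitivity, 95% specificity) that are purely illustrative and do not describe any real test. The function name is my own.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Return P(disease | positive test) using Bayes' rule."""
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


# Assumed, illustrative numbers (not taken from any real test).
ppv = positive_predictive_value(prevalence=0.01, sensitivity=0.95, specificity=0.95)
print(f"{ppv:.3f}")
```

Under these assumptions, a positive result corresponds to only about a 16% chance of actually having the disease, because false positives among the many healthy people outnumber true positives among the few sick ones.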

With this setup, some of you might…


Best practices for working with imbalanced datasets

Multi-label classification falls under the realm of multi-task learning. It is crucial to point out that multi-label classification and multi-class classification are not the same problem.

Multi-label classification involves predicting zero or more class labels. Unlike normal classification tasks where class labels are mutually exclusive, multi-label classification requires specialized machine learning algorithms that support predicting multiple mutually non-exclusive classes or labels.
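To make the difference concrete, here is a small sketch of how the targets look in each setting. The label names and the helper `to_indicator` are hypothetical, and the 0/1 indicator encoding is one common representation rather than the only one.

```python
labels = ["cat", "dog", "bird"]  # hypothetical label set

# Multi-class: exactly one label per example.
multiclass_targets = ["dog", "cat", "bird"]

# Multi-label: zero or more labels per example.
multilabel_targets = [{"cat", "dog"}, set(), {"bird"}]


def to_indicator(label_set, all_labels):
    """Encode a set of labels as a 0/1 vector, one position per label."""
    return [1 if name in label_set else 0 for name in all_labels]


indicator_matrix = [to_indicator(t, labels) for t in multilabel_targets]
print(indicator_matrix)
```

A row of the indicator matrix can contain several ones or none at all, which is why a model that picks exactly one class per example does not fit this setting.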

In this article, we’ll discuss the key challenges faced when dealing with multi-label classification problems. …

Najia Gul

loves telling stories backed by data
