Why I Chose Data Science

SeemzQ
2 min read · Jan 7, 2021

After a decade of working in public health, I began to realize I needed more. I like a challenge and learning new skills. When I learned about data science, I was instantly intrigued by all the new concepts and things I could do. One of them was data scrubbing, something you learn right away because it is one of the first steps in any data analysis.

Data scrubbing is the process of cleaning data to get it ready for analysis. It involves removing duplicate rows, null values, badly formatted entries, and irrelevant data, among other things.
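Before getting to the Zillow data, here is a minimal sketch of two of those steps, removing duplicates and dropping null values, using a small made-up DataFrame (not the real dataset) just for illustration:

```python
import pandas as pd
import numpy as np

# Hypothetical toy data with one duplicate row and one missing value
raw = pd.DataFrame({
    "Zipcode": ["28202", "28202", "28205", "28208"],
    "MedianPrice": [310000, 310000, np.nan, 255000],
})

clean = raw.drop_duplicates()  # remove exact duplicate rows
clean = clean.dropna()         # drop rows containing null values

print(clean)
```

After these two calls, the duplicate row and the row with the missing price are gone, leaving only complete, unique rows.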

Python makes this process very easy with libraries like Pandas and NumPy. Below are examples from a Zillow dataset I worked on, looking at ZIP codes in Charlotte, NC.

Wherever you write your code, such as a Jupyter notebook, import the two libraries:

import pandas as pd

import numpy as np

Then load the data and preview the first few rows.

df = pd.read_csv("zillow_data.csv")

df.head()

Since I am looking at Charlotte, NC, I will narrow the data down to that city:

df = df.loc[df['City'] == 'Charlotte']

df.head()
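The .loc filter above keeps only the rows whose City column equals 'Charlotte'. On a small made-up frame (again, not the Zillow file), the same pattern looks like this:

```python
import pandas as pd

# Hypothetical mini-dataset standing in for zillow_data.csv
df = pd.DataFrame({
    "City": ["Charlotte", "Raleigh", "Charlotte", "Durham"],
    "Zipcode": ["28202", "27601", "28205", "27701"],
})

# Boolean mask: True for rows where City is Charlotte
charlotte = df.loc[df["City"] == "Charlotte"]
print(charlotte)
```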

I also dropped other columns that I did not find relevant to my analysis:

df.drop('RegionID', axis=1, inplace=True)

df.drop('ROI', axis=1, inplace=True)

df.drop('std', axis=1, inplace=True)

df.drop('mean', axis=1, inplace=True)

df.drop('Risk', axis=1, inplace=True)

df
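As a side note, Pandas also lets you drop several columns in a single call by passing a list to the columns parameter. A sketch, using stand-in data with the same column names:

```python
import pandas as pd

# Hypothetical frame with the columns dropped above
df = pd.DataFrame({
    "RegionID": [1, 2],
    "Zipcode": ["28202", "28205"],
    "ROI": [0.1, 0.2],
    "std": [5.0, 6.0],
    "mean": [9.0, 8.0],
    "Risk": [0.3, 0.4],
})

# One call instead of five separate df.drop(...) statements
df = df.drop(columns=["RegionID", "ROI", "std", "mean", "Risk"])
print(df.columns.tolist())
```

Both approaches produce the same result; the single call is just more compact.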

These are a few examples of data scrubbing. As you can see, dropping unnecessary features will make your analysis go more smoothly.
