
Data Analysis

Last Updated: Jan 10, 2022

Pandas

Pandas is an open-source Python library for data handling tasks in machine learning and data science.

First, import pandas under an alias; pd is used here.

One of the most frequently used features of Pandas is reading data from a CSV file, a JSON file, or an SQL table.

For example, a CSV file can be read with the following syntax:

data_frame = pd.read_csv("location_of_the_file")
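As a minimal sketch of the call above, the snippet below reads CSV data from an in-memory string instead of a real file. The CSV content and the use of io.StringIO as a stand-in for a file path are assumptions made for illustration only.

```python
import io
import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a file on disk.
csv_text = "Names,Lectures,Grades\nJohn,Mathematics,90\nDan,Chemistry,54\n"

data_frame = pd.read_csv(io.StringIO(csv_text))
print(data_frame.shape)          # (rows, columns)
print(list(data_frame.columns))  # column names taken from the header row
```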

 

A Series is a one-dimensional labeled Pandas array that can hold any kind of data, including NaN values.
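A small sketch of a labeled Series containing a missing value may help here. The values and index labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative values; the index labels are invented for this sketch.
scores = pd.Series([90, np.nan, 77], index=["maths", "chemistry", "physics"])

print(scores["physics"])    # access an element by its label
print(scores.isna().sum())  # count the missing values
print(scores.dtype)         # the NaN forces a floating-point dtype
```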

import pandas as pd
import numpy as np

# Multiplying a Python list by n repeats its items n times.
lectures = pd.Series(["Mathematics", "Chemistry", "Physics", "History", "Geography", "German"] * 3)
grades = pd.Series([90, 54, 77, 22, 25] * 3)  # 15 values, three fewer than the other columns
classes = pd.Series(['A', 'B', 'C'] * 6)
credits = pd.Series(['1', '2', '6'] * 6)
names = np.array([["John"] * 6, ["Dan"] * 6, ["Zac"] * 6]).flatten()
retake = np.array(['Yes', 'No'] * 9)

# Series are aligned on their index, so Grades is NaN in rows 15 to 17.
df = pd.DataFrame({"Names": names, "Lectures": lectures, "Grades": grades,
                   "Classes": classes, "Credits": credits, "Retake": retake})
print(df.to_string(index=False))  # show the DataFrame without the index column

 

print(df.head(7)) 

head() returns the first rows of a DataFrame. By default it returns the first five rows, but passing an integer returns that many rows instead; head(7) above returns the first seven.
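The default and explicit forms of head() can be compared with a short sketch. The ten-row frame below is a stand-in with illustrative values, not the tutorial's data.

```python
import pandas as pd

# Small stand-in frame with ten rows (values are illustrative).
demo = pd.DataFrame({"n": range(10), "square": [i * i for i in range(10)]})

first_default = demo.head()   # no argument: first five rows
first_seven = demo.head(7)    # explicit count: first seven rows
print(len(first_default), len(first_seven))
```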

 

  • A DataFrame is similar to a tabular data source such as an Excel sheet, a CSV file, or an SQL table.
  • Besides being read from a file, a DataFrame can also be created from Pandas Series.
  • Pandas provides DataFrame slicing through the “loc” and “iloc” indexers.
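The second point in the list, building a DataFrame from Series, could look roughly like this. The two Series and their values are illustrative.

```python
import pandas as pd

# Two illustrative Series that share the default integer index 0..2.
names = pd.Series(["John", "Dan", "Zac"])
grades = pd.Series([90, 54, 77])

# Each dictionary key becomes a column name; rows are aligned on the index.
students = pd.DataFrame({"Names": names, "Grades": grades})
print(students)
```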

 

print(df.loc[:10, ['Names', 'Lectures']])   # rows labeled 0 through 10 (loc includes the end label, so 11 rows), keeping only the Names and Lectures columns

With iloc, the arguments must be integer positions. Column names such as Names and Lectures will not work and raise an error; pass their positions instead, for example 0 and 1.

 

print(df.iloc[5:10, 1:3])   # rows at positions 5 to 9 and columns at positions 1 and 2 (Lectures and Grades); iloc excludes the end position
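The difference in slice boundaries between loc and iloc is easy to miss, so here is a minimal sketch on a six-row frame. The frame is a stand-in, and some names and grades in it are invented for illustration.

```python
import pandas as pd

# Stand-in data; several values are invented for this sketch.
demo = pd.DataFrame({
    "Names": ["John", "Dan", "Zac", "Ann", "Bob", "Eve"],
    "Lectures": ["Mathematics", "Chemistry", "Physics", "History", "Geography", "German"],
    "Grades": [90, 54, 77, 22, 25, 61],
})

by_label = demo.loc[:3, ["Names", "Lectures"]]  # labels 0..3, end label included
by_position = demo.iloc[:3, 0:2]                # positions 0..2, end position excluded

print(by_label.shape)     # four rows
print(by_position.shape)  # three rows
```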

 

Let's say John's parents want to learn more about their son's performance at school. They want to see his lectures, the grades for these lectures, the number of credits earned, and whether he will need to retake an exam. We can slice the DataFrame created from the grades.csv file (which holds all the students' academic records) and extract just the information we need. For example:

Grades = df.loc[(df["Names"] == "John"), ["Lectures","Grades","Credits","Retake"]] 




In the above code, we retrieve only those rows in which the “Names” column equals the given name, and keep only the four listed columns.

You can use the loc and iloc indexers to access individual rows of a Pandas DataFrame.
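Row filtering of this kind works through a boolean mask. The sketch below shows the mask explicitly and combines it with a second condition; the data and the threshold of 80 are assumptions for illustration.

```python
import pandas as pd

# Illustrative records, not the tutorial's data.
records = pd.DataFrame({
    "Names": ["John", "Dan", "John", "Zac"],
    "Lectures": ["Mathematics", "Chemistry", "Physics", "History"],
    "Grades": [90, 54, 77, 22],
})

mask = records["Names"] == "John"  # boolean Series, one entry per row
john = records.loc[mask, ["Lectures", "Grades"]]

# Conditions are combined with & (and) or | (or); each needs parentheses.
john_high = records.loc[mask & (records["Grades"] >= 80)]
print(john)
print(john_high)
```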

print(df.iloc[0]) 

This line returns the first row of the DataFrame as a Series.

The Pandas groupby function splits data into groups based on some criteria. A DataFrame can be split along either axis, i.e., by rows or by columns.
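When the index is not the default 0, 1, 2, ... the two indexers behave visibly differently. A minimal sketch, assuming a frame indexed by student name (an illustrative choice):

```python
import pandas as pd

# Illustrative frame whose index holds names instead of integers.
demo = pd.DataFrame({"Grades": [90, 54, 77]}, index=["John", "Dan", "Zac"])

first_row = demo.iloc[0]   # by position: the first row, returned as a Series
dan_row = demo.loc["Dan"]  # by label: the row whose index label is "Dan"

print(first_row.name, first_row["Grades"])
print(dan_row.name, dan_row["Grades"])
```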

print(df.groupby(["Lectures","Names"]).first()) 

With the above code, the data is divided into groups by the Lectures and Names columns: Lectures forms the first level of the grouping and Names the second. first() then returns the first row of each group.
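A two-level grouping can be sketched on a small frame as follows. The records are illustrative; the result has a two-level index named after the grouping columns.

```python
import pandas as pd

# Illustrative records with a repeated (Lectures, Names) pair.
records = pd.DataFrame({
    "Names": ["John", "John", "Dan", "Dan"],
    "Lectures": ["Physics", "Physics", "Physics", "History"],
    "Grades": [90, 54, 77, 22],
})

# One output row per (Lectures, Names) pair, holding that group's first row.
firsts = records.groupby(["Lectures", "Names"]).first()
print(firsts)
```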





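We can also iterate over a grouped object, as in the code below, which groups by Classes and prints only the group for class A.

grouped_obj = df.groupby("Classes")  # one group per distinct value in Classes
for key, item in grouped_obj:
    if key == 'A':
        print("Key is: " + str(key))
        print(str(item), "\n\n")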
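Iterating over a grouped object yields (key, sub-DataFrame) pairs. The sketch below collects a per-group size and also shows get_group as a direct way to fetch a single group; the data is illustrative.

```python
import pandas as pd

# Illustrative records, not the tutorial's data.
records = pd.DataFrame({
    "Names": ["John", "Dan", "Zac", "John"],
    "Classes": ["A", "B", "A", "B"],
    "Grades": [90, 54, 77, 22],
})

group_sizes = {}
for key, group in records.groupby("Classes"):
    group_sizes[key] = len(group)        # each group is itself a DataFrame
    print(key, group["Grades"].mean())

# Fetch one group directly instead of looping and testing the key.
class_a = records.groupby("Classes").get_group("A")
print(class_a)
```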

Pandas can also save data to a CSV file in the local directory, using the code below.

df.to_csv('file1.csv')  # writes the DataFrame to file1.csv in the current working directory
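By default to_csv also writes the index as the first column; passing index=False leaves it out. The sketch below writes to an in-memory buffer (an assumption made to avoid touching the file system) and reads the result back.

```python
import io
import pandas as pd

# Illustrative frame.
demo = pd.DataFrame({"Names": ["John", "Dan"], "Grades": [90, 54]})

buffer = io.StringIO()             # stands in for a file on disk
demo.to_csv(buffer, index=False)   # index=False omits the index column
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back gives the same columns and values.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```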

 

Some of the important uses of Pandas are:

  • Data cleansing
  • Data fill
  • Data normalization
  • Merges and joins
  • Data visualization
  • Statistical analysis
  • Data inspection
  • Loading and saving data
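A few of the uses listed above (data inspection, data fill, merges, statistical analysis) can be sketched in a handful of lines. The two frames and the choice of filling with the column mean are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frames; one grade is missing on purpose.
grades = pd.DataFrame({"Names": ["John", "Dan", "Zac"], "Grades": [90.0, np.nan, 77.0]})
classes = pd.DataFrame({"Names": ["John", "Dan", "Zac"], "Classes": ["A", "B", "C"]})

print(grades.isna().sum())  # data inspection: missing values per column

# Data fill: replace the missing grade with the column mean.
filled = grades.fillna({"Grades": grades["Grades"].mean()})

# Merge: join the two frames on the shared Names column.
merged = filled.merge(classes, on="Names")

print(merged["Grades"].describe())  # statistical analysis: summary statistics
```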
