Python for Data Science

Why Python:

Simple (the syntax reads almost like plain English), open source, and backed by libraries maintained by a large and active community.

Table of Contents

1. Operators

2. Variables and Data Types

3. Conditional Statements

4. Looping Constructs

5. Functions

6. Python Data Structures

7. Lists

8. Dictionaries

9. Understanding Standard Libraries in Python

10. Hands-on with Pandas Library

1. Operators: Symbols That Perform Mathematical and Logical Operations

  • Arithmetic operators: +, -, *, /, %, //, **
  • Comparison (conditional) operators: return True/False: <, <=, ==, >=, >, !=
  • Logical operators: and, or, not
Python command                   Output
3 + 5 / 45 - 54 * 4              -212.88888888888889
"DataScientist" + 3              TypeError (a str and an int cannot be added)
"DataScientist " * 3             'DataScientist DataScientist DataScientist '
"DataScientist " + "3 "          'DataScientist 3 '
45 > 43                          True
56 < 34                          False
34*34 > 34*34                    False
34*34 == 34**34                  False
0 and 3                          0
3 and 0                          0
3 and 5                          5   (and returns the 2nd value when the 1st is truthy)
0 or 3                           3
3 or 5                           3   (or returns the 1st truthy value)
True and False                   False
True or False                    True
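The `and`/`or` rows above return one of their operands, not always True/False. A minimal sketch that checks those rules and also shows how `//` (floor division) and `%` (remainder) fit together, using only built-ins; the values 17 and 5 are arbitrary.

```python
# `and` returns the first falsy operand, or the last operand if all are truthy
print(0 and 3)   # 0
print(3 and 5)   # 5
# `or` returns the first truthy operand, or the last operand if none is truthy
print(0 or 3)    # 3
print(3 or 5)    # 3

# Floor division and remainder always satisfy: a == (a // b) * b + a % b
a, b = 17, 5
quotient = a // b    # 3
remainder = a % b    # 2
print(quotient, remainder, a == quotient * b + remainder)  # 3 2 True

# // rounds DOWN (toward negative infinity), so negatives may surprise
print(-17 // 5, -17 % 5)   # -4 3
print(2 ** 10)             # 1024 (** is the exponent operator)
```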

2. Variables and Data Types

  • Variables are names bound to objects
  • Names are case-sensitive and must start with a letter or an underscore (_), not a digit
  • Basic data types: int, float, bool, str (IFBS)
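a = 5
a               # 5 (a notebook displays the value of the last expression)
print(a)        # 5
A = 4           # Case-sensitive: A and a are two different variables
print(A, a)     # 4 5

a = 5
b = 7
a = b           # a now refers to the same object as b
print(a, b)     # 7 7

_a5 = 5         # A name may start with an underscore
type(_a5)       # int

b = "Data Scientist"
type(b)         # str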
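The naming rules can be checked programmatically. A minimal sketch; the helper name `is_valid_name` is hypothetical. `str.isidentifier()` checks the letter/underscore/digit rules, and the standard `keyword` module excludes reserved words such as `for`, which Python also forbids as names.

```python
import keyword

def is_valid_name(name):
    """Return True if `name` can be used as a Python variable name (hypothetical helper)."""
    # isidentifier(): starts with a letter or underscore, then letters/digits/underscores
    # iskeyword(): reserved words such as 'if' or 'for' cannot be variable names
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["a", "A", "_a5", "5a", "data scientist", "for"]:
    print(repr(candidate), is_valid_name(candidate))

# type() reports the data type of the object a name is bound to
values = [5, 5.0, True, "Data Scientist"]
print([type(v).__name__ for v in values])   # ['int', 'float', 'bool', 'str']
```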

3. Conditional Statements

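  • Example: if you arrive home early, cook; otherwise, order on Swiggy!
  • If-else statement: a single condition
if condition:
    statement1
else:
    statement2

time = "late"          # hypothetical value for illustration
if time == "late":
    food = "swiggy"
else:
    food = "cook"

  • If-elif-else statement: multiple conditions, checked from top to bottom
if condition1:
    statement1
elif condition2:
    statement2
else:
    statement3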

Assume a variable x: print "Positive" if x is greater than 0, "Zero" if x is equal to 0, or "Negative" if x is less than 0.

x = -23432 * -323    # the product of two negatives is positive

if x == 0:
    print("X is Zero")
elif x > 0:
    print("X is Positive")
else:
    print("X is Negative")

# Take a variable x and print "Even" if the number is divisible by 2, otherwise print "Odd"
x = 9    # use an integer: a float such as 9.3 is neither even nor odd

if x % 2 == 0:
    print("Given number x:", x, "is Even")
else:
    print("Given number x:", x, "is Odd")
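With a float such as 9.3, `x % 2` is about 1.3, which is not 0, so the check above would report "Odd" even though parity only applies to whole numbers. A minimal sketch of a guarded version; the function name `parity` and the choice to raise `ValueError` for non-integers are assumptions for illustration.

```python
def parity(x):
    """Return 'Even' or 'Odd' for an integer x (hypothetical helper)."""
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(x, bool) or not isinstance(x, int):
        raise ValueError(f"parity is only defined for integers, got {x!r}")
    if x % 2 == 0:
        return "Even"
    else:
        return "Odd"

print(parity(9))    # Odd
print(parity(-4))   # Even
print(9.3 % 2)      # about 1.3 (not exactly, due to floating-point rounding)
```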

# Take a variable y and print "Grade A" if y is greater than 90, "Grade B"
# if y is greater than 60 but less than or equal to 90, and "Grade F" otherwise
y = 89.1

if y > 90:
    print("Grade A, Congratulations")
elif 60 < y <= 90:
    print("Grade B, All the best")
else:
    print("Grade F, Long way to Go..!")
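Wrapping the grading rules in a function makes them reusable and easy to test on boundary scores. A sketch; the name `grade` is hypothetical. Because the `if` branch already handles y > 90, the `elif` only needs to check y > 60.

```python
def grade(y):
    """Map a score to a grade using the thresholds above (hypothetical helper)."""
    if y > 90:
        return "Grade A"
    elif y > 60:          # y <= 90 is already guaranteed at this point
        return "Grade B"
    else:
        return "Grade F"

for score in [95, 90, 89.1, 60, 12]:
    print(score, grade(score))
# 95 Grade A, 90 Grade B, 89.1 Grade B, 60 Grade F, 12 Grade F
```

Note that exactly 90 is a B and exactly 60 is an F, matching "greater than 60 but less than or equal to 90".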

4. Looping Constructs

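For Loop

# For loop to print all the numbers between 10 and 50 (11 to 49)
for i in range(11, 50):
    print(i)

# For loop to print all the ODD numbers between 10 and 50
for i in range(11, 50):
    if i % 2 != 0:
        print(i)

range(start, stop[, step]) -> range object: counts from start up to, but not including, stop, moving by step (default 1).

Another option: start at 11 and step by 2, so no if check is needed:

for i in range(11, 50, 2):
    print(i)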
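Converting a `range` to a list shows exactly which numbers a loop will visit. A minimal sketch comparing the two odd-number approaches; the first uses a list comprehension, a compact form of the for/if loop above. Both give the same 20 numbers, but the stepped range skips even numbers instead of testing each one.

```python
filtered = [i for i in range(11, 50) if i % 2 != 0]   # test every number
stepped = list(range(11, 50, 2))                      # step by 2 from 11

print(stepped[:5], stepped[-1])   # [11, 13, 15, 17, 19] 49
print(len(stepped))               # 20
print(filtered == stepped)        # True
```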

5. Functions

  • A reusable block of code written to solve a specific problem
def function_name(input_argument):
    statement1
    statement2
    return some_var

def area_circle(radius):
    area = 3.14 * radius * radius    # 3.14 approximates pi
    return area

def compare(a, b):
    if a > b:
        greater = a
    else:
        greater = b
    return greater

compare(10, 50)    # 50
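The same comparison is also available as the built-in `max()`. A minimal sketch that adds docstrings and checks `compare` against `max`; using `math.pi` instead of 3.14 is a swapped-in choice for precision.

```python
import math

def area_circle(radius):
    """Return the area of a circle with the given radius."""
    return math.pi * radius ** 2    # math.pi is more precise than 3.14

def compare(a, b):
    """Return the greater of a and b."""
    if a > b:
        return a
    return b

print(round(area_circle(2), 2))        # 12.57
print(compare(10, 50), max(10, 50))    # 50 50
```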

6. Python Data Structures

  • The basic data types (int, float, bool, str) each store a single value
  • Two data structures that store collections of values:
  • Lists: ordered sequences, e.g. [1, 'Python', 2, 'is', 3, 'Awesome']
  • Dictionaries: {key: value} pairs looked up by key, not by position, e.g. {'Ramesh': 150, 'Sudesh': 160, 'Suresh': 146}

7. Lists

  • Ordered data structure: elements separated by commas and enclosed in square brackets
  • Extract a single element: list_name[index] (indexing starts at 0)
  • Extract a sequence (slice): list_name[0:4] returns indexes 0 to 3; the stop index 4 is excluded
  • List operations: append(item), extend(another_list), remove(value), and the statement del list_name[index]
  • Accessing a list: for item in list_name: print(item)
# Creating a list
marks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
marks          # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
marks[5]       # 6 (indexing starts at 0)

# Get elements up to 6 (indexes 0 to 5)
marks[0:6]     # [1, 2, 3, 4, 5, 6]

# Adding elements
marks.append(11)
marks          # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

marks.extend([12, 13])     # adds each item of the other list
marks          # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

marks.append([14, 15])     # adds the whole list as ONE nested element
marks          # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, [14, 15]]

# Deleting elements in a list
marks.remove([14, 15])     # delete by actual value
del marks[0]               # delete by index position (removes 1)

# Accessing a list and operating on elements with for
for mark in marks:
    print(mark * 100)      # prints 200, 300, 400, ..., 1300 (one per line)
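A quick check of the slicing rule and of the difference between `append()` and `extend()`. A minimal sketch with fresh lists so it runs on its own; it also shows negative indexes, which count from the end of the list.

```python
marks = [1, 2, 3, 4, 5]

print(marks[0:4])    # [1, 2, 3, 4]  (stop index 4 is excluded)
print(marks[-1])     # 5             (negative indexes count from the end)

a = [1, 2]
a.extend([3, 4])     # adds 3 and 4 as separate elements
b = [1, 2]
b.append([3, 4])     # adds one element: the list [3, 4]
print(a, len(a))     # [1, 2, 3, 4] 4
print(b, len(b))     # [1, 2, [3, 4]] 3

b.remove([3, 4])     # delete by value
del a[0]             # delete by index
print(a, b)          # [2, 3, 4] [1, 2]
```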

8. Dictionaries

  • Key-value data structure: elements are stored as {key: value} pairs and accessed by key, not by position (since Python 3.7, dictionaries keep insertion order)
  • Add elements with dict_name[key] = value or the update() function; delete with del dict_name['key']
marks = {'history': 45, 'Geography': 54, 'Hindi': 56}
print(marks)          # {'history': 45, 'Geography': 54, 'Hindi': 56}
marks['Geography']    # 54

marks['english'] = 47   # Adding an element
print(marks)          # {'history': 45, 'Geography': 54, 'Hindi': 56, 'english': 47}

marks.update({'Chemistry': 89, 'Physics': 98})   # Adding several elements
print(marks)
# {'history': 45, 'Geography': 54, 'Hindi': 56, 'english': 47,
#  'Chemistry': 89, 'Physics': 98}

del marks['Hindi']    # Deleting an element by key
print(marks)
# {'history': 45, 'Geography': 54, 'english': 47, 'Chemistry': 89, 'Physics': 98}
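Looping with `items()` gives each key together with its value, and `get()` returns a default instead of raising `KeyError` when a key is missing (as 'Hindi' is after the `del` above). A minimal sketch that rebuilds the final `marks` dictionary so it runs on its own.

```python
marks = {'history': 45, 'Geography': 54, 'english': 47, 'Chemistry': 89, 'Physics': 98}

for subject, score in marks.items():
    print(subject, score)

print(marks.get('Hindi'))        # None: the key was deleted
print(marks.get('Hindi', 0))     # 0: the default value
print('Physics' in marks)        # True: `in` checks keys, not values

total = sum(marks.values())
print(total)                     # 333
```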

9. Understanding Standard Libraries in Python

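  • Python ships with a Standard Library of ready-to-use modules and functions
  • Module: a single Python file (.py) that groups related functions, classes, and variables
  • Package: a bundle (folder) of modules
  • Import a function directly: from package.module import function
  • Or import the module: from package import module, then use the dot operator to access its functions
  • module.function(x, y)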
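Both import forms, shown with the standard library's `urllib` package and its `parse` module. A minimal sketch; the URL is a placeholder.

```python
# Form 1: from package.module import function
from urllib.parse import urlparse

# Form 2: from package import module, then module.function(...)
from urllib import parse

url = "https://example.com/data/data.csv"    # placeholder URL
print(urlparse(url).netloc)          # example.com
print(parse.quote("Data Scientist")) # Data%20Scientist

# Plain `import module` also works; functions are reached with the dot operator
import math
print(math.sqrt(16))                 # 4.0
```

Both forms reach the same function object: `parse.urlparse` and `urlparse` are identical.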

Data Frames

Reading CSV in Python – Introduction to Pandas

  • Pandas: Python Data Analysis Toolkit for READING, FILTERING, MANIPULATING, VISUALIZING and EXPORTING Data
  • Reads many data formats: CSV, JSON, HTML, Excel …
import pandas as pd
df = pd.read_csv("data.csv")      # Read CSV
df = pd.read_excel("data.xlsx")   # Read Excel (.xlsx files need an engine such as openpyxl installed)
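Since `data.csv` is not included here, a minimal sketch that feeds `pd.read_csv` an in-memory CSV through `io.StringIO` instead of a file path. The rows are made-up sample data that reuse column names from this section.

```python
import io
import pandas as pd

# Made-up sample rows; read_csv accepts any file-like object, not only a path
csv_text = """PassengerId,Age,Embarked
1,22,S
2,38,C
3,26,S
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)           # (3, 3)
print(list(df.columns))   # ['PassengerId', 'Age', 'Embarked']

# Exporting: with no path, to_csv returns the CSV text; index=False drops the 0, 1, 2 row index
out = df.to_csv(index=False)
print(out.splitlines()[0])   # PassengerId,Age,Embarked
```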

Data Frames and Their Operations

  • A DataFrame is similar to a tabular Excel sheet
  • Unlike Excel, the row index starts from 0
  • Common DataFrame (df) operations: df.shape, df.head()/df.tail(), df.columns, df["Column"]

Initial Understanding of Data Frame

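df.shape                     # (891, 12): 891 rows and 12 columns
df.head()                    # first 5 rows
df.tail()                    # last 5 rows
df.columns                   # all column names
df.info()                    # column types and non-null counts
df.describe().transpose()    # summary statistics, one row per numeric column
df['Embarked']               # all values of a single column
df[['Embarked', 'Age']]      # multiple columns: pass a list of names inside []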
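The same first-look operations on a tiny DataFrame built in code, so they can be tried without a file. A minimal sketch with made-up values; note that one column name returns a Series while a list of names returns a DataFrame.

```python
import pandas as pd

# Made-up mini dataset using two column names from the commands above
df = pd.DataFrame({
    "Age": [22, 38, 26, 35, 54, 2],
    "Embarked": ["S", "C", "S", "S", "Q", "C"],
})

print(df.shape)                    # (6, 2): 6 rows, 2 columns
print(df.head(3))                  # first 3 rows (the default is 5)
print(df.columns.tolist())         # ['Age', 'Embarked']
ages = df["Age"]                   # one column name -> Series
subset = df[["Embarked", "Age"]]   # list of column names -> DataFrame
print(type(ages).__name__, type(subset).__name__)   # Series DataFrame
print(df["Age"].mean())            # 29.5
```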

Indexing a Data Frame

df.iloc[:5]          # Select ROWS by position: indexes 0 to 4 (the stop 5 is excluded)
                     # With no comma, all columns are returned
df.iloc[:, :2]       # Select all rows and columns 0 to 1 (the stop 2 is excluded)

General form: df.iloc[start_row:end_row, start_col:end_col], with each end position excluded.

df[df['Embarked'] == 'C']    # Display only the rows where Embarked is 'C'
df.iloc[:, -2:]              # Access the last 2 columns with iloc
df.iloc[-10:, :2]            # Access the last 10 rows and the first 2 columns
df.iloc[24, 4]               # Element in the 25th row, 5th column (a single value)
df.iloc[24:25, 4:5]          # The same element, returned as a 1x1 DataFrame
df.loc[:, ['Dependents', 'Education']]   # Select only these 2 columns, by label
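The difference between `iloc` (positions) and `loc` (labels) is easiest to see when the row labels are not 0, 1, 2, …. A minimal sketch on a made-up 4x3 DataFrame with string row labels; note that `loc` label slices include their end label, unlike `iloc` position slices.

```python
import pandas as pd

# Made-up data; string row labels make loc and iloc visibly different
df = pd.DataFrame(
    {"Age": [22, 38, 26, 35],
     "Embarked": ["S", "C", "S", "C"],
     "Fare": [7.25, 71.28, 7.92, 53.10]},
    index=["a", "b", "c", "d"],
)

print(df.iloc[1, 2])                       # 71.28: row position 1, column position 2 (a scalar)
print(df.iloc[1:2, 2:3].shape)             # (1, 1): slicing keeps a DataFrame
print(df.iloc[:, -2:].columns.tolist())    # ['Embarked', 'Fare']
print(df.loc["b":"c", ["Age", "Fare"]])    # labels; the slice INCLUDES "c"
print(df[df["Embarked"] == "C"].index.tolist())   # ['b', 'd']
```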
