PYT-102 Sprint Summary


Topic 1: Recursion & Dictionaries


- Recursion

Recursion is when a function calls itself. This technique is helpful for solving problems that can be broken down into similar sub-problems.

What is it?
A method where the solution to a problem depends on solving smaller instances of the same problem using self-calling functions.

Where is it used?
Used in tree/graph traversal, breaking down complex problems, computing factorials, Fibonacci series, and more.




How is it used?
• Define a base case to stop recursion
• Define a recursive case that calls the function again
• Example:

def factorial(n):
  if n == 0:
    return 1
  else:
    return n * factorial(n - 1)
• factorial(5) → 120


The call flow of the above example: factorial(5) → 5 * factorial(4) → 5 * 4 * factorial(3) → 5 * 4 * 3 * factorial(2) → 5 * 4 * 3 * 2 * factorial(1) → 5 * 4 * 3 * 2 * 1 * factorial(0) → 120.


--- Takeaways / best practices
• Always define a base case to avoid infinite recursion
• Be mindful of the maximum recursion depth
• Recursive solutions can be elegant but may be less efficient than loops
• Use recursion when a problem fits naturally into divide-and-conquer format


- List Comprehensions

List comprehensions allow you to write cleaner, more concise loops that create new lists based on conditions or transformations.

What is it?
A compact way to create lists using a single line of code that includes a loop and an optional condition.





Where is it used?
Used for transforming lists, filtering data, and generating sequences efficiently.

How is it used?



• Basic example:

numbers = [1, 2, 3, 4]
squares = [x*x for x in numbers]


Output: [1, 4, 9, 16]


• With condition:

evens = [x for x in numbers if x % 2 == 0]


Output: [2, 4]

--- Takeaways / best practices
• Use for cleaner and more readable transformations
• Avoid complex logic inside list comprehensions to maintain clarity
• They are often faster than equivalent for loops for building lists
• You can nest them, but readability can suffer—keep it simple

- Nested Dictionaries

Nested dictionaries allow storing complex, hierarchical data like JSON - perfect for structured records or grouped values.



What is it?
A dictionary within another dictionary, allowing multi-level key access.

Where is it used?
Used to model structured data like user profiles, configuration data, or nested JSON responses.

How is it used?
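Below is a minimal sketch (the user records are made up for illustration):

# A nested dictionary: each user ID maps to another dictionary of details
users = {
    "u1": {"name": "Alice", "scores": {"math": 90, "python": 85}},
    "u2": {"name": "Bob", "scores": {"math": 75, "python": 80}},
}

# Multi-level key access
print(users["u1"]["scores"]["python"])          # 85

# Safe access with .get() to avoid KeyError
print(users.get("u3", {}).get("name", "N/A"))   # N/A

# Loop through nested data
for user_id, details in users.items():
    print(user_id, details["name"])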




--- Takeaways / best practices
• Use when representing hierarchical or grouped data
• Always check if keys exist to avoid KeyError
• Combine with loops to process nested data effectively
• Useful when working with JSON APIs and structured datasets

- Dictionaries vs JSON

Python dictionaries and JSON look similar but have key differences; understanding both is crucial when working with APIs or saving data.

What is it?
Dictionaries are Python objects, while JSON is a text-based format used for data exchange.




Where is it used?
Used in reading/writing API data, configuration files, and storing structured information.

How is it used?
• Python Dict to JSON:

import json
data = {"name": "Alice", "age": 25}
json_str = json.dumps(data)


• JSON to Python Dict:

parsed = json.loads(json_str)
parsed["age"] → 25



--- Takeaways / best practices
• Use json.dumps() to serialize Python dicts into JSON
• Use json.loads() to deserialize JSON back into Python dicts
• JSON keys must be strings; Python dicts allow other types
• Ideal for saving, sharing, or receiving data across systems


Topic 2: Python Modules & Error Handling


-   Importing Python Modules

Modules allow you to reuse powerful, pre-built functionality; no need to reinvent the wheel!


Syntax:

import module_name 


What is it?
Modules are external or built-in Python files containing reusable code (functions, classes, constants).

Where is it used?
Used in all real-world analytics projects to access math, file handling, APIs, date/time, JSON, and more.

How is it used?

• Import an entire module:
import math
• Import specific functions:
from math import sqrt


• Rename for convenience:
import pandas as pd


--- Takeaways / best practices
• Use built-in modules before writing custom functions
• Use aliases for long module names (as)
• Keep imports at the top of the script for readability
• Check documentation to explore all useful functions within a module

-   math Module

The math module gives you access to advanced mathematical functions - perfect for calculations and data transformations.


Some methods and constants from the math module:

math.pi: A float value, 3.141592653589793, representing the mathematical constant PI.

math.sqrt(x): Returns a float value, representing the square root of the parameter x (x must be >= 0).

math.log(x[, base]): Returns a float value, the natural logarithm of x if base is omitted, or the logarithm of x to the given base.

math.pow(x, y): Returns a float value, representing the value of x to the power of y.


What is it?
A built-in module offering functions like square roots, logarithms, and constants (like pi and e).

Where is it used?
Used in data normalization, scientific calculations, rounding, and statistical processing.

How is it used?
• Import module:

import math


• Example:

math.sqrt(16) → 4.0
math.floor(3.7) → 3
math.ceil(3.2) → 4
math.pi → 3.141592...



--- Takeaways / best practices
• Use math for precise mathematical operations
• Different from Python’s built-in round()
• math works only with numbers—avoid passing strings
• Great for preprocessing numeric features or formulas

-   datetime Module

Working with dates and times? The datetime module is your go-to for parsing, formatting, and comparing date values.

What is it?
A module for manipulating dates and times as objects.

Where is it used?
Used in time-series analysis, parsing timestamps, filtering data by date, or calculating date differences.

How is it used?
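A minimal sketch of common operations (the dates below are arbitrary examples):

from datetime import datetime, timedelta

# Parse a string into a datetime object
order_date = datetime.strptime("2024-07-15", "%Y-%m-%d")

# Format a datetime back into a string
print(order_date.strftime("%d %b %Y"))      # 15 Jul 2024

# Date arithmetic with timedelta
delivery_date = order_date + timedelta(days=3)
print((delivery_date - order_date).days)    # 3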



--- Takeaways / best practices
• Use correct format codes for parsing (e.g., %Y, %m, %d)
• Combine with Pandas for time-indexed data
• Always standardize time zones if comparing timestamps
• Helpful in filtering, grouping, and visualizing temporal data

-   json Module

JSON is everywhere, from APIs to config files, and the json module helps you work with it like a Python pro.

What is it?
A module that allows you to convert between JSON (string) and Python dictionaries.

Where is it used?
Used in web APIs, storing structured data, exporting results, and working with external services.

How is it used?
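A minimal sketch (the config data is made up for illustration):

import json

config = {"user": "Alice", "retries": 3, "debug": False}

# Serialize a Python dict into a JSON string
json_str = json.dumps(config)
print(json_str)             # {"user": "Alice", "retries": 3, "debug": false}

# Deserialize a JSON string back into a Python dict
parsed = json.loads(json_str)
print(parsed["retries"])    # 3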



--- Takeaways / best practices
• Use dumps() to save Python data to JSON
• Use loads() to parse incoming JSON
• JSON only supports string keys and basic types
• Essential when reading API responses or saving data to disk


-   Regex and re Module

Regular expressions help you search, clean, and extract patterns from messy text—critical for preprocessing data.

What is it?
Regex (regular expressions) is a way to define patterns for string matching, supported via the re module.


Regular Expression - Metacharacters:

* : Matches 0 or more repetitions.

. : Matches any character except newline.

+ : Matches 1 or more repetitions.

{n} : Matches exactly n repetitions.


Regular Expression - Special Sequences:

\b : Matches a word boundary.

\d: Matches a digit (equivalent to [0-9]).


Regular Expression - Sets:

[A-Z] : Matches any uppercase letter.

[A-Z][a-z]: Matches an uppercase letter followed by a lowercase letter. 



Where is it used?
Used in log analysis, cleaning unstructured data, validating formats (like emails), and text parsing.

How is it used?
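A minimal sketch using the re module (the sample text is made up):

import re

text = "Order IDs: ORD123, ORD456; contact: support@example.com"

# Find all order IDs: 'ORD' followed by one or more digits
print(re.findall(r"ORD\d+", text))          # ['ORD123', 'ORD456']

# Search for an email-like pattern (simplified, not a full validator)
match = re.search(r"\b[\w.]+@[\w.]+\.\w+\b", text)
if match:
    print(match.group())                    # support@example.com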




--- Takeaways / best practices
• Use r"" raw string format for regex patterns
• Regex is powerful but can be complex—test your pattern
• Use for extracting IDs, tags, numbers, or structured parts of text
• Mastering regex boosts your ability to clean real-world datasets

-   Exception Handling

Exception handling lets your code gracefully recover from errors like missing files or bad input, without crashing.

What is it?
A system to catch and respond to errors using try, except, and optionally finally.



Where is it used?
Used in file handling, user input, data conversions, and API calls—anywhere errors may occur.




Common Types of Errors to be handled:

ValueError: Raised when a function receives an argument of the right type but an inappropriate value.

TypeError: Raised when an operation or function is applied to an object of inappropriate type.

ZeroDivisionError: Raised when division or modulo by zero takes place.

IndexError: Raised when a sequence subscript is out of range.

KeyError: Raised when a dictionary key is not found. 


How is it used?
• Example:

try:
  num = int(input("Enter number: "))
except ValueError:
  print("Invalid input")
finally:
  print("Done")


Output:

Case 1: User enters a valid number (e.g., 10)

Input: 10 

Output:

Done


(The finally block runs after successful input; no except block is triggered.)

Case 2: User enters invalid input (e.g., "abc")

Input: abc 

Output:

Invalid input  

Done


(The except block catches the ValueError, and finally still executes.)


• Use except to handle specific error types (e.g., KeyError, IndexError)

--- Takeaways / best practices
• Always handle predictable errors (like user input or file issues)
• Avoid generic except: unless truly needed
• finally runs whether an error occurs or not—use for cleanup
• Use exception handling to make robust, user-friendly code


Topic 3: NumPy & Crio IDE


- Deep Copy vs Shallow Copy

Understanding how data is copied in Python is crucial when working with lists or complex data structures. A wrong copy type can lead to unexpected changes in your dataset.

What is it?
A shallow copy creates a new object but copies references to nested objects. A deep copy creates a new object and also copies all nested objects recursively.

Where is it used?
In data analytics, especially when manipulating large lists or structures like tables, dictionaries, or NumPy arrays where you don't want changes in one to affect the other.

How is it used?
• Use copy.copy() for a shallow copy
• Use copy.deepcopy() to duplicate both outer and inner elements
• NumPy uses .copy() for deep-like copying of arrays

Shallow Copy:
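A minimal sketch (the nested list is made up for illustration):

import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)

shallow[0][0] = 99      # inner lists are shared references
print(original)         # [[99, 2], [3, 4]] -> the original is affected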


Deep Copy:
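A minimal sketch continuing the same example:

import copy

original = [[1, 2], [3, 4]]
deep = copy.deepcopy(original)

deep[0][0] = 99         # inner lists are copied recursively
print(original)         # [[1, 2], [3, 4]] -> the original is unchanged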



--- Takeaways / best practices
• Use deep copy when copying nested data that must remain unchanged
• Be cautious: shallow copy shares references and may cause data to change unexpectedly

- Introduction to Crio Jupyter IDE

Crio's Jupyter IDE is a browser-based coding environment that helps you test and learn Python interactively using notebooks.

What is it?
It’s a lightweight and beginner-friendly Python coding interface, perfect for writing and executing blocks of code and seeing real-time results.


Where is it used?
During Python sprints, exercises, and assignments in Crio’s learning platform—especially for learning Python for data tasks.

How is it used?
• Open your course notebook from the dashboard
• Write Python code in cells and run them individually
• Use Shift + Enter to execute a cell
• View outputs and errors right below each cell



--- Takeaways / best practices
• Ideal for experimenting with data line-by-line
• Helps organize code in a readable, testable format
• Use comments (#) and markdown cells to document code and logic

- List vs NumPy

Python lists are versatile, but NumPy arrays are powerful tools specifically designed for efficient numerical operations.

What is it?
A list is a flexible Python container. A NumPy array is a performance-optimized structure for large-scale numerical computations.

Where is it used?
Lists are suitable for general scripting; NumPy arrays are used for high-performance math, matrix operations, and data preprocessing.

How is it used?

import numpy as np

py_list = [1, 2, 3]
np_array = np.array([1, 2, 3])
print("List * 2:", py_list * 2)
print("Array * 2:", np_array * 2)


Output:

List * 2: [1, 2, 3, 1, 2, 3]
Array * 2: [2 4 6]






--- Takeaways / best practices
• NumPy arrays support fast, element-wise operations
• Prefer NumPy for large datasets and math-heavy processes
• Lists can't perform vectorized calculations efficiently

- Statistical Operations 

Statistical operations help uncover insights from data, such as averages, trends, and variation—critical for effective analysis.

What is it?
These are mathematical computations performed on datasets to understand distribution, variability, and relationships.

Where is it used?
In all phases of analytics—exploratory data analysis, trend identification, anomaly detection, and feature engineering.

    - NumPy Functions


How is it used?

import numpy as np

data = np.array([10, 20, 30, 40, 50])
print("Mean:", np.mean(data))
print("Standard Deviation:", np.std(data))
print("Median:", np.median(data))


Output:

Mean: 30.0
Standard Deviation: 14.142135623730951
Median: 30.0




    - Matrix Operations


How is it used?

import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print("Dot Product:\n", np.dot(A, B))


Output:

Dot Product:
 [[19 22]
  [43 50]]



--- Takeaways / best practices
• Use NumPy for fast and scalable numerical analysis
• Master functions like mean, median, std, dot for common tasks
• Ensure shape compatibility when doing matrix operations


Topic 4: Pandas DataFrames Basics


-   Pandas DataFrames

DataFrames are the core data structure in Pandas, used to handle tabular data — like an Excel sheet in Python.

What is it?
A 2D, labeled data structure with rows and columns, great for analyzing and transforming data.

Where is it used?
Used extensively in data cleaning, exploration, feature engineering, and transformation.

How is it used?
• Create with pd.DataFrame() from a dictionary, list of lists, or CSV
• Use .head() (view initial rows) and .info() to inspect
• Modify, filter, or aggregate using DataFrame methods
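A minimal sketch (the sales figures are made up for illustration):

import pandas as pd

# Create a DataFrame from a dictionary of columns
df = pd.DataFrame({
    "Product": ["A", "B", "C"],
    "Revenue": [500, 700, 300],
})

print(df.head())    # view initial rows
df.info()           # column names, non-null counts, and dtypes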




-   Pandas Series

A Series is a one-dimensional array with labels (like a single column from a DataFrame).

What is it?
A labeled, one-dimensional array — similar to a column in Excel.
Where is it used?
For storing and analyzing individual columns or lists of values.

How is it used?
• Create with pd.Series()
• Access values using index or slicing
• Perform element-wise operations



Code:

import pandas as pd
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s)

Output:

a    10
b    20
c    30
dtype: int64




-   Reading CSV and Excel Data

To analyze real-world data, you need to load it from external files.

What is it?
Pandas allows you to read .csv or .xlsx files into a DataFrame.

Syntax: pd.read_csv(filepath)

Loads data from a CSV file into a Pandas DataFrame

Supports numerous parameters for customization



Where is it used?
In almost every data project — CSV files are the most common data source.

How is it used?
• Use pd.read_csv('filename.csv') for CSV
• Use pd.read_excel('file.xlsx') for Excel
• Optional arguments: sep (delimiter), header, sheet_name, etc.

Code:

df = pd.read_csv("sales_data.csv")
print(df.head())

Output:
(Depends on file content, shows first 5 rows of the file)



-   Creating Columns

Add new columns to enhance or transform your dataset.

What is it?
You can create new columns using existing ones or assign fixed values.

Where is it used?
To add calculated fields or prepare data for modeling.

How is it used?
• Assign a new column directly using df['new_column'] = ...
• Use math or string operations on existing columns

Code:

df['Tax'] = df['Revenue'] * 0.18

Output:
Adds a new column "Tax" calculated from Revenue



-   Data Preview

You always need to check your data before analysis.

What is it?
Functions like .head(), .tail(), .sample() give quick snapshots.

Where is it used?
During initial data loading and before applying transformations.

How is it used?
• .head(n) → first n rows
• .tail(n) → last n rows
• .sample(n) → random sample of n rows

Code:

print(df.head(3))

Output:
First 3 rows of the DataFrame



-   Handling Unique Separators in Data

Some CSV files use ; or | instead of commas — Pandas lets you handle that.

What is it?
You can define custom separators using the sep argument in read_csv.

Where is it used?
While importing data files that don’t follow the standard CSV format.

How is it used?
• Use pd.read_csv('file.txt', sep='|')
• Works for .csv, .txt, .dat files

Code:

df = pd.read_csv("data.txt", sep='|')

Output:
Loads the data using pipe | as column separator


-   Parsing Dates Data

Dates often come as strings; parsing them into date objects is crucial for time-based analysis.

What is it?
Converting string-formatted dates into datetime objects using Pandas.

Where is it used?
In time series analysis, trend plotting, and date filtering.

How is it used?
• Use parse_dates=['column_name'] while reading
• Use pd.to_datetime() to convert later

Code:

df = pd.read_csv("data.csv", parse_dates=['OrderDate'])

Output:
"OrderDate" column is automatically converted to datetime format


-   Understanding Shape and Datatypes of DataFrames

Know your data’s structure and types before diving into analysis.

What is it?
.shape gives (rows, columns) and .dtypes shows data types per column.

Where is it used?
Early in data analysis to decide what cleaning and transformations are needed.

How is it used?
• df.shape → Get dimensions
• df.dtypes → Understand each column’s type
• df.info() → Summary including nulls and types
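A minimal sketch (the DataFrame is made up for illustration):

import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

print(df.shape)     # (2, 2) -> (rows, columns)
print(df.dtypes)    # Name: object, Age: int64
df.info()           # summary including non-null counts and dtypes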




Topic 5: Pandas Selection 


-   Exploring Data using info() and describe()

Before any analysis, it's critical to understand the structure and summary statistics of your dataset.

What is it?
info() gives a concise summary of the DataFrame; describe() provides descriptive statistics for numerical columns.

Where is it used?
During the initial exploration phase to assess data types, null values, and basic distribution.

How is it used?
• df.info() → View column names, non-null counts, and data types
• df.describe() → Get count, mean, std, min, max, and percentiles
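A minimal sketch (the values are made up for illustration):

import pandas as pd

df = pd.DataFrame({"Revenue": [500, 700, 300], "Region": ["East", "West", "East"]})

df.info()               # column names, non-null counts, data types
print(df.describe())    # count, mean, std, min, percentiles, max for numeric columns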





-   Selecting Data using iloc and loc

Selecting specific rows and columns is essential to focus your analysis.

What is it?
iloc accesses data by index positions, while loc accesses data by labels or conditions.

Where is it used?
In data cleaning, filtering, sampling, or when isolating specific observations.

How is it used?

.iloc[] allows selection using numerical indices (zero-based indexing).

df.iloc[row_index, column_index] → Retrieves specific values.

df.iloc[start:end] → Retrieves multiple rows.

df.iloc[:, column_index] → Selects entire columns by position. 
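A minimal sketch showing both .iloc (by position) and .loc (by label or condition); the data is made up for illustration:

import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Cara"], "Age": [25, 30, 35]})

print(df.iloc[0, 1])                     # 25 -> row 0, column 1 by position
print(df.iloc[0:2])                      # first two rows
print(df.loc[df["Age"] > 28, "Name"])    # rows matching a condition, selected by label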






-   Data Transformations using apply()

Transformations help you modify or derive new values from existing columns.

What is it?
apply() lets you apply a function to rows or columns of a DataFrame or Series.

Where is it used?
For row-wise or column-wise transformations, feature engineering, and formatting.

How is it used?

Syntax:

df[<column_name>].apply(function_with_custom_transformation_logic)


Example:

df["Purchase Category"] = df["Purchase Amount"].apply(categorize_purchase)



-   Dropping Unnecessary Columns

Clean data is critical. Often, we need to remove irrelevant columns to streamline our dataset.

What is it?
Dropping columns removes unnecessary information that may clutter analysis or slow down processing.

Where is it used?
In data cleaning and optimization phases.

How is it used?

Syntax:

df.drop(columns=<list of column names to be removed>)


Example:

df.drop(columns=["Customer Address"], inplace=True)




-   Sorting DataFrames

Sorting helps you organize data to see trends, ranks, or find top/bottom performers.

What is it?
Use sort_values() to sort rows based on one or more columns.

Where is it used?
In reporting, ranking, or prioritization tasks.

How is it used?

  • Sort by a single column → df.sort_values(<column_name>)

  • Sort by multiple columns → df.sort_values([<List of column names>])

  • Choose ascending or descending order using ascending=<True/False>


Returns the sorted DataFrame.


Example:

df_sorted_by_amount = df.sort_values("Purchase Amount", ascending=False)





Topic 6: Missing Data & Text Ops


-   Detect Missing Values with isnull()

Missing values are common in real-world data and need to be identified clearly before analysis or modeling.

What is it?
isnull() identifies NaN (Not a Number) or null entries in a DataFrame or Series.

Where is it used?
During data cleaning, especially before imputing or removing invalid data.

How is it used?
• df.isnull() → Returns a DataFrame with True/False
• df.isnull().sum() → Counts missing values per column
• Use any() to check if any column has nulls


missing_values = df.isnull()


missing_counts = df.isnull().sum()


Code:

import pandas as pd


df = pd.DataFrame({'Name': ['Alice', None, 'Charlie'], 'Age': [25, 30, None]})


print(df.isnull())

Output:

    Name    Age
0  False  False
1   True  False
2  False   True



-   Handling Missing Values with dropna() and fillna()

Once missing values are found, the next step is deciding whether to remove or fill them.

What is it?
dropna() removes rows with missing data, while fillna() replaces them with a specific value or method.

Where is it used?
In data cleaning and preprocessing pipelines to prepare data for analysis or ML models.

How is it used?
• df.dropna() → Drops rows with any null
• df.fillna(0) → Replaces nulls with 0
• df.fillna(df.mean()) → Replaces with column mean (numeric only)


Syntax Variants:

  • df.dropna() → Drops all rows with any missing values.

  • df.dropna(how='all') → Drops only rows where all values are missing.

  • df.dropna(subset=['Column1', 'Column2']) → Drops rows where specified columns have missing values.
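A minimal sketch of both approaches (the data is made up for illustration):

import pandas as pd

df = pd.DataFrame({"Name": ["Alice", None, "Charlie"], "Age": [25, 30, None]})

print(df.dropna())      # keeps only rows with no missing values
print(df.fillna({"Name": "Unknown", "Age": df["Age"].mean()}))   # column-wise fills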




-   Search for text in DataFrames using contains()

Text-based filtering is essential when working with categorical or free-text data.

What is it?
contains() checks whether a string exists in a column’s values using pattern matching.

Where is it used?
In filtering rows by text keywords, such as finding all entries with "error" or a particular city name.

How is it used?
• df['Column'].str.contains('text')
• Combine with .loc[] to filter rows
• Use na=False to avoid NaN-related errors

Code:
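A minimal sketch (the city data is made up for illustration):

import pandas as pd

df = pd.DataFrame({"City": ["New Delhi", "Mumbai", None, "Delhi NCR"]})

# Filter rows whose City contains "Delhi"; na=False treats missing values as no match
delhi_rows = df[df["City"].str.contains("Delhi", na=False)]
print(delhi_rows)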


-   Replacing Values with replace()

Sometimes you need to update incorrect labels, fix typos, or standardize data.

What is it?
replace() allows value-level substitutions in Series or DataFrames.

Where is it used?
In data cleaning to fix inconsistent values, rename categories, or anonymize data.

How is it used?
• df['Column'].replace('Old', 'New')
• Replace multiple values: df.replace(['Yes', 'No'], [1, 0])
• Can be applied on entire DataFrame

Code:
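A minimal sketch (the values are made up for illustration):

import pandas as pd

df = pd.DataFrame({"Subscribed": ["Yes", "No", "Yes"], "City": ["Bombay", "Delhi", "Bombay"]})

df["City"] = df["City"].replace("Bombay", "Mumbai")    # single value in one column
df = df.replace(["Yes", "No"], [1, 0])                 # multiple values across the DataFrame
print(df)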




Topic 7: Ranking & Grouping


Data Ranking and Filtering


  - Ranking Data
Ranking helps identify top or bottom performers within your dataset based on numerical values.

What is it?
The rank() function assigns a rank to each value, with the smallest value getting the lowest rank.

Where is it used?
In scenarios like top 10 revenue products, ranked scores, or sales leaderboards.

How is it used?
• df['Rank'] = df['Revenue'].rank(ascending=False)
• Use method='dense' to avoid gaps in rank numbers

Ways to rank in Pandas: the method argument controls how ties are handled, with options 'average' (default), 'min', 'max', 'first', and 'dense'.





Code:

import pandas as pd
df = pd.DataFrame({'Product': ['A', 'B', 'C'], 'Revenue': [500, 700, 700]})


df['Rank'] = df['Revenue'].rank(ascending=False, method='dense')


print(df)

Output:

  Product  Revenue  Rank
0       A      500   2.0
1       B      700   1.0
2       C      700   1.0


  - Filtering with Conditions


Filtering data relies on logical comparisons like >, <, ==, and combining them with & or |.

What is it?
Logical operators are used to create boolean masks for filtering rows.

Where is it used?
In conditional analysis, such as filtering high-value transactions or invalid entries.

How is it used?
• df[df['Revenue'] > 1000]
• Combine multiple: df[(df['Revenue'] > 1000) & (df['Region'] == 'East')]


Ways to filter: boolean masks (df[condition]), label-based selection with .loc[condition], and query strings with df.query().


Code:

df = pd.DataFrame({'Revenue': [500, 1500, 2000], 'Region': ['East', 'West', 'East']})


filtered = df[(df['Revenue'] > 1000) & (df['Region'] == 'East')]


print(filtered)

Output:

   Revenue Region
2     2000   East



Counting and Aggregating Data


  - Counting Unique Values


Counting distinct entries helps summarize categorical data effectively.

What is it?
nunique() and value_counts() count distinct values and their frequencies.


Where is it used?
For category distribution, label frequency, or data health checks.

How is it used?
• df['Category'].nunique() → Count unique
• df['Category'].value_counts() → Frequency of each

Code:

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'C', 'B']})


print(df['Category'].value_counts())

Output:

A 2
B 2
C 1  


  - Grouping Data


Grouping allows you to summarize and compare metrics across categories.

What is it?
groupby() is used to split data into groups and perform aggregations like sum(), mean().

Where is it used?
In summarizing trends per region, product, category, etc.

How is it used?
• df.groupby('Category')['Revenue'].sum()
• Combine with multiple aggregations using .agg()
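A minimal sketch (the data is made up for illustration):

import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "A", "B"],
    "Revenue": [100, 200, 150, 250],
})

print(df.groupby("Category")["Revenue"].sum())
print(df.groupby("Category")["Revenue"].agg(["sum", "mean"]))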


– Handling Duplicates


Duplicate rows can mislead analysis and inflate counts or totals.

What is it?
drop_duplicates() removes duplicate rows based on selected columns.

Where is it used?
When cleaning raw data before grouping, joining, or aggregating.

How is it used?
• df.drop_duplicates()
• df.duplicated() → Flags duplicates with True/False

Code:


df = pd.DataFrame({'ID': [1, 2, 2, 3]})


df_clean = df.drop_duplicates()


print(df_clean)


Output:

   ID
0   1
1   2
3   3






Topic 8: OOP in Python – Basics


Object-Oriented Programming (OOP) in Python - I

Object-Oriented Programming (OOP) is a programming approach that helps organize code logically and reuse components efficiently—which becomes especially useful as your analytics projects grow.

What is it?
A paradigm that uses classes and objects to structure programs by grouping related data and functions.

Where is it used?
In large-scale data analytics tools, automation scripts, dashboard frameworks, simulation models, and more—whenever code organization, scalability, and reusability matter.

How is it used?
• Define classes to model real-world entities like Dataset, Chart, or UserInput
• Instantiate objects from these classes to work with data
• Encapsulate methods (functions) to define object-specific behavior
• Use inheritance to extend functionality from existing classes
• Leverage constructors like __init__ to initialize object attributes

Takeaways / best practices
• OOP is ideal for building modular, reusable, and maintainable code structures
• Helps model real-world systems intuitively
• Improves collaboration by making code more organized and scalable

Understanding the Need for OOP

As projects grow in size and complexity, we need better structure than just functions and variables.

What is it?
Object-Oriented Programming (OOP) is a paradigm that organizes code using objects and classes to model real-world entities.

Where is it used?
In building scalable, reusable, and maintainable code for data pipelines, dashboards, and modular analysis tools.

How is it used?
• Define templates (classes) that represent entities (e.g., DataPoint, User, Report)
• Create objects (instances) to store and manipulate data
• Attach functions (methods) to these objects

Takeaways / best practices
• Use OOP to organize related data and behavior
• Promotes code reusability through inheritance
• Useful for large, collaborative data projects

Basics of OOP


  - Classes and Objects

Classes and objects are the foundation of OOP.

What is it?
A class defines a blueprint; an object is an instance of that class.

Where is it used?
In structuring complex data logic such as defining Customer, Product, or AnalysisModel.

How is it used?
• Use class keyword to define
• Use object = ClassName() to create an instance
• Access attributes and methods with dot notation





Example:
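A minimal sketch (the Customer class is made up for illustration):

class Customer:
    def __init__(self, name, city):
        self.name = name          # attribute
        self.city = city

    def describe(self):           # method
        return f"{self.name} from {self.city}"

# Create an object (instance) and use dot notation
c = Customer("Alice", "Mumbai")
print(c.describe())               # Alice from Mumbai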


  - self, the __init__ Constructor, and super()


These are the tools that bring life to objects and enable inheritance.

What is it?
__init__ is the constructor that initializes object properties. self refers to the instance. super() lets child classes inherit parent behavior.

Where is it used?
In setting up object properties and extending parent classes in larger analytics applications.

How is it used?
• Define __init__() for initialization
• Always use self to refer to instance variables

self: refers to the current object (instance) inside class methods.



__init__: the constructor method that runs automatically when a new object is created, used to initialize its attributes.
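A minimal sketch showing self and __init__ together (the Dataset class is made up for illustration):

class Dataset:
    def __init__(self, name, rows):
        # __init__ runs automatically when the object is constructed;
        # self refers to the specific instance being created
        self.name = name
        self.rows = rows

    def summary(self):
        return f"{self.name}: {self.rows} rows"

d = Dataset("sales", 1000)
print(d.summary())    # sales: 1000 rows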




Handling Functions Within Classes

Functions inside classes are called methods, and they define the behaviors of the object.

What is it?
Class functions that operate on data within an object and often reference self.

Where is it used?
In encapsulating logic specific to the object, like .calculate_profit() or .clean_data().

How is it used?
• Define functions inside class using def
• Use self to access instance variables
• Call methods using object.method()
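A minimal sketch of a method operating on instance data (the Product class is made up for illustration):

class Product:
    def __init__(self, price, cost):
        self.price = price
        self.cost = cost

    def calculate_profit(self):      # uses self to access instance variables
        return self.price - self.cost

p = Product(price=120, cost=90)
print(p.calculate_profit())          # 30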








Topic 9: OOP in Python – Advanced


Object-Oriented Programming (OOP) in Python - II

This session focuses on taking your object-oriented design to the next level, making your classes more powerful and flexible using special methods and applying the core principles of OOP.

What is it?
OOP-II goes beyond the basics of classes and objects and introduces customization using dunder (double underscore) methods and the four key principles that guide modular programming.

Where is it used?
In scalable analytics frameworks, model pipelines, simulations, and custom data types or visualizations.

How is it used?
• Use __str__() and __len__() to customize object behavior
• Apply encapsulation to protect data
• Use inheritance to reuse code
• Enable polymorphism for dynamic behavior
• Leverage abstraction to simplify complexity


Special Methods
- Understanding __main__

What is it?
__main__ is the name given to the top-level script environment. Code inside if __name__ == "__main__" runs only when the script is run directly.

Where is it used?
In structuring Python scripts—especially in modular code, utilities, and testing.

How is it used?
• Prevents execution of helper code when imported
• Useful for separating reusable code and test blocks

Code:

def greet():
  print("Welcome to OOP")

if __name__ == "__main__":
  greet()

Output:
Welcome to OOP


- dunder methods



What is it?
"Dunder" methods (like __str__, __len__, __eq__) are special functions with double underscores used to customize class behavior.

Where is it used?
In defining how objects behave with built-in functions (print(), len(), etc.)

How is it used?

Dunder methods (also called magic methods or special methods) are predefined methods in Python that start and end with double underscores (__method__).

They allow you to:

  • Define how objects are displayed (__str__ for user-friendly print of object, __repr__ for developer-friendly print of object)

  • Customize built-in functions (__len__ for object length)

  • Enable operator overloading (__add__, __eq__)


Code:
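A minimal sketch (the Report class is made up for illustration):

class Report:
    def __init__(self, title, rows):
        self.title = title
        self.rows = rows

    def __str__(self):       # used by print()
        return f"Report: {self.title}"

    def __len__(self):       # used by len()
        return len(self.rows)

r = Report("Q1 Sales", [10, 20, 30])
print(r)         # Report: Q1 Sales
print(len(r))    # 3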



The Four Pillars of OOP
- Encapsulation


What is it?
The practice of hiding internal object details and exposing only what's necessary.

Where is it used?
In protecting data integrity and restricting direct access to class attributes.

How is it used?
• Use _var for protected and __var for private variables
• Access data via getter/setter methods




Code:

class BankAccount:
    def __init__(self, account_holder, account_type, initial_balance):
        self.account_holder = account_holder    # Public attribute
        self._account_type = account_type       # Protected attribute
        self.__balance = initial_balance        # Private attribute

    def get_balance(self):
        return self.__balance

    def set_balance(self, newBalance):
        self.__balance = newBalance
        return self.__balance


- Inheritance

What is it?
One class (child) inherits attributes and methods from another (parent).

Where is it used?
In reusing code, like when LineChart inherits from a generic Chart class.

How is it used?
• Define parent class
• Use child class with super() for extending behavior


Instead of writing separate code for each, we create a base Employee class and let others (Manager, Engineer, Intern, etc.) inherit from it.


Inheritance is an OOP principle where a child class (sub-class) derives attributes and methods from a parent class (super-class). This avoids redundancy and promotes reusability.

Python supports:

  • Single Inheritance: One class inherits from another.

  • Multiple Inheritance: A class inherits from multiple classes.

  • Multilevel Inheritance: A child class inherits from another child class.


super(): lets the child class call methods of the parent class, most commonly the parent's __init__, when extending behavior.


Code:
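A minimal sketch of single inheritance with super(), following the Employee example above (the attribute names are made up):

class Employee:
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

    def details(self):
        return f"{self.name} earns {self.salary}"

class Manager(Employee):                  # Manager inherits from Employee
    def __init__(self, name, salary, team_size):
        super().__init__(name, salary)    # reuse the parent's initialization
        self.team_size = team_size

    def details(self):
        return super().details() + f" and manages {self.team_size} people"

m = Manager("Alice", 90000, 5)
print(m.details())    # Alice earns 90000 and manages 5 people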



- Polymorphism

Polymorphism is an OOP concept where different classes use the same method name but implement it differently.


Key types of polymorphism:

Method Overriding (Runtime Polymorphism): A subclass redefines a method from the parent class. This is relevant for our example.

Method Overloading (Compile-time Polymorphism): A class has multiple methods with the same name but different parameters (simulated in Python using default arguments). 


Example:

Think of a UPI app (like Google Pay or PhonePe):

The user enters the UPI ID and amount, then clicks "Pay".

The system does not check which bank the user is using—each bank knows how to process the payment.

This is polymorphism—different banks implement pay() differently, but the app treats all banks the same.
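A minimal sketch of method overriding, modeled on the UPI example above (the bank classes are made up):

class Bank:
    def pay(self, amount):
        raise NotImplementedError

class AlphaBank(Bank):
    def pay(self, amount):
        return f"AlphaBank processed payment of {amount}"

class BetaBank(Bank):
    def pay(self, amount):
        return f"BetaBank processed payment of {amount}"

# The app treats every bank the same; each bank implements pay() differently
for bank in (AlphaBank(), BetaBank()):
    print(bank.pay(500))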



- Abstraction

Different smartphone brands may have different ways of clicking and processing images, but as a user, you only care about pressing the "Click" button to take a photo. 

You don’t need to know the technicalities of how the camera works inside each phone model.


Abstraction hides those technical details and exposes only the essential functionality.


Where is it used?
In user-defined libraries, dashboards, or ML pipelines to simplify usage.

How is it used?


In Python, we use:

  • Abstract classes

  • Abstract methods 

to achieve abstraction. 

An abstract class cannot be instantiated directly, and it forces the subclass to implement abstract methods.



Code:
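A minimal sketch using an abstract class, following the smartphone camera example above (the class names are made up):

from abc import ABC, abstractmethod

class Camera(ABC):
    @abstractmethod
    def click(self):              # every subclass must implement this
        pass

class PhoneA(Camera):
    def click(self):
        return "PhoneA: photo captured"

class PhoneB(Camera):
    def click(self):
        return "PhoneB: photo captured"

# Camera() itself cannot be instantiated; users only call click()
for phone in (PhoneA(), PhoneB()):
    print(phone.click())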





Topic 10: Interview Readiness


- Most commonly asked coding questions

  1. Reverse a String 

  2. Check if a string is palindrome

  3. Generate Even Fibonacci Series 

  4. Anagram Check

  5. Find Factorial

- NumPy and Pandas preparation

  1. NumPy - Matrix Operations and Broadcasting - Link

  2. Pandas -  Data Import and Cleaning - Link

  3. Pandas - Data Exploration and Selection - Link

  4. Pandas - Handling Missing Values and Replacing Data- Link

  5. Pandas - Grouping, Aggregation, and Filtering - Link
