
Chapter 6: Number Columns
Chapter Learning Objectives
Various data operations on columns containing numbers.
Chapter Outline
1a. How to calculate the minimum value in a column?
1b. How to calculate the maximum value in a column?
1c. How to calculate the sum of a column?
1d. How to round values in a column?
1e. How to calculate the mean of a column?
1f. How to calculate the standard deviation of a column?
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

from IPython.display import display_html
import pandas as pd
import numpy as np

# Render several pandas data frames side by side in the notebook output
def display_side_by_side(*args):
    html_str = ''
    for df in args:
        html_str += df.to_html(index=False)
        html_str += "\xa0\xa0\xa0" * 10
    display_html(html_str.replace('table', 'table style="display:inline"'), raw=True)

space = "\xa0" * 10
1a. How to calculate the minimum value in a column?

Let's first understand the syntax.
Syntax
pyspark.sql.functions.min(col)
Aggregate function: returns the minimum value of the expression in a group.
Parameters:
col : column
Input: Spark data frame with a numeric column
Output: Spark data frame holding the minimum value of that column
Summary: min aggregates a column down to its smallest value.
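
A minimal sketch using the hypothetical sample data frame defined above:

# Aggregate the "price" column down to its smallest value (7.25 here)
df.select(F.min("price").alias("min_price")).show()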
1b. How to calculate the maximum value in a column?

Let's first understand the syntax.
Syntax
pyspark.sql.functions.max(col)
Aggregate function: returns the maximum value of the expression in a group.
Parameters:
col : column
Input: Spark data frame with a numeric column
Output: Spark data frame holding the maximum value of that column
Summary: max aggregates a column down to its largest value.
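
The same pattern works for the maximum, again on the sample price column:

# Aggregate the "price" column down to its largest value (25.5 here)
df.select(F.max("price").alias("max_price")).show()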
1c. How to calculate the sum of a column?

Let's first understand the syntax.
Syntax
pyspark.sql.functions.sum(col)
Aggregate function: returns the sum of all values in the expression.
Parameters:
col : column
Input: Spark data frame with a numeric column
Output: Spark data frame holding the sum of that column
Summary: sum adds up every value in the column.
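
A short sketch on the same illustrative data frame:

# Total of the "price" column: 10.0 + 25.5 + 7.25 = 42.75
df.select(F.sum("price").alias("total_price")).show()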
1d. How to round values in a column?

Let's first understand the syntax.
Syntax
pyspark.sql.functions.round(col, scale=0)
Round the given value to scale decimal places using HALF_UP rounding mode if scale >= 0, or at the integral part when scale < 0.
Parameters:
col : column
scale : int, the number of decimal places to keep (default 0)
Input: Spark data frame with a numeric column
Output: Spark data frame with each value rounded to the given scale
Summary: unlike the aggregates above, round transforms every row of the column.
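
A sketch on the sample data; with scale=1, HALF_UP rounding turns 7.25 into 7.3:

# Round each price to one decimal place
df.select(F.round("price", 1).alias("rounded_price")).show()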
1e. How to calculate the mean of a column?

Let's first understand the syntax.
Syntax
pyspark.sql.functions.mean(col)
Aggregate function: returns the average of the values in a group.
Parameters:
col : column
Input: Spark data frame with a numeric column
Output: Spark data frame holding the mean of that column
Summary: mean aggregates a column down to its arithmetic average.
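
Again on the sample data frame:

# Average of the "price" column: 42.75 / 3 = 14.25 here
df.select(F.mean("price").alias("avg_price")).show()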
1f. How to calculate the standard deviation of a column?

Let's first understand the syntax.
Syntax
pyspark.sql.functions.stddev(col)
Aggregate function: alias for stddev_samp, the sample standard deviation.
Parameters:
col : column
Input: Spark data frame with a numeric column
Output: Spark data frame holding the standard deviation of that column
Summary: stddev aggregates a column down to its sample standard deviation (stddev_pop gives the population version).
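
One last sketch on the sample data frame:

# Sample standard deviation of the "price" column (about 9.84 here)
df.select(F.stddev("price").alias("stddev_price")).show()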