Setting up an RDD in PySpark

#ds410 #swe

Related: 9-8-2025 MapReduce Lazy Evaluation | Software engineering | Cloud computing

Practical example: how to initialize a SparkContext and create RDDs for big data processing.

import pyspark 
from pyspark import SparkContext  # note: the module name is all lowercase

# Create a SparkContext
# "local" specifies that the code is running in local mode [5, 10]
# "WordCountApp" is the application name shown in the Spark UI [5, 10]
sc = SparkContext("local", "WordCountApp")
print("SparkContext created successfully!")

rdd = sc.textFile("path_to_your_file.txt")  # Creates an RDD; each line of the file becomes one element [2, 4]