Introducing
Enterprise Big Data Engineering Program
Machine Learning using Spark
– Douglas Merrill
One of the biggest takeaways from 2020 is that the world around us is changing. Yet one thing remains constant: the ever-growing volume of data it generates. Big Data has been called the oil of the IT industry, and rightly so, because it fuels key business decisions.
of data to be produced by 2024
in business agility recorded by moving to a cloud
The Chinese big data industry will reach the ¥150 billion ($22 billion) mark
– As predicted by Qianzhan Industry Research Institute
As organizations move from traditional architectures to modern data architectures, data engineers have become critical to building data pipelines with new technologies that scale and run in the cloud.
In today's dynamic and competitive market, every organization looks for deeper analytics and insights before undertaking enterprise-level transformations. Employee skill development ensures that the workforce is ready to drive this transformation.
According to LinkedIn’s 2020 Emerging Jobs Report,
This program helps organisations deep-skill their workforce, equipping employees with disruptive solutions for working on Big Data using modern Big Data architectures such as the Delta Architecture.
Organisations seeking employee training programs to deep-skill their IT, data management, and analytics professionals in developing and maintaining structures that facilitate Big Data analytics.
Software and IT professionals working on data projects with at least 3 years of experience.
Ability to read, write, and understand English.
Spoken English is desired but not essential.
Application submission is followed by an interactive video discussion with one of our mentors, who will guide you in choosing the right specialization.
Easy access to carefully selected industry practitioners who serve as mentors and bring years of experience across various technologies.
Learners have multiple ways to clarify doubts and resolve issues they face during the program.
Access to an O'Reilly eBook selected to deepen the learner's understanding.
Pre-configured local and cloud-based labs are provided throughout the program, so learners can focus on hands-on learning rather than on technical setup challenges.
Duration:
8-9 weeks
(Weekend based live sessions)
Program Overview:
The program establishes strong foundations in key software engineering methodologies and imparts the skills to build scalable enterprise data pipelines for analysis with Apache Spark. It also equips learners to scale Data Science and Machine Learning tasks to Big Data sets using Apache Spark.
Key Differentiators:
• Our program breaks Apache Spark down into its components in a logically consistent manner.
• It covers the three most popular ML algorithm families (decision trees, clustering, and regression) and is indispensable to those building ML-based analytical solutions.
• Scala Programming Language
• Spark DataFrames & Datasets
• Resilient Distributed Datasets (RDDs)
• Spark Streaming
• Spark SQL
• Machine Learning
• Linear Regression and Decision Trees
• Clustering (K-means) and Logistic Regression
• Spark Core using Scala
• Spark Structured API – DataFrames, SQL using Python
• Spark Structured API – Data engineering using Python
• Apache Spark Recap
• Introduction to Machine Learning and Linear Regression
• Decision Trees and Random Forests (with Code)
• Clustering (K-means)
• Logistic Regression