Big Data Developer Resume

SUMMARY

6 years of professional experience in analysis, design, development, integration, deployment and maintenance of quality software applications using Java/J2EE and Hadoop technologies.
Experience in working with Amazon EMR, AWS Glue, Databricks and Cloudera (CDH5) Hadoop distributions.
Experience in Hadoop ecosystem tools including HDFS, YARN, Hive, Sqoop, Spark, ZooKeeper and Oozie.
Good knowledge of Amazon EMR (Elastic MapReduce) for performing big data operations in AWS.
Knowledge of working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
Good understanding of Spark and its benefits in big data analytics.
Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Python.
Experience in creating Databricks clusters to run multiple data loads in parallel with PySpark.
Experience in importing and exporting data using Kafka.
Experience in loading data from different data sources (such as Teradata and DB2) into HDFS using Sqoop and loading it into partitioned Hive tables.
Experience in migrating data with Sqoop from HDFS to relational database systems and vice versa, according to client requirements.
Good understanding of and working experience with cloud-based architectures.
Good experience with source control repositories such as Git and SVN.
Experience in working with scripting technologies such as Python and UNIX shell scripts.
Experience in developing web page interfaces using HTML, CSS and TypeScript.
Experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance.
Good understanding of and experience with software development methodologies such as Agile and Waterfall; performed unit, regression, white-box and black-box testing.
Ability to work with onsite and offshore team members.

TECHNICAL SKILLS

Big Data Technologies: HDFS, YARN, Hive, Sqoop, HBase, Spark, Ambari, Hue, Impala, Oozie and ZooKeeper

Hadoop Distributions: Cloudera CDH4 & CDH5

Database: Oracle 10g, MySQL, DB2, SAP

Programming Languages: Java, SQL, Python and Scala

Operating System: Windows, Linux, UNIX, Mac OS

Cloud Platforms: AWS Cloud, Databricks

IDE Tools: Eclipse, WebStorm, Visual Studio Code

Build Tools: Maven, GitHub, JUnit

Development Methodologies: Visual Paradigm for UML, Agile/Scrum

PROFESSIONAL EXPERIENCE

Confidential

Big Data Developer

Responsibilities:

Hands-on experience with the Cloudera Hadoop distribution platform, Databricks and AWS Glue.
Managed data coming from different sources; involved in HDFS maintenance and loading of structured data.
Worked across the project lifecycle from analysis to production implementation, with emphasis on data validation, developing logic and transformations as per requirements, and creating notebooks to load data into Delta Lake.
Worked on the transformation layer using Apache Spark RDD and DataFrame APIs and Spark SQL, applying various transformations and aggregations provided by the Spark framework.
Worked on Spark integration with Hive and DB2 at the ingestion layers. Worked with different file formats such as Parquet and JSON.
Created an automated Databricks workflow notebook to run multiple data loads (Databricks notebooks) in parallel using Python.
Created a Databricks Delta Lake process for real-time data loads from various sources (databases, Adobe and SAP) to an AWS S3 data lake using Python/PySpark code.
Created Hive tables as per requirements, defining internal or external tables with appropriate static/dynamic partitions and bucketing for efficiency.
Designed and implemented Hive queries and functions for evaluation, filtering, loading and storing of data.
Worked with AWS S3 data and AWS Glue jobs to transform data into a format that optimizes query performance for Athena.
Loaded batch process data from different data sources into Hadoop using the Sqoop framework.
Executed Hive queries on Parquet tables stored in Hive to perform data analysis that meets business requirements.
Involved in performance tuning of Spark applications and Hive queries.
Created BDR jobs for data replication.
Used JIRA for issue and project tracking, TFS for version control and Control-M for job scheduling.
Designed and developed the application using Agile methodology and followed SCRUM.

Environment: AWS Services (Glue, Athena, S3, Redshift), Cloudera (CDH5), HDFS, Sqoop, Hive, Spark, Databricks, PySpark, Python, IBM DB2, Control-M, Informatica, TFS, DbVisualizer and Eclipse.

Confidential, Charlotte, NC

Big Data/Cloud Developer

Responsibilities:

Created EMR clusters for data ingestion and query clusters for analytics.
Wrote Spark transformation logic to extract client email and call data for retail clients for inbound phone reporting.
Wrote Spark transformation logic to extract retail divisional unit KPIs for a particular department.
Built data analytics on Spark that increased business revenue.
Used AWS Service Catalog to create and manage catalogs of IT services.
Imported data from different data sources such as Oracle and IBM DB2 databases into Amazon S3 in different file formats such as Parquet and CSV using Sqoop.
Used AWS Secrets Manager to protect the secrets that applications need, and to rotate, manage and retrieve database credentials, API keys and other secrets throughout their lifecycle.
Created Hive tables as per requirements, defining internal and external tables with appropriate static/dynamic partitions for efficiency.
Used Amazon Route 53 to connect user requests to infrastructure running on AWS EC2 and AWS S3 buckets.
Used Amazon CloudWatch to monitor and track resources on AWS.
Created CloudFormation templates (CFT) to create buckets, roles, parameters, etc., and monitored stack creation with AWS CloudFormation.
Created email alerts for failures using Splunk.
Experience in debugging error logs.
Used Jupyter Notebook for Spark SQL and scheduled cron jobs using spark-submit.
Scheduled daily jobs using Oozie workflows and set up crontabs for data analysts to schedule their jobs.
Created a web application to access Tableau reports.
Created a web application using Angular 7.
Used npm (Node Package Manager) for building packages.
Used HTML5 and Bootstrap to design the web pages.
Used Jasmine/Karma for unit test cases.
Fixed production data loading issues that arose during production support.
Used JIRA for bug tracking, Bitbucket for version control and Control-M for job scheduling.
Worked with Agile SCRUM team in delivering agreed user stories on time for every Sprint.

Environment: AWS Tools (S3, EMR, EC2, CloudWatch, CloudFormation, IAM), Hadoop, Sqoop, Hive, Presto, Spring Tool Suite (STS), Oracle, DB2, Spark, Python, Jupyter Notebook

Confidential

Hadoop/Spark Developer

Responsibilities:

Used Cloudera Manager to manage the Hadoop cluster.
Imported tables from Teradata into Hive using Sqoop jobs.
Worked with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
Created batch and real-time pipelines using Spark as the main processing framework.
Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Spark.
Analyzed large datasets to find patterns and insights within structured and unstructured data, supporting the business with the help of Tableau.
Wrote Spark transformation logic to extract healthcare KPIs for a particular department (for example, Radiology).
Loaded DStream data into Spark RDDs and performed in-memory computation to generate the output response.
Published results from the Hive warehouse layer into IBM DB2, where Tableau picks them up.
Worked with Oozie workflow engine to run multiple Hive jobs.
Used JIRA for bug tracking, Bitbucket for version control and Control-M for job scheduling.
Worked with Agile SCRUM team in delivering agreed user stories on time for every Sprint.

Environment: Cloudera (CDH5), HDFS, Sqoop, Hive, Oozie, Spark, Scala, Java, Teradata, Maven, IBM DB2, Control-M, Bamboo, Bitbucket and Eclipse.
