
Sr. Hadoop Developer Resume


Cary, NC

SUMMARY

  • Almost 7 years of IT experience in analysis, design, development and implementation of large-scale applications using Big Data and Java/J2EE technologies such as Apache Spark, Hadoop, Hive, Pig, Sqoop, Oozie, HBase, Zookeeper, Python & Scala.
  • Experience in the Analysis, Design, Development and Deployment of the Big Data applications using Hadoop and Apache Spark Frameworks.
  • Experience in the Installation, Configuration and Management of Hadoop ecosystems on big data distributions such as Cloudera and Hortonworks.
  • Experience in Data Architecture, Design, Pipelining, Configuration and Management using Hadoop and Apache Spark ecosystems on different distributions.
  • Hands-on experience in Hadoop components like MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, Oozie, Apache Spark, Impala.
  • Integrated various frameworks to build data pipelines for relational databases such as MySQL and Oracle as well as non-relational (NoSQL) databases such as MongoDB.
  • Capable of configuring and maintaining Amazon EMR both manually and through CloudFormation scripts on AWS.
  • Migrated Hadoop MapReduce programs to Apache Spark in Scala to achieve increased processing efficiency.
  • Expert knowledge of Apache Kafka and Spark Streaming for streaming live data into processing pipelines.
  • Experience in the installation, configuration, support and management of Hadoop clusters in Cloudera and Hortonworks distributions.
  • Capable of importing and processing data from multiple database systems like MySQL, Oracle into the Hadoop environment.
  • Extensive knowledge of programming languages such as Scala, Python, Java and C++.
  • In-depth knowledge of Statistics and Machine Learning algorithms such as Classification and Regression models.
  • Experience with various databases like Oracle, SQL Server, Teradata and DB2
  • Strong experience in Data Analysis, Data Profiling, Data Cleansing & Quality, Data Migration and Data Integration.
  • Expertise in creating complex SQL databases with efficient schemas producing and storing quality data.
  • Good knowledge of producing complex and intuitive dashboards in Tableau, reporting data on various metrics.
  • Working knowledge of data prediction and analysis using tools like SPSS, Minitab and Tableau.
  • Flexible with Unix, Linux and Windows environments, working with operating systems like CentOS 5/6, Red Hat 6/7 and Ubuntu 13/14.

TECHNICAL SKILLS

Big Data Frameworks: HDFS, MapReduce, YARN, Apache Spark, Spark Streaming, Spark SQL, Spark ML, Apache Hive, Apache Pig, Apache HBase, Impala, Oozie, Hue, Sqoop, Zookeeper, Storm, Flume, Kafka, MongoDB, Cassandra

Languages & Scripting: Scala, Java, SQL, HiveQL, Shell Script, JavaScript, Python.

Hadoop Distribution: Cloudera, Hortonworks, AWS EMR

Cloud Technologies: Amazon EC2, S3, EMR, DynamoDB, Lambda, Kinesis, ELB, RDS, Glue, SNS, SQS, EBS, CloudFormation

Development Tools: Microsoft SQL Studio, IntelliJ, Eclipse, NetBeans, Maven, JUnit, MRUnit, Scala Unit

Databases: MySQL, Sybase, MS SQL Server, PostgreSQL, DB2, Oracle 11g/10g/9, MongoDB, NoSQL

Application Servers: WebLogic, JBoss, Apache Tomcat 8.0, IBM WebSphere

Operating Systems: Unix, Linux, Windows, Mac OS

Version Control: GIT, SVN, Bitbucket

BI Tools: Tableau, Qlik

PROFESSIONAL EXPERIENCE

Confidential, Cary, NC

Sr. Hadoop Developer

Responsibilities:

  • Migrated data from the Hadoop system for processing and storage in Spark applications running on Amazon EMR.
  • Wrote ETL scripts to move data between HDFS and S3 and created Hive external tables on top of this data for use in Big Data applications.
  • Created scripts to sync data between local MongoDB and Postgres databases with those on AWS Cloud.
  • Installed application on AWS EC2 instances, configured the storage on S3 buckets and worked closely with AWS EC2 infrastructure teams to troubleshoot complex issues.
  • Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
  • Developed Spark scripts for data processing to improve the efficiency of the Big Data system.
  • Processed source data into structured data and stored it in the NoSQL database Cassandra.
  • Wrote Hive code to make the data more scalable for downstream models.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
  • Developed Oozie workflows to stream the data into the Spark system using Hive SQL queries.
  • Used Spark SQL to load Parquet data, created Datasets defined by case classes, and handled structured data that was finally stored in Hive tables for downstream consumption (see the sketch after this list).
  • Performed data ETL using Scala scripts on the Spark system while migrating large sets of JSON data to Parquet format.
  • Developed Data Pipelines using Amazon Kinesis and Apache Kafka to stream and process real time data.
  • Effectively used Sqoop to transfer data between databases and HDFS.
  • Used HiveQL to perform the data analysis and find the relevant data to meet the business requirements.
  • Worked on AWS Glue for scheduling jobs and automated Glue jobs with CloudWatch Events.
  • Involved in developing Hive, MongoDB DDL templates which were hooked into Oozie workflows to create, alter and drop tables.
  • Wrote and executed test cases using MRUnit and Scala unit tests for the Hadoop and Apache Spark applications.
  • Developed MapReduce programs as a part of predictive analytical model development.
  • Worked on Sequence, ORC, Avro, Parquet file formats and used compression techniques like LZO, Snappy.
  • Configured GitHub plugin to offer integration between GitHub & Jenkins and regularly involved in version control and source code management including Release Build and Snapshot Build management
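
A minimal Scala sketch of the Spark SQL flow described above (loading Parquet, typing it with a case class and writing to Hive); the Order schema, S3 path and table name are hypothetical placeholders, not the actual project objects:

    import org.apache.spark.sql.{Dataset, SparkSession}

    // Hypothetical record layout; field names and the S3 path are illustrative only.
    case class Order(orderId: String, customerId: String, amount: Double, orderDate: String)

    object ParquetToHive {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ParquetToHive")
          .enableHiveSupport()   // needed so saveAsTable writes to the Hive metastore
          .getOrCreate()
        import spark.implicits._

        // Load Parquet data and map it onto the case class as a typed Dataset.
        val orders: Dataset[Order] = spark.read
          .parquet("s3://example-bucket/raw/orders/")   // placeholder path
          .as[Order]

        // Handle the structured data with Spark SQL before handing it downstream.
        orders.createOrReplaceTempView("orders")
        val daily = spark.sql(
          """SELECT orderDate, COUNT(*) AS order_count, SUM(amount) AS total_amount
            |FROM orders GROUP BY orderDate""".stripMargin)

        // Persist the result to a Hive table for downstream consumption.
        daily.write.mode("overwrite").saveAsTable("analytics.daily_orders")

        spark.stop()
      }
    }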

Environment: Cloudera, Apache Spark, MapReduce, HDFS, Pig Scripts, Hive Scripts, HBase, Sqoop, Amazon EMR, Amazon S3, AWS Glue, CloudWatch, Amazon Kinesis, Zookeeper, Oozie, Oracle, Shell Scripting.

Confidential, Raleigh, NC

Sr. Big Data Developer

Responsibilities:

  • Expertly handled the stream processing and storage of data feeding into HDFS using Apache Spark and Sqoop.
  • Deployed Scala code for stream processing using Apache Kafka and Amazon S3.
  • Performed various data imports from MySQL to HDFS using Sqoop.
  • Worked extensively with the AWS CLI to transfer data to and from Amazon S3.
  • Used Amazon CloudWatch to monitor the system.
  • Achieved performance tuning using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN (see the sketch after this list).
  • Deployed MapReduce and Spark jobs on Amazon Elastic MapReduce using datasets stored on S3.
  • Imported and exported data into HDFS and Hive using Sqoop and Flume.
  • Developed Pig scripts for analysing large data sets in the HDFS.
  • Accountable for creating users and groups through LDAP and granting required permissions to respective users.
  • Collected the logs from the physical machines and the OpenStack controller and integrated into HDFS using Flume.
  • Monitored and controlled Local disk storage and Log files using Amazon CloudWatch.
  • Built scalable distributed data solutions using Hadoop.
  • Used Hive to perform extensive data validation and analysis.
  • Worked on performance enhancement and storage management using various file formats like Sequence files, RC files and implemented bucketing and partitioning of the data.
  • Involved in loading and transforming data from Teradata databases into HDFS using Sqoop.
  • Involved in monitoring the MapReduce jobs using Job Tracker.
  • Created Sqoop Jobs, Pig and Hive Scripts to perform data ingestion from relational databases and compared with the historical data.
  • Involved in scheduling Hadoop jobs using Oozie workflows to orchestrate events for high data availability.
  • Implemented business logic by writing Pig UDFs and used the available UDFs from Piggybank.
  • Created an Oozie workflow environment to import real-time data into the Hadoop system using Kafka.
  • Responsible for developing data pipeline using Flume, Sqoop and Pig to extract the data from weblogs and store in HDFS.
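
The tuning work called out above typically combines caching, broadcast joins and shuffle-partition control; the snippet below is a rough Scala sketch under those assumptions, with placeholder S3 paths and settings rather than the real job configuration:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast
    import org.apache.spark.storage.StorageLevel

    object TuningSketch {
      def main(args: Array[String]): Unit = {
        // The shuffle-partition count is illustrative; the right value depends on the EMR cluster size.
        val spark = SparkSession.builder()
          .appName("TuningSketch")
          .config("spark.sql.shuffle.partitions", "200")
          .getOrCreate()

        val events = spark.read.parquet("s3://example-bucket/events/")   // placeholder input
        val lookup = spark.read.parquet("s3://example-bucket/lookup/")   // small dimension table

        // Cache a DataFrame that several downstream actions reuse.
        val cached = events.persist(StorageLevel.MEMORY_AND_DISK)

        // Broadcast the small table to avoid a shuffle-heavy join (assumes a shared "id" column).
        val joined = cached.join(broadcast(lookup), Seq("id"))

        // Reduce the number of output files before writing back to S3.
        joined.coalesce(50).write.mode("overwrite").parquet("s3://example-bucket/out/")

        cached.unpersist()
        spark.stop()
      }
    }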

Environment: Amazon EMR, Amazon S3, AWS CLI, HDFS, Hive, Spark, Spark SQL, YARN, Flume, Sqoop, Pig, Kafka, Oozie, MySQL, MapReduce.

Confidential

Big Data Developer

Responsibilities:

  • Created data pipelines in multiple instances to load data from DynamoDB and store it in an HDFS location.
  • Successfully executed Performance tuning of MapReduce jobs by analysing and reviewing Hadoop log files.
  • Involved in the partitioning and bucketing of data stored in Hive tables.
  • Used Apache Flume to collect and aggregate large amounts of log data and staging data in HDFS.
  • Collected Log data from web servers to integrate into HDFS location.
  • Wrote MapReduce programs to handle semi-structured and unstructured data such as JSON and Avro data files and Sequence files for log data.
  • Developed Kafka producers and consumers for message handling.
  • Worked on extending the core functionalities of Hive and Pig by writing UDF’s using Java.
  • Involved in importing data from MS SQL Server and MySQL into Hadoop using Sqoop.
  • Identified and created Sqoop scripts to batch data periodically into HDFS.
  • Developed Oozie workflows to collect and manage data for end-to-end processing.
  • Analysed large and critical datasets using HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper.
  • Migrated HiveQL queries to Spark SQL to improve performance (see the sketch below).
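
A minimal Scala sketch of the kind of HiveQL-to-Spark SQL migration mentioned in the last point; the logs.web_events table, columns and date filter are hypothetical examples, not the actual production query:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, count}

    object HiveToSparkSql {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveToSparkSql")
          .enableHiveSupport()   // read the existing Hive tables directly
          .getOrCreate()

        // The original HiveQL-style query, now executed by the Spark SQL engine.
        val bySource = spark.sql(
          """SELECT source, COUNT(*) AS events
            |FROM logs.web_events
            |WHERE event_date = '2016-01-01'
            |GROUP BY source""".stripMargin)

        // Equivalent DataFrame API form, convenient when composing further transformations.
        val bySourceDf = spark.table("logs.web_events")
          .filter(col("event_date") === "2016-01-01")
          .groupBy("source")
          .agg(count("*").as("events"))

        bySource.show()
        bySourceDf.show()
        spark.stop()
      }
    }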

Environment: Cloudera, MapReduce, Hadoop, HDFS, Pig Scripts, Hive Scripts, HBase, Sqoop, Zookeeper, Oozie, Oracle, Shell Scripting.

Confidential

Java Developer

Responsibilities:

  • Involved in the development of use case documentation, requirement analysis, and project documentation
  • Developed and maintained Web applications as defined by the Project Lead
  • Developed GUI using JSP, JavaScript, and CSS
  • Used MS Visio for creating business process diagrams
  • Developed Action Servlet, Action Form and Java Bean classes for implementing business logic in the Struts framework
  • Developed Servlets and JSPs based on the MVC pattern using the Struts Action framework
  • Developed all the tiers of the J2EE application. Developed data objects to communicate with the database using JDBC in the database tier, implemented business logic using EJBs in the middle tier, developed Java Beans and helper classes to communicate with the presentation tier which consists of JSPs and Servlets
  • Used AJAX for Client-side validations
  • Applied annotations for dependency injection and transforming POJO/POJI to EJBs
  • Developed persistence layer modules using EJB Java Persistence API (JPA) annotations and Entity Manager
  • Involved in creating EJBs that handle business logic and persistence of data
  • Developed Action and Form Bean classes to retrieve data and process server-side validations
  • Designed various tables required for the project in the Oracle database and used Stored Procedures in the application. Used PL/SQL to create, update and manipulate tables
  • Used IntelliJ as IDE and Tortoise SVN for version control
  • Involved in impact analysis of Change requests and Bug fixes

Environment: Java 5, Struts, PL/SQL, Oracle, EJB, IntelliJ, Tortoise SVN, MS Visio, Firebug, Apache Tomcat, JSP, JavaScript, CSS.
