
Sr. Hadoop Developer Resume


Atlanta, GA

SUMMARY

  • Around 8 years of professional IT experience in the analysis, design, development, testing, documentation, deployment, integration and maintenance of web-based and client/server applications using Java and Big Data technologies (Hadoop and Spark).
  • 4+ years of data analytics experience designing and implementing complete end-to-end Big Data/Hadoop infrastructure using Spark, MapReduce, HDFS, Pig, Hive, HBase, Sqoop, Flume and Oozie.
  • Experience in developing solutions to analyze large data sets efficiently.
  • Experience in writing complex MapReduce jobs, Pig scripts, Hive queries and Spark applications, as well as data modeling.
  • Excellent understanding/knowledge of Hadoop Distributed File System (HDFS) architecture and design principles.
  • In-depth understanding/knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce concepts.
  • Experience in converting MapReduce applications to Spark.
  • Good working experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Good knowledge in using job scheduling and workflow designing tools like Oozie and Airflow.
  • Experience working with BI teams to translate big data requirements into Hadoop-centric technologies.
  • Worked intensively within Agile methodology and the software development life cycle (SDLC).
  • Experience in performance tuning the Hadoop cluster by gathering metrics from and analyzing the existing infrastructure.
  • Experience in using Apache NiFi to automate data movement between different Hadoop ecosystem systems.
  • Experience managing LDAP, Active Directory and Kerberos.
  • Developed multiple POCs using the Spark shell, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • Good experience creating real-time data streaming solutions using Apache Spark, Apache Storm, Kafka and Flume.
  • Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • More than one year of hands-on experience using the Spark framework with Python and Scala.
  • Extended Hive and Pig core functionality by writing custom UDFs.
  • Experienced in working with structured and unstructured data using HiveQL, join operations, Hive UDFs, partitions, bucketing and internal/external tables.
  • Good knowledge of Amazon AWS services such as EMR, S3, Lambda, EC2 and CloudWatch, which provide fast and efficient processing of big data.
  • Experience in CI/CD pipeline management through Jenkins; automated manual tasks using shell scripting.
  • Experience in performance tuning Hive jobs using partitioning, bucketing and Hive SerDes (see the sketch after this list).
  • Experience in storing data in the Hadoop cluster in Avro, Parquet and other file formats.
  • Experience in handling messaging services using Apache Kafka.
  • Experience in fine-tuning MapReduce jobs for better scalability and performance.
  • Developed various MapReduce applications to perform ETL workloads on terabytes of data.
  • Experience in writing SQL and PL/SQL queries and stored procedures for accessing and managing databases such as Oracle, SQL Server, MySQL and IBM DB2.
  • Good exposure to performance tuning Hive queries and MapReduce jobs in the Spark framework.
  • Performed unit testing with the MRUnit and JUnit testing frameworks and used Log4j to monitor error logs.
  • Developed core modules in large cross-platform applications using Java and J2EE, with experience in core Java concepts such as OOP and multi-threading.
  • Designed Java components and integrated them using the Spring framework with the Hibernate object/relational persistence mechanism.
  • Strong organizational skills, with the ability to work with individuals as well as teams from different backgrounds.
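
A minimal sketch of the Hive partitioning/bucketing approach referenced above, written with the PySpark DataFrameWriter API; the orders dataset, paths and table names are illustrative assumptions rather than details from the projects below.

```python
from pyspark.sql import SparkSession

# Hive support is needed so the partitioned/bucketed table lands in the metastore.
spark = (SparkSession.builder
         .appName("hive-partition-bucket-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical staging data; in practice this could be a Sqoop-loaded or S3 dataset.
orders = spark.read.parquet("hdfs:///data/staging/orders")

# Partition by load date so date-range queries prune whole directories,
# and bucket by customer_id so bucketed joins avoid a full shuffle.
(orders.write
       .mode("overwrite")
       .partitionBy("load_date")
       .bucketBy(32, "customer_id")
       .sortBy("customer_id")
       .saveAsTable("analytics.orders_bucketed"))
```

Queries that filter on load_date or join on customer_id can then skip irrelevant partitions and reuse the bucketing instead of re-shuffling.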

TECHNICAL SKILLS

Big data technologies: HDFS, MapReduce, Pig, ZooKeeper, NiFi, Sqoop, Flume, Impala, Oozie, Hive, Solr, Kibana, Airflow, HBase, Storm, Spark.

Operating Systems: UNIX, LINUX, Windows

Databases: Oracle, MySQL, SQL Server, IBM DB2.

Frameworks: Spring, log4j, JUnit, Struts, Hibernate

Tools and Applications: Eclipse, Maven, Git, PuTTY

Programming skills: Java, C/C++, Scala, Python

Cloud Services: AWS (EMR, S3)

Web Development: HTML, CSS, JavaScript

NoSQL: HBase, Cassandra, MongoDB

PROFESSIONAL EXPERIENCE

Confidential, Atlanta, GA

Sr. Hadoop Developer

Responsibilities:

  • Worked on AWS cloud services such as EMR, EC2, Lambda and S3.
  • Imported data from AWS S3 into Spark RDDs, performed actions on the RDDs and stored the results back in AWS S3 (see the S3/RDD sketch after this list).
  • Expertise in designing and deploying the Hadoop cluster and various Big Data analytics tools, including Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Flume, Kafka, Spark, Impala, Solr and Cassandra, on Cloudera.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Implemented Snappy compression and the Parquet file format for staging and computational optimization.
  • Developed Spark code using Python with Spark SQL and Spark Streaming for faster testing and processing of data.
  • Wrote shell scripts to run Sqoop for bulk data ingestion from RDBMS into Hive.
  • Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive (see the Kafka sketch after this list).
  • Developed Sqoop scripts to import and export data between RDBMS and HDFS/Hive, and handled incremental loading of customer and transaction data dynamically.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Involved in importing real-time data into Hadoop using Kafka and implemented an Oozie job for the daily run.
  • Used Apache NiFi to copy data from the local file system to HDFS.
  • Used the Spark APIs with Python and Scala for real-time analysis and fast querying.
  • Experienced in working with various data sources such as Teradata and Oracle; loaded files from Teradata into HDFS and from HDFS into Hive and Impala.
  • Worked with HBase, creating tables to load semi-structured data from different sources.
  • Worked on parsing JSON and XML data and storing it in denormalized format.
  • Experience working with Dimension and Fact tables.
  • Experience in writing complex Hive scripts.
  • Experienced in using Kibana to generate customer dashboards.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
  • Created a pipeline for processing structured and unstructured streaming data using Spark Streaming and stored the filtered data in S3 as Parquet files (see the streaming sketch after this list).
  • Implemented the build framework for new projects using Jenkins and Maven as build tools.
  • Used the Java Collections Framework to store and process complex device metadata and other related information.
  • Built a data validation dashboard using Solr to display the message board.
  • Experience in handling incremental updates using Hive and storing the data in HDFS.
  • Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
  • Transferred data from AWS S3 to AWS Redshift.
  • Experience in deploying data pipelines through Jenkins CI/CD.
  • Experience in validating data, writing test cases for the data pipelines and automating them.
  • Used Oozie workflow engine for managing interdependent Hadoop jobs and to automate several types of Hadoop jobs.
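
A minimal sketch of the S3-to-RDD flow referenced in the bullets above, assuming the cluster's s3a connector is already configured; bucket names, prefixes and the record layout are placeholders.

```python
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("s3-rdd-sketch")
sc = SparkContext(conf=conf)

# Read raw CSV-like records from S3 into an RDD (bucket and prefix are placeholders).
raw = sc.textFile("s3a://example-bucket/incoming/transactions/")

# Keep only completed transactions (assumed layout: id,customer,date,amount,status).
completed = (raw.map(lambda line: line.split(","))
                .filter(lambda f: len(f) > 4 and f[4] == "COMPLETED"))

# An action to materialize the work, then persist the filtered records back to S3.
print("completed transactions:", completed.count())
completed.map(lambda f: ",".join(f)) \
         .saveAsTextFile("s3a://example-bucket/curated/transactions_completed")

sc.stop()
```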
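
A small sketch of a Kafka producer/consumer pair of the kind mentioned above, using the kafka-python client; the broker address, topic and message shape are assumptions.

```python
import json
from kafka import KafkaProducer, KafkaConsumer

BROKERS = ["broker1:9092"]   # placeholder broker list
TOPIC = "device-events"      # placeholder topic

# Producer: JSON-encode each event before sending.
producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)
producer.send(TOPIC, {"device_id": "d-42", "status": "ONLINE"})
producer.flush()

# Consumer: read from the beginning of the topic and decode each message.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKERS,
    auto_offset_reset="earliest",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)
for message in consumer:
    print(message.value)   # e.g. {'device_id': 'd-42', 'status': 'ONLINE'}
    break                  # stop after one message in this sketch
```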
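
A sketch of the streaming pipeline bullet above, expressed here with Spark Structured Streaming and its Kafka source (the original pipeline may have used the older DStream API); the topic, schema, bucket and checkpoint paths are illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("stream-to-parquet-sketch").getOrCreate()

# Assumed JSON layout of the incoming events.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("amount", DoubleType()),
])

# Consume from Kafka (broker and topic names are placeholders).
events = (spark.readStream
               .format("kafka")
               .option("kafka.bootstrap.servers", "broker1:9092")
               .option("subscribe", "transactions")
               .load()
               .select(from_json(col("value").cast("string"), schema).alias("e"))
               .select("e.*"))

# Filter the stream and write it to S3 as Parquet, with a checkpoint for fault tolerance.
query = (events.filter(col("event_type") == "purchase")
               .writeStream
               .format("parquet")
               .option("path", "s3a://example-bucket/curated/purchases/")
               .option("checkpointLocation", "s3a://example-bucket/checkpoints/purchases/")
               .start())
query.awaitTermination()
```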

Environment: Hadoop, MapReduce, Solr, Cloudera, Hive, HBase, Spark, Spark SQL, Teradata, Flume, Kafka, Sqoop, Oozie, NiFi, Kibana, Python, Java, AWS.

Confidential, Chicago, IL

Hadoop Developer

Responsibilities:

  • Analyzed data using the Hadoop components Hive and Pig.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
  • Analyzed web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration and the most-purchased items on the website (see the query sketch after this list).
  • Experienced in developing applications in Hive, Sqoop, Oozie, Java MapReduce, Spark SQL, HDFS, Pig and Tez with Hortonworks.
  • Exported the analyzed data to relational databases using Sqoop for visualization and report generation by the BI team.
  • Responsible for managing data coming from different sources.
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
  • Used JSON SerDes to convert JSON data into denormalized format.
  • Involved in loading data from UNIX file system to HDFS.
  • Responsible for creating Hive tables, loading data and writing hive queries.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
  • Used Impala to read and write data in HDFS from HBase.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Used Ambari to monitor the health of the Hadoop cluster.
  • Designed and implemented Hive and Pig UDFs in Python for evaluating, filtering and storing data (see the UDF sketch after this list).
  • Managed Application development using Agile life cycle development methodologies.
  • Extracted data from Teradata into HDFS using Sqoop.
  • Exported the analyzed patterns back to Teradata using Sqoop.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs independently, based on time and data availability.
  • Developed simple to complex MapReduce jobs in Python that are implemented through Hive and Pig.
  • Used Pig Latin scripts, UDFs and UDAFs while analyzing unstructured and semi-structured data.
  • Utilized in-memory processing capability of Apache Spark to process data using Spark SQL.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Python.
  • Used partitioning, dynamic partitioning and bucketing techniques while creating Hive tables, for easier analysis of the dynamic data coming from different sources.
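
A sketch of the kind of HiveQL used for the web-log analysis above (unique visitors per day and page views), run here through spark.sql; the weblogs table and its columns are assumptions.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("weblog-hiveql-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Daily unique visitors and page views from an assumed weblogs table
# with columns: visitor_id, url, event_time.
daily_traffic = spark.sql("""
    SELECT to_date(event_time)        AS log_date,
           COUNT(DISTINCT visitor_id) AS unique_visitors,
           COUNT(*)                   AS page_views
    FROM   weblogs
    GROUP BY to_date(event_time)
    ORDER BY log_date
""")
daily_traffic.show()
```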
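
A minimal sketch of a Python script used as a Hive streaming UDF of the kind described above; the column layout and business rule are hypothetical, and on the Hive side it would be wired in with ADD FILE and a TRANSFORM clause.

```python
#!/usr/bin/env python
# Hypothetical streaming UDF: reads tab-separated (user_id, status, amount) rows
# from Hive on stdin, keeps only ACTIVE users and emits (user_id, amount_with_tax).
import sys

TAX_RATE = 0.07  # assumed business rule, for illustration only

for line in sys.stdin:
    user_id, status, amount = line.rstrip("\n").split("\t")
    if status == "ACTIVE":
        taxed = float(amount) * (1 + TAX_RATE)
        print("{0}\t{1:.2f}".format(user_id, taxed))
```

Invocation would look roughly like: ADD FILE taxed_amounts.py; SELECT TRANSFORM(user_id, status, amount) USING 'python taxed_amounts.py' AS (user_id, amount_with_tax) FROM users;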

Environment: Hadoop cluster, HDFS, Hive, Pig, Sqoop, Hadoop MapReduce, HBase, Spark, Python, Linux, UNIX shell scripting, Big Data, Hortonworks (HDP), Ambari.

Confidential, Los Angeles, CA

Hadoop Developer

Responsibilities:

  • Responsible for managing data coming from different sources; involved in HDFS maintenance and in loading structured and unstructured data.
  • Developed a data pipeline using Flume, Sqoop, Pig, Hive and Java MapReduce to ingest behavioral data into HDFS for analysis.
  • Responsible for importing log files from various sources into HDFS using Flume.
  • Used Sqoop to load data from MySQL into HDFS on a regular basis.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
  • Created a customized BI tool for the management team that performs query analytics using HiveQL.
  • Created partitions and buckets based on state for further processing using bucket-based Hive joins.
  • Estimated the hardware requirements for the NameNode and DataNodes and planned the cluster.
  • Created Hive generic UDFs and UDAFs in Python and Java to process business logic that varies by policy.
  • Moved relational database data into Hive dynamic-partition tables using Sqoop and staging tables (see the dynamic-partition sketch after this list).
  • Optimized Hive queries using partitioning and bucketing techniques to control the data distribution.
  • Worked with HBase to create tables and store data.
  • Worked on integrating Hive and NoSQL HBase tables to handle updates.
  • Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
  • Discussed the implementation of concurrent programming in Spark using Python with message passing.
  • Involved in Cassandra data modeling and analysis using CQL (Cassandra Query Language).
  • Experienced with different compression techniques such as LZO, Gzip and Snappy.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
  • Created a data pipeline of MapReduce programs using chained mappers.
  • Implemented optimized joins over different data sets to get the top claims by state using MapReduce.
  • Implemented MapReduce programs to perform joins on the map side using the Distributed Cache in Java (see the broadcast-join sketch after this list).
  • Created a complete processing engine, based on Cloudera's distribution, enhanced for performance.
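
A sketch of the staging-table-to-dynamic-partition load mentioned above, expressed as HiveQL run through spark.sql; the claims tables and columns are assumed names, and on plain Hive the same INSERT works once dynamic partitioning is enabled.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("dynamic-partition-load-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Some Hive/Spark versions require non-strict mode for fully dynamic partition inserts.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

# Assumed staging table loaded by Sqoop: claims_staging(claim_id, amount, state).
# The target table is partitioned by state; the partition value comes from the data itself.
spark.sql("""
    INSERT OVERWRITE TABLE claims_partitioned PARTITION (state)
    SELECT claim_id,
           amount,
           state      -- dynamic partition column goes last
    FROM claims_staging
""")
```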
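
The map-side join above was implemented in Java MapReduce with the Distributed Cache; the sketch below is a rough PySpark analogue that uses a broadcast variable in place of the cached file, with hypothetical claims and state-lookup inputs.

```python
from pyspark import SparkConf, SparkContext

sc = SparkContext(conf=SparkConf().setAppName("broadcast-join-sketch"))

# Small lookup table (state code -> state name), analogous to the file shipped via Distributed Cache.
state_lookup = dict(sc.textFile("hdfs:///ref/states.csv")
                      .map(lambda line: tuple(line.split(",")[:2]))
                      .collect())
states_bc = sc.broadcast(state_lookup)

# Large claims data set with assumed layout: claim_id, state_code, amount.
claims = sc.textFile("hdfs:///data/claims/").map(lambda line: line.split(","))

# The join happens on the map side: each record looks up the broadcast dict, so the
# lookup data is never shuffled.
enriched = claims.map(lambda f: (f[0], states_bc.value.get(f[1], "UNKNOWN"), float(f[2])))

# Top claim amount per state.
top_by_state = (enriched.map(lambda rec: (rec[1], rec[2]))
                        .reduceByKey(max))
print(top_by_state.collect())

sc.stop()
```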

Environment: HDFS, HBase, MapReduce, Java, Hive, Pig, Sqoop, Flume, Kafka, Oozie, Hue, Storm, ZooKeeper, Avro files, SQL, ETL, Cloudera Manager, MySQL, MongoDB.

Confidential, San Francisco, CA

Sr. Java / J2EE Developer

Responsibilities:

  • As a programmer, involved in the design and implementation of the MVC pattern.
  • Extensively used XML, wherein process details are stored in the database and the stored XML is reused whenever needed.
  • Part of core team to develop process engine.
  • Developed action classes and validation using the Struts framework.
  • Created project-related documentation, such as role-based user guides.
  • Implemented modules like Client Management, Vendor Management.
  • Implemented an access control mechanism to provide various access levels to users.
  • Designed and developed the application using J2EE, JSP, XML, Struts, Hibernate and Spring technologies.
  • Coded DAO and Hibernate implementation classes for data access.
  • Coded Spring service classes and transfer objects to pass data between layers.
  • Implemented web services using Axis.
  • Used different features of Struts such as MVC, the validation framework and the tag library.
  • Created the detailed design document, use cases and class diagrams using UML.
  • Wrote ANT scripts to build JAR, WAR and EAR files.
  • Developed a standalone Java component that interacts with Crystal Reports on the Crystal Enterprise Server to view and schedule reports, as well as storing data as XML and sending it to consumers using SOAP.
  • Deployed and tested the application on WebSphere Application Server.
  • Developed JavaScript for client-side validation in JSPs.
  • Developed JSPs with Struts tag libraries for the presentation layer.
  • Coordinated with the onsite, offshore and QA teams to facilitate quality delivery from offshore on schedule.

Environment: Java 1.5, Spring, Spring Web Services, JSP, JavaScript, Hibernate, SOAP, CSS, Struts, WebSphere, MQ Series, JUnit, Apache, Windows XP and Linux.

Confidential

Java Developer

Responsibilities:

  • Designed a system and developed a framework using J2EE technologies based on MVC architecture.
  • Involved in the iterative/incremental development of the project application.
  • Designed and Developed UI's using JSP by following MVC architecture.
  • Designed the controller, including class diagrams and sequence diagrams, using Visio.
  • Generated XML pages with templates using XSL; used JSPs, Servlets and EJBs on the server side.
  • Developed and maintained a complete external build process using ANT.
  • Implemented the Home interface, Remote interface and bean implementation class.
  • Implemented server-side business logic using session beans.
  • Extensive use of XML for application configuration, navigation and task-based configuration.
  • Designed and developed Unit and integration test cases using JUnit.
  • Used EJB features effectively: local interfaces to improve performance, the abstract persistence schema and CMRs.
  • Used the Struts web application framework to build the presentation tier.
  • Wrote PL/SQL queries to access data from Oracle database.
  • Set up WebSphere Application Server and used ANT to build the application and deploy it on WebSphere.
  • Prepared test plans and wrote test cases.

Environment: Java, J2EE, Struts, Hibernate, JSP, Servlets, HTML, CSS, UML, Log4j, XML Schema, JUnit, Tomcat, JavaScript, Oracle 9i, Unix, Eclipse IDE.
