
Sr. Hadoop Developer Resume


Little Rock, AR

SUMMARY

  • 6+ years of experience in all phases of the software development life cycle, with expertise in Big Data technologies such as the Hadoop and Spark ecosystems and Java/J2EE technologies.
  • Expertise in Big Data technologies including Hadoop, MapReduce, YARN, Flume, Hive, Pig, Sqoop, HBase, Pivotal, Cloudera, MapR, Avro, Spark and Scala.
  • Proficient in Hadoop architecture components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
  • Experience in developing MapReduce Programs using Apache Hadoop for analyzing the big data as per the requirement.
  • Efficient with the Hive data warehouse tool: creating tables, distributing data by implementing partitioning and bucketing strategies, and writing and optimizing HiveQL queries (a minimal sketch follows this list).
  • Experience in extending Pig and Hive functionality with custom UDFs for data analysis and file processing through Pig Latin scripts.
  • Experience in ingestion, storage, querying, processing and analysis of Big Data, with hands-on experience in Apache Spark, Spark SQL and Spark Streaming.
  • Experience in importing and exporting data between RDBMS databases (MySQL, Oracle and DB2) and the Hadoop data lake using Sqoop jobs.
  • Experience in developing Sqoop jobs in incremental mode, both append and last-modified, and Sqoop merge scripts to handle incremental updates.
  • Proficient in developing Web based user interfaces using HTML5, CSS3, JavaScript, jQuery, AJAX, XML, JSON, jQuery UI, Bootstrap, Angular, Node JS, Ext JS.
  • Expertise in working with various databases: writing SQL queries, stored procedures, functions and triggers using PL/SQL and SQL.
  • Experience in Object Oriented Analysis, Design (OOAD) and development of software using UML Methodology, good knowledge of J2EE design patterns and Core Java design patterns.
  • Working knowledge of Amazon's Elastic Cloud Compute (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as Storage mechanism.
  • Strong experience in troubleshooting operating systems such as Linux (Red Hat) and UNIX, resolving cluster issues and Java-related bugs.
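The following is a minimal Scala sketch of the partitioning and bucketing strategy noted above, using Spark's DataFrame writer with Hive support (Spark-managed bucketing, closely analogous to the Hive CLUSTERED BY approach); the paths, table name, bucket count and column names are hypothetical placeholders, not from an actual project.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("HivePartitioningSketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical staging data already landed in HDFS.
    val orders = spark.read.parquet("/data/staging/orders")

    // Partition by load date and bucket by customer_id so that date filters
    // prune partitions and customer joins can avoid a full shuffle.
    orders.write
      .partitionBy("load_dt")
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .format("orc")
      .saveAsTable("sales.orders_part")

The equivalent strategy in plain HiveQL would use PARTITIONED BY and CLUSTERED BY ... INTO n BUCKETS in the table DDL.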

TECHNICAL SKILLS

Big Data: HDFS, MapReduce, YARN, Pig, Hive, Spark, Kafka, Flume, Sqoop, Impala, Presto, Oozie, ZooKeeper

Languages: Java, Python, Scala, C/C++, Go

Java Technologies: JSE, Servlets, JavaBeans, JSP, JDBC, JNDI, AJAX, EJB, Struts

XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB

Web Design: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, Angular, PHP

Build & Development Tools: Eclipse, IntelliJ, Jenkins, Git, Ant, Maven, JUnit, Log4j

App/Web servers: WebSphere, WebLogic, JBoss and Tomcat

RDBMS: Teradata, Oracle, MS SQL Server, MySQL

PROFESSIONAL EXPERIENCE

Confidential - Little Rock, AR

Sr. Hadoop Developer

Responsibilities:

  • Worked with application teams to install operating systems and apply Hadoop updates, patches and version upgrades as required.
  • Developed MapReduce programs for filtering out the unstructured data and developed multiple MapReduce jobs to perform data cleaning and pre-processing.
  • Developed a Spark Streaming script that consumes topics from Kafka and periodically pushes micro-batches of data to Spark for near real-time processing (see the sketch after this list).
  • Extracted real-time data using Kafka and Spark Streaming and processed the stream of RDDs created from micro-batches.
  • Used Spark Streaming APIs to perform transformations and actions that get data from Kafka in near real-time and persist it into HBase.
  • Wrote Hive queries for transformations on the data to be used by downstream models.
  • Optimized Hive tables using techniques such as partitioning and bucketing to improve the performance of HiveQL queries.
  • Developed a framework to load and transform large sets of unstructured data from UNIX systems into Hive tables.
  • Used Spark DataFrame operations to perform required validations of the data and to run analytics on the Hive data.
  • Implemented Spark jobs in Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.
  • Configured Spark Streaming to receive streaming data from Kafka and store it in HDFS.
  • Developed Pig Latin scripts to load data from output files and write it to HDFS.
  • Used Oozie Workflow engine to run multiple Hive and Pig jobs.
  • Responsible for performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory usage.
  • Involved in deploying applications in AWS and maintaining EC2 instances and RDS (Relational Database Service).
  • Created detailed AWS security groups, which behave as virtual firewalls controlling the traffic allowed to reach one or more EC2 instances.
  • Created a data lake by extracting customer data from various sources into HDFS, including Excel files, databases, and log data from servers.
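A minimal Scala sketch of the Kafka-to-Spark Streaming flow described above, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, consumer group and HDFS output path are hypothetical placeholders.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",          // hypothetical broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "clickstream-consumer",  // hypothetical group
      "auto.offset.reset"  -> "latest"
    )

    val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHdfs"), Seconds(30))

    // Direct stream over the subscribed topic; each micro-batch arrives as an RDD.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Array("events"), kafkaParams))

    stream.map(_.value).foreachRDD { rdd =>
      // Persist each non-empty micro-batch to HDFS; writing to HBase instead
      // would replace this with a foreachPartition over the HBase client.
      if (!rdd.isEmpty()) rdd.saveAsTextFile(s"/data/raw/events/batch_${System.currentTimeMillis()}")
    }

    ssc.start()
    ssc.awaitTermination()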

Confidential - Gaithersburg, MD

Hadoop Developer

Responsibilities:

  • Monitored systems and services and implemented Hadoop deployment, configuration management, backup, and disaster recovery procedures.
  • Developed highly optimized Spark applications to perform data cleansing, validation, transformation and summarization activities.
  • Created Sqoop jobs to handle incremental loads from RDBMS into HDFS to apply Spark Transformations and Actions.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Worked with the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs.
  • Handled large datasets using partitioning, Spark's in-memory capabilities, broadcast variables, and effective, efficient joins and transformations (see the broadcast-join sketch after this list).
  • Migrated data from MySQL server to Hadoop using Sqoop for processing.
  • Imported required tables from RDBMS to HDFS using Sqoop, and used Spark and Kafka to stream data in real time into HBase.
  • Enhanced and optimized the product's Spark code to aggregate, group and run data mining tasks using the Spark framework, and handled JSON data.
  • Worked with ELK Stack cluster for importing logs into Logstash, sending them to Elasticsearch nodes and creating visualizations in Kibana.
  • Used Apache Spark with the ELK cluster to produce specific visualizations that require more complex data processing.
  • Moved log files generated by various sources into HDFS through Flume for further processing.
  • Worked with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
  • Worked on partitioning the Hive table and running the scripts in parallel to reduce the run time of the scripts.
  • Analyzed data by running Hive queries and Pig scripts to understand user behavior.
  • Programmed Pig scripts with complex joins, such as replicated and skewed joins, to achieve better performance.
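A minimal Scala sketch of the broadcast-join pattern referenced above; the input paths, dataset names and join/group columns are hypothetical.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    val spark = SparkSession.builder().appName("BroadcastJoinSketch").getOrCreate()

    // Large fact table landed in HDFS (e.g. by a Sqoop incremental load).
    val orders = spark.read.parquet("/data/landing/orders")
    // Small dimension table; broadcasting it avoids shuffling the large side.
    val customers = spark.read.parquet("/data/landing/customers")

    val enriched = orders.join(broadcast(customers), Seq("customer_id"))
    enriched.cache()                                  // reuse in memory across aggregations
    enriched.groupBy("region").count().show()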

Confidential - San Ramon, CA

Hadoop Developer

Responsibilities:

  • Developed custom data Ingestion adapters to extract the log data and click stream data from external systems and load into HDFS.
  • Used Spark as ETL tool to do complex Transformations, De-Normalization, Enrichment and some pre-aggregations.
  • Worked on migrating MapReduce programs into Spark transformations using Scala (see the word-count sketch after this list).
  • Scripted complex HiveQL queries on Hive tables to analyze large datasets and wrote complex Hive UDFs to work with sequence files.
  • Scheduled workflows using Oozie to automate multiple Hive and Pig jobs, which run independently with time and data availability.
  • Created components such as Hive UDFs for functionality missing in Hive, to analyze and process large volumes of data extracted from the NoSQL database MongoDB.
  • Used Impala to read, write and query Hadoop data in HDFS sourced from MongoDB, and configured Kafka to read and write messages from external programs.
  • Wrote Pig Scripts to perform ETL procedures on the data in HDFS.
  • Collected and aggregated substantial amounts of log data using Apache Flume and staged it in HDFS for further analysis.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Worked on optimization of MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS cluster.
  • Created and maintained Technical documentation for launching Cloudera Hadoop Clusters and for executing Hive queries and Pig Scripts.
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Implemented MapReduce programs to handle unstructured and semi-structured log data in formats such as XML, JSON, and sequence files.
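As a sketch of the MapReduce-to-Spark migration mentioned above, here is the classic word-count job rewritten as Spark RDD transformations in Scala; the input and output paths are hypothetical.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("MapReduceToSpark").getOrCreate()
    val sc = spark.sparkContext

    val counts = sc.textFile("/data/raw/logs")        // map phase: read and tokenize
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)                             // combiner + reduce phase in one call

    counts.saveAsTextFile("/data/out/word_counts")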

Confidential - Omaha, NE

Java Developer

Responsibilities:

  • Involved in Requirement Gathering, Design and Deployment of the application using Scrum (Agile) as Development methodology.
  • Participated in developing use cases, use case diagrams, class diagrams, sequence diagrams and high-level activity diagrams using UML from the requirements.
  • Involved in implementing the DAO layer using Spring and Hibernate ORM, creating objects and mapping them with Hibernate annotations.
  • Implemented the caching mechanism in Hibernate to load data from Oracle database.
  • Implemented a REST API with the Bottle micro-framework, using Oracle as the back-end database.
  • Used the Apache CXF API and JAX-RS to develop REST and SOAP based web services (see the resource sketch after this list).
  • Used SOAP UI to test both REST as well as SOAP based web services.
  • Used an XML-based web services tool to push pending orders into Integration Manager.
  • Used JavaScript and jQuery to validate input entered through the user interface.
  • Used AJAX to update parts of web pages, which improved the application's performance.
  • Developed layout of Web Pages using Tiles and CSS.
  • Used SQL developer database tool to build, edit, and format database queries, as well as eliminate performance issues in the code.
  • Used the Oracle database for table creation and wrote SQL queries using joins and stored procedures.
  • Performed unit testing using the JUnit framework and used Struts test cases for testing Action classes.
  • Used Ant scripts to build the application and deployed it on WebSphere Application Server.
  • Manually executed the test steps defined in test cases and reported bugs in JIRA.
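A minimal sketch of a JAX-RS resource of the kind exposed through Apache CXF, as referenced above; the original services were written in Java, but the example is shown in Scala to stay consistent with the other sketches in this resume, and the OrderResource name, path and payload are hypothetical.

    import javax.ws.rs.{GET, Path, PathParam, Produces}
    import javax.ws.rs.core.MediaType

    @Path("/orders")
    class OrderResource {

      // GET /orders/{id} returns a small JSON payload for the given order id.
      @GET
      @Path("/{id}")
      @Produces(Array(MediaType.APPLICATION_JSON))
      def getOrder(@PathParam("id") id: String): String =
        s"""{"orderId": "$id", "status": "PENDING"}"""
    }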
