
Senior Hadoop Developer Resume


Austin, TX

SUMMARY

  • 7 years of professional experience working with data, including 5+ years of hands-on experience in the analysis, design, development and maintenance of Hadoop.
  • Extensive development experience with the Hadoop ecosystem, covering MapReduce, HDFS, YARN, Hive, Impala, Pig, HBase, Spark, Sqoop, Oozie and Cloudera.
  • In-depth knowledge of Hadoop ecosystem components such as HDFS, JobTracker, NameNode and DataNode.
  • In-depth understanding of the strategy and practical implementation of AWS cloud technologies including IAM, EC2, EMR, SNS, RDS, Redshift, Athena, DynamoDB, Lambda, CloudWatch, Auto Scaling, S3 and Route 53.
  • Strong experience in analyzing data using HiveQL, Spark SQL, HBase and custom MapReduce programs (see the sketch after this list).
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Experience in writing shell scripts to dump shared data from MySQL servers to HDFS.
  • Working knowledge of Python and Scala for Spark development.
  • Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
  • Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Handled different file formats such as Parquet, Avro and RC files using different SerDes in Hive.
  • Performed Data Ingestion from multiple disparate sources and systems using Kafka.
  • Experience in managing and monitoring Hadoop cluster using Cloudera Manager.
  • Experience in writing custom UDFs that extend Hive and Pig core functionality.
  • Good knowledge of NoSQL databases Cassandra, MongoDB and HBase.
  • Worked on HBase to load and retrieve data for real-time processing using a REST API.
  • Experience in developing applications using Waterfall and Agile (XP and Scrum) methodologies.
  • Experienced with build tools Ant and Maven and continuous integration tools like Jenkins.
  • Proficiency in working with databases like Oracle, MySQL.
  • Extensive experience in writing stored procedures and functions using SQL and PL/SQL.
  • Experience in AWS cloud administration; actively involved in building highly available, scalable, cost-effective and fault-tolerant systems using multiple AWS services.
  • Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping, design and review.
  • Strong problem-solving, organizational, team-management, communication and planning skills, with the ability to work in a team environment. Able to write clear, well-documented, well-commented and efficient code to requirements.
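
As a minimal illustration of the HiveQL/Spark SQL analysis mentioned above, the sketch below runs an aggregate query against a Hive table from PySpark. The table and column names (customer_transactions, txn_date) are purely hypothetical.

    # Minimal PySpark sketch; table and column names are illustrative only.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-analysis-sketch")
             .enableHiveSupport()   # lets Spark SQL query existing Hive tables
             .getOrCreate())

    # The kind of HiveQL-style analysis described above, run through Spark SQL
    daily_counts = spark.sql("""
        SELECT txn_date, COUNT(*) AS txn_count
        FROM customer_transactions   -- hypothetical Hive table
        GROUP BY txn_date
        ORDER BY txn_date
    """)
    daily_counts.show()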

TECHNICAL SKILLS

Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Impala, Sqoop, Flume, NoSQL (HBase, Cassandra), Spark, Kafka, ZooKeeper, Oozie, Hue, Cloudera Manager, Amazon AWS, Hortonworks clusters

AWS Ecosystems: S3, EC2, EMR, Redshift, Athena, Glue, Lambda, SNS, CloudWatch

Java/J2EE & Web Technologies: J2EE, JMS, JSF, JDBC, Servlets, HTML, CSS, XML, XHTML, AJAX, JavaScript.

Languages: C, C++, Core Java, Shell Scripting, SQL, PL/SQL, Python, Pig Latin

Operating systems: Windows, Linux and Unix

DBMS/RDBMS: Oracle, Talend ETL, Microsoft SQL Server 2012/2008, MySQL, DB2, Teradata SQL; NoSQL: MongoDB, Cassandra, HBase

IDE and Build Tools: Eclipse, NetBeans, MS Visual Studio, Ant, Maven, JIRA, Confluence

Version Control: Git, SVN, CVS

Web Services: RESTful, SOAP

Web Servers: WebLogic, WebSphere, Apache Tomcat

PROFESSIONAL EXPERIENCE

Senior Hadoop Developer

Confidential, Austin, TX

Responsibilities:

  • Created and maintained Technical documentation for launching Hadoop clusters and for executing Hive queries and Pig Scripts.
  • Developed a data pipeline using Spark, Hive, Pig, Python, Impala and HBase to ingest customer behavioral data and financial histories into the Hadoop cluster for analysis.
  • Monitored and reviewed Hadoop log files and wrote queries to analyze them.
  • Conducted POCs and mock-ups with the client to understand the business requirements, and attended defect triage meetings with the UAT and QA teams to ensure defects were resolved in a timely manner.
  • Worked with Kafka on a proof of concept for log processing on a distributed system.
  • Analyzed the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MRv2, Hive, Sqoop and Pig Latin.
  • Loaded data from different data sources (Teradata and DB2) into HDFS using Sqoop and Flume, and loaded it into partitioned Hive tables.
  • Developed HiveQL queries, mappings, tables and external tables in Hive for analysis across different banners, and worked on partitioning, optimization, compilation and execution.
  • Wrote complex queries to get data into HBase and was responsible for executing Hive queries using the Hive command line and Hue.
  • Designed and implemented proprietary data solutions by correlating data from SQL and NoSQL databases using Kafka.
  • Used Pig as an ETL tool to perform transformations and pre-aggregations before storing the analyzed data in HDFS.
  • Developed PySpark code to save data in Avro and Parquet formats and build Hive tables on top of them (see the sketch after this list).
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Automated workflows using shell scripts to pull data from various databases into Hadoop.
  • Developed bash scripts to pull TLog files from the FTP server and process them for loading into Hive tables; all bash scripts were scheduled using the Resource Manager scheduler.
  • Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
  • Developed Spark programs using Scala, created Spark SQL queries and developed Oozie workflows for Spark jobs.
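
A minimal sketch of the Parquet-write-plus-Hive-table pattern referenced above, assuming hypothetical staging and curated HDFS paths, column names and table name; the Avro write is analogous once the spark-avro package is on the classpath.

    # Sketch only: paths, columns and table name are assumptions, not the production job.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("tlog-ingest-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Read staged TLog extracts (location and header layout assumed)
    tlogs = spark.read.option("header", "true").csv("/data/staging/tlogs/")

    # Persist as Parquet partitioned by load_date (assumed to be a column in the extract)
    (tlogs.write
        .mode("overwrite")
        .partitionBy("load_date")
        .parquet("/data/curated/tlogs_parquet/"))

    # Expose the Parquet files to Hive as a partitioned external table
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS tlogs_curated (
            store_id STRING,
            txn_id   STRING,
            amount   DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        STORED AS PARQUET
        LOCATION '/data/curated/tlogs_parquet/'
    """)
    spark.sql("MSCK REPAIR TABLE tlogs_curated")   # register the newly written partitions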

Environment: HDFS, Hadoop 2.x YARN, Teradata, NoSQL, PySpark, MapReduce, Pig, Hive, Sqoop, Spark 2.3, Scala, Oozie, Java, Python, MongoDB, shell and bash scripting.

Hadoop Developer

Confidential, Irving, TX

Responsibilities:

  • Worked with incremental-load Sqoop jobs to populate HAWQ external tables and load them into internal tables.
  • Ran import and export jobs to copy data to and from HDFS using Sqoop.
  • Worked with Spark core, Spark Streaming and SQL modules of Spark.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Worked with Pig, the NoSQL database HBase and Sqoop for analyzing big data on the Hadoop cluster.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
  • Created Hive tables and worked with them for data analysis to meet the business requirements.
  • Developed a data pipeline using Spark and Hive to ingest, transform and analyze data.
  • Worked on data cleansing to populate Hive external and internal tables (see the sketch after this list).
  • Experience using SequenceFile, RCFile, Avro and HAR file formats.
  • Hands-on experience writing Pig scripts to tokenize sensitive information using Protegrity.
  • Supported and built the data science team's projects on Hadoop.
  • Used Flume to dump application server logs into HDFS.
  • Automated backups with Linux shell scripts to transfer data to an S3 bucket.
  • Experience working with the NoSQL database HBase for real-time data analytics.
  • Hands-on experience working as a production support engineer.
  • Worked on RCA documentation.
  • Automated incremental loads to load data into production cluster.
  • Ingested data from various file systems to HDFS using Unix command-line utilities.
  • Hands-on experience moving data between clusters using DistCp.
  • Experience in reviewing Hadoop log files to detect failures.
  • Worked on epic user stories and delivered on time.
  • Worked on the data ingestion portion of a malicious-intent model and automated incremental jobs to run on a daily basis.
  • Hands-on experience with Agile and Scrum methodologies.
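
A minimal PySpark sketch of the cleanse-and-load-into-Hive step mentioned above; the landing path, field names and table name are assumptions for illustration.

    # Illustrative cleanse-and-load sketch; paths, fields and table name are assumed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("cleanse-load-sketch")
             .enableHiveSupport()
             .getOrCreate())

    raw = spark.read.json("/data/landing/events/")        # assumed landing zone

    cleansed = (raw
                .dropDuplicates(["event_id"])             # drop replayed events
                .filter(F.col("event_ts").isNotNull())    # discard rows without a timestamp
                .withColumn("event_date", F.to_date("event_ts")))

    # Load into a partitioned Hive table (full overwrite shown for simplicity)
    (cleansed.write
        .mode("overwrite")
        .format("parquet")
        .partitionBy("event_date")
        .saveAsTable("events_curated"))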

Environment: MapReduce, HDFS, Hive, HBase, Sqoop, Pig, Flume, Oracle 11/10g, DB2, Teradata, MySQL, HAWQ, PL/SQL, Java, Linux, Shell Scripting, SQL Developer, HP ALM.

Hadoop Developer

Confidential, Charlotte, NC

Responsibilities:

  • Developed optimal strategies for distributing the web log data over the cluster; imported and exported the stored web log data into HDFS and Hive using Sqoop.
  • Hands on experience in loading data from UNIX file system and Teradata to HDFS.
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration and the most purchased product on the website.
  • Installed, configured and operated the Apache stack (Hive, HBase, Pig, Sqoop, ZooKeeper, Oozie, Flume and Mahout) on the Hadoop cluster.
  • Created a high-level design for the data ingestion and data extraction module, and enhanced the Hadoop MapReduce job that joins incoming slices of data and picks only the fields needed for further processing.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Worked with AWS cloud services (VPC, EC2, S3, EMR, DynamoDB, SNS, SQS).
  • Part of the team responsible for setting up the infrastructure in AWS.
  • Used Amazon S3 as a storage mechanism and wrote Python scripts that dump data into S3 (see the sketch after this list).
  • Tested raw data and executed performance scripts.
  • Worked with NoSQL database HBase to create tables and store data.
  • Developed industry-specific UDFs (user-defined functions).
  • Used Flume to collect, aggregate and store web log data from various sources such as web servers, mobile and network devices, and pushed it to HDFS.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
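
As a sketch of the kind of Python-to-S3 dump script described above, using boto3; the bucket name, key prefix and local export directory are hypothetical.

    # Hypothetical bucket, prefix and local path; a sketch, not the original script.
    import os
    import boto3

    S3_BUCKET = "example-weblog-archive"    # assumed bucket name
    LOCAL_DIR = "/data/export/weblogs"      # assumed local export directory

    s3 = boto3.client("s3")                 # credentials come from the environment/instance role

    for name in os.listdir(LOCAL_DIR):
        local_path = os.path.join(LOCAL_DIR, name)
        if os.path.isfile(local_path):
            s3.upload_file(local_path, S3_BUCKET, "weblogs/" + name)
            print("uploaded " + name + " to s3://" + S3_BUCKET + "/weblogs/" + name)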

Environment: Cloudera, MapReduce, Hive, Solr, Pig, HDFS, HBase, Flume, Sqoop, Zookeeper, Python, Flat files, AWS, Unix/Linux.

Hadoop Developer

Confidential

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
  • Worked on moving all log files generated from various sources to HDFS for further processing.
  • Developed workflows using custom MapReduce, Pig, Hive and Sqoop.
  • Developed a predictive analytics product using Apache Spark and SQL/HiveQL.
  • Wrote Spark programs to load, parse, refine and store sensor data in Hadoop, and to process and aggregate the analyzed data for visualizations (see the sketch after this list).
  • Created various views for HBase tables and leveraged the performance of Hive on top of HBase.
  • Developed the Apache Storm, Kafka and HDFS integration project for real-time data analysis.
  • Designed and developed Apache Storm topologies for inbound and outbound data for real-time ETL to find the latest trends and keywords.
  • Developed a MapReduce program for parsing information and loading it into HDFS.
  • Built reusable Hive UDF libraries for business requirements, which enabled users to apply these UDFs in Hive queries.
  • Wrote a Hive UDF to sort struct fields and return a complex data type.
  • Responsible for loading data from the UNIX file system to HDFS.
  • Developed ETL applications using Hive, Spark, Impala and Sqoop, and automated them using Oozie.
  • Developed a suite of unit test cases for Mapper, Reducer and Driver classes using the MR testing library.
  • Designed and developed a distributed processing system to process binary files in parallel and load the analysis metrics into a data warehousing platform for reporting.
  • Developed workflows in Control-M to automate the loading of data into HDFS and pre-processing with Pig.
  • Provided cluster coordination services through ZooKeeper.
  • Used Maven extensively to build JAR files of MapReduce programs and deploy them to the cluster.
  • Modelled Hive partitions extensively for data separation and faster data processing and followed Pig and Hive best practices for tuning.
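
A small PySpark sketch of the sensor-data refine-and-aggregate step mentioned above; the input path, field names and output location are assumptions, and PySpark is used here only to keep the examples in one language.

    # Illustrative only; schema, paths and column names are assumed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("sensor-aggregates-sketch").getOrCreate()

    # Parse delimited sensor readings (header layout assumed)
    readings = (spark.read
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("/data/raw/sensors/"))

    # Refine: keep valid readings and derive an hourly bucket
    refined = (readings
               .filter(F.col("reading_value").isNotNull())
               .withColumn("reading_hour",
                           F.date_trunc("hour", F.col("reading_ts").cast("timestamp"))))

    # Aggregate per sensor per hour for downstream visualization
    hourly = (refined
              .groupBy("sensor_id", "reading_hour")
              .agg(F.avg("reading_value").alias("avg_value"),
                   F.max("reading_value").alias("max_value")))

    hourly.write.mode("overwrite").parquet("/data/curated/sensor_hourly/")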

Environment: HiveQL, MySQL, HBase, HDFS, Hive, Eclipse (Kepler), Hadoop, Oracle 11g, PL/SQL, SQL*Plus, Toad 9.6, Flume, Pig, Oozie, Sqoop, AWS, Spark, Unix, Tableau, Cosmos.

Hadoop Developer

Confidential

Responsibilities:

  • Experienced with Python frameworks like webapp2 and Flask.
  • Experienced in the WAMP stack (Windows, Apache, MySQL, and PHP/Python) and Struts MVC.
  • Developed a mobile cross-browser web application with AngularJS and a JavaScript API.
  • Successfully migrated the Django database from SQLite to MySQL to PostgreSQL with complete data integrity.
  • Used Celery with RabbitMQ and Flask to create a distributed worker framework (see the sketch after this list).
  • Created an automation test framework using Selenium.
  • Responsible for the design and development of web pages using PHP, HTML, Joomla and CSS, including Ajax controls and XML.
  • Developed an intranet portal for managing Amazon EC2 servers using Tornado and MongoDB.
  • Expertise in developing web applications implementing the Model-View-Controller (MVC) architecture using full-stack frameworks such as TurboGears.
  • Implemented monitoring and established best practices around using Elasticsearch.
  • Strong experience building large, responsive REST web applications; experienced with the CherryPy framework and Python.
  • Used a test-driven development (TDD) approach for developing the services required by the application.
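
A minimal sketch of the Celery-with-RabbitMQ worker pattern behind Flask noted above; the broker URL, route and task body are assumptions.

    # Minimal sketch; broker URL, route and task body are assumed for illustration.
    from celery import Celery
    from flask import Flask, jsonify

    flask_app = Flask(__name__)
    celery_app = Celery("workers", broker="amqp://guest:guest@localhost:5672//")

    @celery_app.task
    def send_notification(user_id):
        # placeholder for the real background work done by the worker
        return "notified " + str(user_id)

    @flask_app.route("/notify/<user_id>", methods=["POST"])
    def notify(user_id):
        # enqueue the job on RabbitMQ and return immediately with the task id
        result = send_notification.delay(user_id)
        return jsonify({"task_id": result.id}), 202

In practice the worker process is started separately (for example with the celery worker command), so the Flask route only enqueues work and never blocks on it.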

Environment: Python 2.7/3.0, PL/SQL, C++, Redshift, XML, Agile (Scrum), PyUnit, MySQL, Apache, CSS, DHTML, HTML, JavaScript, shell scripts, Git, Linux, Unix and Windows.
