
Sr. Hadoop Developer Resume


Bothell, WA

PROFESSIONAL SUMMARY:

  • 7 years of professional experience in the design and development of Java, J2EE and Big Data applications, with an in-depth understanding of the Hadoop distributed architecture and its components, such as Node Manager, Resource Manager, NameNode, DataNode, HiveServer2, HBase Master and Region Server.
  • Strong experience developing end-to-end data transformations using the Spark Core API.
  • Strong experience creating real-time data streaming solutions using Spark Streaming and Kafka (see the sketch after this list).
  • Worked extensively on fine-tuning Spark applications and on various Spark memory settings.
  • Strong knowledge of real-time processing using Apache Storm.
  • Developed simple to complex MapReduce jobs using Java.
  • Expertise in writing end-to-end data processing jobs to analyze data using MapReduce, Spark and Hive.
  • Experience using Kafka to collect, aggregate and move large volumes of data from sources such as web servers and telnet sources.
  • Extensive experience in writing Pig scripts to transform raw data from several data sources into baseline data.
  • Developed Hive and Pig scripts for handling business transformations and analyzing data.
  • Developed Sqoop scripts for large dataset transfers between Hadoop and RDBMSs.
  • Experience using Hortonworks Distributions to fully implement and leverage new Hadoop features.
  • Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations in Cloudera Cluster.
  • Strong experience in working with UNIX/LINUX environments, writing shell scripts.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Extensive experience in working with semi-structured and unstructured data by implementing complex MapReduce programs using design patterns.
  • Sound knowledge of J2EE architecture, design patterns and object modeling using various J2EE technologies and frameworks.
  • Comprehensive experience in building web-based applications using J2EE frameworks like Spring, Hibernate, Struts and JMS.
  • Good working experience in design and application development using IDEs like Eclipse and NetBeans.
  • Experience in writing test cases in Java Environment using JUnit.
  • Experience in building, deploying and integrating applications with Ant and Maven.
  • Detailed understanding of Software Development Life Cycle (SDLC) and sound knowledge of project implementation methodologies including Waterfall and Agile.
  • Good team player with ability to solve problems, organize and prioritize multiple tasks.
  • Ability to blend technical expertise with strong conceptual, business and analytical skills to provide quality solutions, with a result-oriented problem-solving technique and leadership skills.
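
Below is a minimal sketch of the kind of Kafka-backed Spark Streaming job referenced in the list above. It assumes the spark-streaming-kafka-0-10 integration on the classpath; the broker address, topic name and consumer group are hypothetical placeholders.

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;
    import scala.Tuple2;

    public class ClickStreamJob {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("ClickStreamJob");
            JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(10));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092");  // hypothetical broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "clickstream-consumers");  // hypothetical group

            JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                    ssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("clicks"), kafkaParams));  // hypothetical topic

            // Count occurrences of each message value in every 10-second micro-batch.
            stream.mapToPair(record -> new Tuple2<>(record.value(), 1L))
                  .reduceByKey(Long::sum)
                  .print();

            ssc.start();
            ssc.awaitTermination();
        }
    }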

TECHNICAL SKILLS:

Data Warehousing: Informatica PowerCenter, PowerConnect, PowerExchange, PowerMart, Informatica Web Services, Informatica MDM 10.1/9.x, Oracle Data.

Business Tools: MS Access, Tableau.

Big Data: Hadoop, MapReduce 1.0/2.0, Pig, Hive, HBase, Sqoop, Flume, Oozie, Spark, Cassandra.

Databases and Related Tools: MySQL, Oracle 10g/9i/8i/8/7.x, Teradata, PL/SQL, Hive, HDFS, TOAD 8.5.1/7.5/6.2.

Languages: Java / J2EE, SQL, JDBC.

Operating System: Mac OS, Unix, Linux (Various Versions), Windows 2003/7/8/8.1/XP

Web Development: HTML

Application Servers: Apache Tomcat, WebLogic, WebSphere

Tools: Eclipse, NetBeans

PROFESSIONAL EXPERIENCE:

Confidential, Bothell, WA

Sr. Hadoop Developer

Responsibilities:

  • Developed services to run MapReduce jobs as per requirements.
  • Imported and exported data into HDFS, Hive and Pig using Sqoop.
  • Responsible for managing data coming from different sources.
  • Built data pipelines using Pig and Java/Scala MapReduce to store data onto HDFS.
  • Designed ETL data pipeline flows to ingest data from RDBMS sources into Hadoop using shell scripts and Sqoop.
  • Responsible for loading data from UNIX file systems to HDFS; installed and configured Hive and wrote Pig/Hive UDFs.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi structured data coming from various sources.
  • Implemented workflows using the Apache Oozie framework to automate tasks.
  • Developed design documents considering all possible approaches and identified the best of them.
  • Loaded data into HBase using bulk load and non-bulk load.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch after this list).
  • Involved in gathering the requirements, designing, development and testing.
  • Experienced with NoSQL databases like HBase, Cassandra and MongoDB; wrote a Storm topology to accept events from a Kafka producer and emit them into HBase and Cassandra.
  • Wrote MapReduce (Hadoop) programs to convert text files into Avro and load them into Hive tables.
  • Ability to work with onsite and offshore team members.
  • Able to work on own initiative; highly proactive, self-motivated and resourceful, with a strong commitment to work.
  • Strong debugging and critical-thinking ability, with a good understanding of frameworks and of advances in methodologies and strategies.
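
As a rough illustration of the Hive-to-Spark conversion mentioned in the list above, the sketch below rewrites a HiveQL aggregation (SELECT store, SUM(amount) ... GROUP BY store) as Spark Core transformations using the Java RDD API; the input path and pipe-delimited record layout are assumptions for illustration.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class SalesByStore {
        public static void main(String[] args) {
            JavaSparkContext sc =
                new JavaSparkContext(new SparkConf().setAppName("SalesByStore"));

            // Each input line: store|amount (layout is an assumption for illustration).
            JavaPairRDD<String, Double> totals = sc
                .textFile("hdfs:///data/sales/part-*")  // hypothetical path
                .mapToPair(line -> {
                    String[] fields = line.split("\\|");
                    return new Tuple2<>(fields[0], Double.parseDouble(fields[1]));
                })
                .reduceByKey(Double::sum);  // equivalent of GROUP BY store + SUM(amount)

            totals.saveAsTextFile("hdfs:///out/sales_by_store");  // hypothetical path
            sc.close();
        }
    }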

Environment: Cloudera, MySQL, Apache HBase, HDFS, MapReduce, Hive, PIG, Sqoop, SQL, Windows, Linux.

Confidential, Chicago, IL

Hadoop Developer

Responsibilities:

  • Developed MapReduce (YARN) jobs for cleaning, accessing and validating the data.
  • Created and ran Sqoop jobs with incremental load to populate Hive external tables.
  • Developed optimal strategies for distributing the web log data over the cluster; imported and exported the stored web log data into HDFS and Hive using Sqoop.
  • Wrote Pig scripts to transform raw data from several data sources into baseline data.
  • Implemented Hive generic UDFs to incorporate business logic into Hive queries (see the UDF sketch after this list).
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration and the most visited pages on the website.
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts)
  • Created Hive tables and worked on them using HiveQL.
  • Designed and Implemented Partitioning (Static, Dynamic), Buckets in HIVE.
  • Worked on cluster coordination services through ZooKeeper.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Involved in building applications using Maven and integrating with CI servers like Jenkins to build jobs.
  • Exported the analyzed data to the RDBMS using Sqoop to generate reports for the BI team.
  • Worked collaboratively with all levels of business stakeholders to architect, implement and test Big Data based analytical solution from disparate sources.
  • Involved in Agile methodologies, daily scrum meetings and sprint planning.
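
A minimal sketch of a Hive generic UDF of the kind mentioned in the list above; the masking rule, class name and function name are hypothetical stand-ins for the actual business logic.

    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;
    import org.apache.hadoop.io.Text;

    public class MaskAccountUDF extends GenericUDF {
        private StringObjectInspector input;

        @Override
        public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
            if (args.length != 1 || !(args[0] instanceof StringObjectInspector)) {
                throw new UDFArgumentException("mask_account() takes one string argument");
            }
            input = (StringObjectInspector) args[0];
            return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
        }

        @Override
        public Object evaluate(DeferredObject[] args) throws HiveException {
            String acct = input.getPrimitiveJavaObject(args[0].get());
            if (acct == null) return null;
            // Hypothetical rule: keep only the last four characters, mask the rest.
            int keep = Math.min(4, acct.length());
            return new Text("****" + acct.substring(acct.length() - keep));
        }

        @Override
        public String getDisplayString(String[] children) {
            return "mask_account(" + children[0] + ")";
        }
    }

Packaged into a jar, a UDF like this would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL.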

Environment: Hadoop, HDFS, MapReduce, HiveQL, Pig, HBase, Sqoop, Oozie, Maven, Shell Scripting, CDH, Windows, Linux.

Confidential, Mellon, NYC

Hadoop Developer

Responsibilities:

  • Involved in the full life cycle of the project, from design, analysis, and logical and physical architecture modeling through development, implementation and testing.
  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, Zookeeper and Sqoop.
  • Designed, configured, implemented and monitored the Kafka cluster and connectors (see the producer sketch after this list).
  • Implemented proofs of concept (PoCs) using Kafka, Storm and HBase for processing streaming data.
  • Used Sqoop to import data into HDFS and Hive from multiple data systems.
  • Developed complex queries using Hive and Impala.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Helped with the sizing and performance tuning of the Cassandra cluster.
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs.
  • Used Hive on Tez to transform the data as per business requirements for batch processing.
  • Developed multiple PoCs using Spark, deployed them on the YARN cluster and compared the performance of Spark with Cassandra and SQL.
  • Involved in the process of Cassandra data modeling and building efficient data structures.
  • Analyzed the Cassandra/SQL scripts and designed the solution.
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior, such as shopping activity.
  • Configured Oozie workflows to run multiple Hive and Pig jobs, which run independently based on time and data availability.
  • Optimized MapReduce code and Pig scripts, and performed performance tuning and analysis.
  • Implemented advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
  • Developed Spark applications using Python (PySpark).
  • Exported the aggregated data onto Oracle using Sqoop for reporting on the Tableau dashboard.
  • Involved in the design, development and testing phases of the Software Development Life Cycle.
  • Performed Hadoop installation, updates, patches and version upgrades when required.
  • Held weekly meetings with technical collaborators and participated actively in code review sessions with senior and junior developers.
  • Managed storage on AWS EBS, S3 and Glacier and automated data sync to Glacier; used AWS database and application services such as RDS, DynamoDB, Elastic Transcoder, CloudFront and Elastic Beanstalk; migrated 2 instances from one region to another.
  • Leveraged AWS cloud services such as EC2, auto-scaling and VPC (Virtual Private Cloud) to build secure, highly scalable and flexible systems that handled expected and unexpected load bursts.
  • Designed and implemented a secure Hadoop cluster using Kerberos.
  • Drove the application from the development phase to the production phase using a Continuous Integration and Continuous Deployment (CI/CD) model with Chef, Maven and Jenkins.
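
A minimal sketch of a Java producer for a Kafka cluster like the one described in the list above; the broker address, topic, key and payload are hypothetical.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class EventProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");  // hypothetical broker
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Keying by event source keeps related events on the same partition.
                producer.send(new ProducerRecord<>("events", "web-01", "{\"status\":\"ok\"}"));
            }
        }
    }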

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Oozie, Java, Eclipse, Cloudera, Cassandra, AWS, Oracle 10g/11g, Flume, Kafka, Scala, Spark, Sqoop, Python, Kerberos, Git, Jenkins, Chef, Maven, Hortonworks, Windows, Linux.

Confidential

Software Developer

Responsibilities:

  • Coded the presentation layer using Java, JSP, the Spring Framework and JavaScript.
  • Developed components for the business layer using Java technologies.
  • Implemented a validation framework using AJAX (DWR).
  • Designed class diagrams, sequence diagrams and activity diagrams.
  • Measured code coverage and branch coverage of the application with JUnit test scripts for unit testing (see the test sketch after this list).
  • Fixed Checkstyle errors and CPD and PMD defects.
  • Supported the integration testing phase with fixes for the defects.
  • Implemented a logging facility in the incentives application to increase performance and productivity.
  • Maintained the incentives application build, deployment, integration and system testing.
  • Developed a validation framework using the AJAX framework.
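
A minimal sketch of the kind of JUnit test used for code and branch coverage, as referenced in the list above; the incentive rule under test is a hypothetical stand-in (JUnit 4 style).

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class IncentiveCalculatorTest {

        // Hypothetical rule under test: 10% bonus on sales above a 1000-unit target.
        static double bonusFor(int sales, int target) {
            return sales > target ? (sales - target) * 0.10 : 0.0;
        }

        @Test
        public void bonusAppliesAboveTarget() {
            assertEquals(50.0, bonusFor(1500, 1000), 0.001);
        }

        @Test
        public void noBonusAtOrBelowTarget() {
            assertEquals(0.0, bonusFor(1000, 1000), 0.001);
        }
    }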

Environment: Java, Oracle, DB2, Spring, Tomcat, Servlets, JSP, JavaScript

Confidential

Software Developer

Responsibilities:

  • The system is designed using J2EE technologies based on the MVC architecture.
  • The application uses the Struts framework: the views are programmed using JSP pages with the Struts tag library, the model is a combination of EJBs and Java classes (Form and Action classes), and the controllers are Action Servlets (see the Action sketch after this list).
  • Form-level validations are provided using the Struts validation framework.
  • Used JSPs and Action Servlets for server-side transactions.
  • JSPs are used to communicate with EJBs, with EJB serving as middleware in designing and developing a three-tier application.
  • The processed data is transferred to the database through persistent beans (CMP).
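
A minimal sketch of a Struts 1-style Action of the kind described in the list above; the request parameter, the login check and the forward names are hypothetical and would map to JSP views in struts-config.xml.

    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.apache.struts.action.Action;
    import org.apache.struts.action.ActionForm;
    import org.apache.struts.action.ActionForward;
    import org.apache.struts.action.ActionMapping;

    public class LoginAction extends Action {
        @Override
        public ActionForward execute(ActionMapping mapping, ActionForm form,
                                     HttpServletRequest request,
                                     HttpServletResponse response) {
            // Stand-in for the real authentication/business logic.
            String user = request.getParameter("username");  // hypothetical parameter
            boolean ok = user != null && !user.trim().isEmpty();
            // Forward names resolve to JSP views via struts-config.xml.
            return mapping.findForward(ok ? "success" : "failure");
        }
    }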

Environment: JSF, Struts, Hibernate, Tomcat 5.x (development), BEA WebLogic 8.1 (production and deployment)
