
Hadoop Developer Resume

New York City, NY

SUMMARY:

  • 8 years of IT experience across multiple domains in Big Data (Hadoop ecosystem technologies), Core Java, and SQL/PL-SQL, with hands-on project experience in verticals including financial services, health care, and trade compliance.

TECHNICAL SKILLS:

Big Data/Hadoop Technologies: HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, ZooKeeper, Oozie, Elasticsearch

Hadoop Distribution/Monitoring: Cloudera, Hortonworks, Ambari, Cloudera Manager

NoSQL Databases: HBase, Cassandra, MongoDB

Relational Databases: Microsoft SQL Server, MySQL, Oracle, DB2

Languages: Java, Scala, SQL, PL/SQL, C, C++, Shell Scripting, Python

Java & J2EE Technologies: Core Java, JSP, Servlets, JDBC, JNDI, Hibernate, Spring, Struts, JMS, EJB, RESTful, SOAP

Web Technologies: HTML, CSS, XML, JavaScript, jQuery

Application Servers: WebLogic, WebSphere, JBoss, Tomcat

Amazon AWS: EC2, S3, IAM, Glacier, CloudFront, EMR

Operating Systems: UNIX, Windows, Linux

Build Tools: Jenkins, Maven, Ant

ETL Tools: Informatica, Talend

Visualization: Tableau

Development Tools: Eclipse, NetBeans, IntelliJ

Development Methodologies: Agile, Waterfall

Version Control and Testing: Git, SVN, JUnit

PROFESSIONAL EXPERIENCE:

Confidential, New York City, NY

Hadoop Developer

Responsibilities:

  • Worked on Hortonworks Data Platform (HDP).
  • Created a data lake by extracting customer data from various sources, including RDBMS tables, CSV files, and Excel files.
  • Involved in the design and development of data transformation framework components to support the ETL process that produces a single, complete, actionable view of each customer.
  • Developed an ingestion module to ingest data into HDFS from heterogeneous data sources.
  • Used Apache Hive to run MapReduce jobs on top of this HDFS data.
  • Built distributed in-memory applications using Spark and Spark SQL to run analytics efficiently on huge data sets.
  • Used Spark transformations and actions efficiently to build both simple, quick ETL applications and complex ones.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Loaded real-time data into HDFS using Kafka and structured batch data using Sqoop.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Developed Spark scripts using Scala shell commands as per requirements.
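The Hive-query-to-Spark-transformation work above can be sketched as follows. This is a minimal illustration only, in the Spark 1.6 style this project used; the table and column names (`customers`, `region`, `customer_counts`) are hypothetical, and the job needs a running Spark/Hive environment.

```scala
// Sketch: rewriting "SELECT region, COUNT(*) FROM customers GROUP BY region"
// as pair-RDD transformations (Spark 1.6-era API; names are hypothetical).
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object CustomerView {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("customer-view"))
    val hc = new HiveContext(sc)
    import hc.implicits._

    val counts = hc.table("customers")   // Hive table as a DataFrame
      .select("region")
      .rdd
      .map(row => (row.getString(0), 1L))
      .reduceByKey(_ + _)                // pair-RDD aggregation

    counts.toDF("region", "cnt").write.saveAsTable("customer_counts")
    sc.stop()
  }
}
```

The same result could come from a single `hc.sql(...)` call; spelling it out as RDD transformations is what the conversion work described above amounts to.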

Environment: Hadoop 2.6, Spark 1.6, Hive 1.1.0, HBase 1.2, Scala, HDFS, MapReduce, Ambari, MySQL, SQL, GitHub, Linux, Spark SQL, Kafka, Sqoop 1.4.6, AWS (S3).

Confidential - Southlake, TX

Hadoop/Spark Developer

Responsibilities:

  • Ingested flat files from local UNIX file systems into HDFS and used Sqoop to ingest structured data from legacy RDBMS systems into HDFS.
  • Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Used the DataFrame API in Scala to work with distributed collections of data organized into named columns, and developed predictive analytics using Apache Spark's Scala APIs.
  • Worked with Apache Hadoop ecosystem components such as HDFS, Hive, and Sqoop, as well as Spark, Scala, and Python.
  • Coordinated with the data science team in creating PySpark jobs.
  • Wrote Hive join queries to fetch information from multiple tables; used Hive to analyze partitioned and bucketed data and compute various metrics for dashboard reporting.
  • Used the Oozie workflow scheduler to run Hive jobs; extracted files through Sqoop, placed them in HDFS, and processed them.
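A typical Sqoop ingestion of the kind described above looks roughly like the command below. This is a hedged sketch: the JDBC URL, credentials, table, and Hive database names are all hypothetical, and the command requires Sqoop and a Hadoop cluster.

```
# Sketch: import one RDBMS table into a Hive table via HDFS
# (connection string, table, and database names are hypothetical).
sqoop import \
  --connect jdbc:mysql://dbhost:3306/legacy \
  --username etl_user -P \
  --table orders \
  --hive-import \
  --hive-database staging \
  --hive-table orders \
  --num-mappers 4
```

`--num-mappers` controls how many parallel map tasks split the import; it is tuned per table size and source-database load.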

Environment: Hadoop, Sqoop, Hive, Spark, HDFS, Scala, Python, Spark SQL, JDBC, Kafka

Confidential, Cincinnati, OH

Big Data Developer/Engineer

Responsibilities:

  • Used the Spark API with Scala over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Created a Hadoop design that replicates the current system design.
  • Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Developed Hive queries to pre-process the data required for running the business process.
  • Created the main upload files from the Hive temporary tables.
  • Actively involved in design analysis, coding, and strategy development.
  • Developed Hive scripts implementing dynamic partitions and buckets for history data.
  • Developed Spark scripts using Scala, as per requirements, to read and write JSON files.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Analyzed the SQL scripts and designed the solution to implement them using Scala.
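The dynamic-partition and bucketing pattern for history data mentioned above typically follows the HiveQL below. This is an illustrative fragment only; the table and column names (`history_txn`, `staging_txn`, `txn_date`) are hypothetical.

```
-- Sketch: dynamically partitioned, bucketed history table
-- (table and column names are hypothetical).
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

CREATE TABLE history_txn (
  txn_id  BIGINT,
  amount  DOUBLE
)
PARTITIONED BY (txn_date STRING)
CLUSTERED BY (txn_id) INTO 32 BUCKETS
STORED AS ORC;

-- Partition values come from the data itself (dynamic partitioning):
INSERT OVERWRITE TABLE history_txn PARTITION (txn_date)
SELECT txn_id, amount, txn_date FROM staging_txn;
```

Partitioning by date prunes scans on history queries, while bucketing by `txn_id` supports efficient sampling and bucketed map joins.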

Environment: Hadoop, MapReduce, HDFS, Hive, Sqoop, HBase, Flume, Spark, Spark Streaming, Kafka, AWS, Tableau 8, Apache

Confidential, New Orleans, LA

Big Data Developer

Responsibilities:

  • Ingested Batch Files into HDFS using shell scripting.
  • Used Flume to ingest near-real-time data, performed the necessary transformations and aggregations on the fly, and persisted the data in Hive.
  • Used Hadoop's Pig, Hive, and MapReduce to analyze the data and extract data sets for meaningful information.
  • Developed Oozie workflows to orchestrate a series of Pig scripts that cleanse data in the data-preparation stage, for example removing irrelevant information and merging many small files into a handful of large, compressed files using Pig pipelines.
  • Extensively used Pig to communicate with Hive using HCatalog.
  • Implemented exception-tracking logic using Pig scripts.
  • Saved the analyzed data to Hive tables for visualization and to generate reports for the BI team.
  • Good understanding of ETL tools and how ETL operations can be applied in a Big Data environment.
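An Oozie workflow orchestrating a Pig cleansing step, as described above, has roughly the shape below. This is a hedged sketch: the workflow name, script name, and input/output paths are hypothetical, and `${jobTracker}`/`${nameNode}` are supplied via the job properties file at submission time.

```
<!-- Sketch: one Pig action in an Oozie workflow (names and paths hypothetical). -->
<workflow-app name="data-prep" xmlns="uri:oozie:workflow:0.4">
  <start to="cleanse"/>
  <action name="cleanse">
    <pig>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>cleanse.pig</script>
      <param>INPUT=${nameNode}/data/raw</param>
      <param>OUTPUT=${nameNode}/data/clean</param>
    </pig>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>Pig cleansing failed</message></kill>
  <end name="end"/>
</workflow-app>
```

Additional Pig actions chain by pointing each action's `<ok to="...">` at the next node, which is how a series of cleansing scripts is sequenced.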

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Oozie, Core Java, Python, Eclipse, Flume, Cloudera, Oracle, UNIX Shell Scripting.

Confidential

Java Developer

Responsibilities:

  • Developed the application using the Struts framework, which leverages the classical Model-View-Controller (MVC) architecture.
  • Designed the user interfaces using JSPs, developed custom tags, and used the JSTL tag library.
  • Developed various Java business classes for handling different functions.
  • Developed controller classes using the Struts and Tiles APIs.
  • Involved in documentation and use-case design using UML modeling, including development of class diagrams, sequence diagrams, and use-case transaction diagrams.
  • Participated in design and code reviews.
  • Developed the user interface using AJAX in JSP and performed client-side validation.
  • Developed JUnit test cases for all developed modules; used SVN for version control.

Environment: Java, Struts 1.2, Hibernate 3.0, JSP, JavaScript, HTML, XML, Oracle, Eclipse, JBoss.
