
Hadoop/Big Data Developer Resume


Bellevue, WA

SUMMARY

  • 7 years of professional experience in IT, including over 4 years of experience in developing and administering Big Data technologies and the Hadoop ecosystem.
  • Hands-on experience with Hadoop ecosystem components: Hive, Pig, Spark, Sqoop, MapReduce, Impala, Flume, Oozie, and HBase.
  • Proficient in object-oriented programming using Java and J2EE.
  • Experience in processing vast data sets by performing structural modifications to cleanse both structured and semi-structured data using MapReduce programs in Java, HiveQL, and Pig Latin.
  • Implemented Spark jobs that leverage interactive SQL queries for processing large volumes of data; used DataFrames to access Hive, JSON, ORC, and JDBC sources.
  • Expertise in optimizing MapReduce jobs using Combiners, Partitioners, and the Distributed Cache (see the sketch after this summary).
  • Experience with Amazon S3 cloud storage for backing up, recovering, and storing critical data.
  • Solid experience in optimizing Hive queries using partitioning and bucketing.
  • Hands-on experience in developing Pig scripts that perform data transformation operations, implementing various functions for loading and evaluating data in relations.
  • Experience in importing and exporting data between HDFS/Hive and databases such as MySQL, Oracle, and MS SQL Server using Sqoop.
  • Experience with NoSQL databases such as HBase.
  • Loaded log data into HDFS by collecting and aggregating data from various sources using Flume.
  • Used Amazon EC2 for web-scale cloud computing and virtualized environments.
  • Good experience working with the Hortonworks and Cloudera distributions.
  • Worked extensively with dimensional modeling, data migration, data cleansing, data profiling, and ETL features for data warehouses.
  • Experience in developing shell scripts for system management and for automating routine tasks.
  • Experience in integrating Hadoop with Informatica and working on pipelines for processing data.
  • Experience in public cloud environments: Amazon Web Services (AWS).
  • Hands-on experience with compression codecs such as Snappy and Gzip.
  • Experience with front-end technologies: HTML5, CSS, JavaScript, XML, and jQuery.
  • Worked extensively with Oracle, MySQL, and MS SQL databases and have good database programming experience with SQL.
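
The MapReduce optimization bullet above refers to standard Hadoop techniques; the following is a minimal Java sketch (not code from any of the projects listed) showing a Combiner and a custom Partitioner on a simple counting job. The class names, tokenizing logic, and partition rule are illustrative assumptions.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Partitioner;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class EventCount {

        public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    word.set(token);
                    context.write(word, ONE);          // emit (token, 1)
                }
            }
        }

        // Reducer doubles as the Combiner: summing is associative, so partial
        // aggregation on the map side only reduces shuffle traffic.
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        // Custom Partitioner: route keys by first character (purely illustrative rule).
        public static class FirstCharPartitioner extends Partitioner<Text, IntWritable> {
            @Override
            public int getPartition(Text key, IntWritable value, int numPartitions) {
                if (key.getLength() == 0) {
                    return 0;
                }
                return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "event count");
            job.setJarByClass(EventCount.class);
            job.setMapperClass(TokenMapper.class);
            job.setCombinerClass(SumReducer.class);        // map-side partial aggregation
            job.setReducerClass(SumReducer.class);
            job.setPartitionerClass(FirstCharPartitioner.class);
            job.setNumReduceTasks(4);                      // multiple reducers so the partitioner matters
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Reusing the reducer as the combiner is valid here because summing counts is associative and commutative, so map-side pre-aggregation cannot change the final result.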

TECHNICAL SKILLS

Big Data: Apache Hadoop, YARN, HDFS, MapReduce, Hive, Pig, Impala, Sqoop, Spark, Flume, Oozie, Hue, HBase

Languages & Frameworks: Java, J2EE, Spring, Hibernate

Scripting: Shell Scripting

Web Technologies: HTML5, CSS, JavaScript, XML, jQuery

Databases: Oracle, MySQL, MS SQL Server, MS-Access

Tools & Technologies: Maven, Eclipse, JDeveloper, PuTTY

Reporting & ETL Tools: Tableau, Pentaho

Cloud Platforms: Amazon Web Services

Monitoring & Administration tools: Ambari, Cloudera Manager

Version Control: SVN, Gerrit, GitHub, VSS

Operating Systems: Linux, Windows, Mac

PROFESSIONAL EXPERIENCE

Confidential - Bellevue, WA

Hadoop/Big Data Developer

Responsibilities:

  • Actively participated in defining the scope of jobs for each sprint delivery in the Agile development process.
  • Ingested large volumes of data into the Hadoop data lake using different methodologies.
  • Performed Sqoop imports to load structured data from relational databases such as Oracle, MySQL, and MS SQL Server into HDFS; worked with free-form query imports, parallelism, and incremental imports.
  • Optimized Hive external tables by implementing dynamic partitioning and bucketing for faster query response.
  • Used Pig scripts and UDFs to perform data transformations, pre-aggregations, and pre-processing before storing the data in HDFS.
  • Used Oozie as a workflow scheduler to manage Apache Hadoop Jobs.
  • Worked on parsing XML files using MapReduce to extract related attributes and store them in HDFS.
  • Incorporated a DevOps structure for building, testing, and releasing software more rapidly and frequently.
  • Implemented a shell script that generates a periodic report on running jobs and sends a status email to the group.
  • Implemented a Maven project to generate Hive table scripts by passing a .csv file with the DD information as input.
  • Processed real-time feeds using HBase (NoSQL).
  • Worked on a POC to implement the Apache Spark framework using Scala.
  • Implemented Spark jobs leveraging interactive SQL queries for processing large volumes of data; used DataFrames to access Hive, JSON, ORC, and JDBC sources (see the sketch after this list).
  • Developed dispatch jobs to load data from the Hadoop lake into Teradata and performed ETL on the core tables.
  • Used Apache Ambari for managing and monitoring Hadoop clusters.
  • Implemented complex MapReduce programs in Java to perform map-side joins using the Distributed Cache.
  • Worked with the Hue web interface for monitoring workflows, analyzing data, running queries, and navigating HDFS.
  • Implemented spool scripts to extract data from database sources as flat files for ingestion into HDFS.
  • Used GitHub for maintaining the code and Jenkins for continuous integration and generating builds.
  • Effectively used Control-M Scheduler for scheduling the jobs.
  • Developed ingestion jobs, participated in code reviews, deployed jobs into production, and provided support and maintenance after each release.
  • Interacted with source teams on the approach for loading data into Hadoop.
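
As a concrete illustration of the Spark/DataFrame work noted above, here is a minimal sketch using the Spark Java API; all paths, table names, and JDBC settings are placeholders rather than actual project values.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class DataFrameAccessSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("dataframe-access-sketch")
                    .enableHiveSupport()                       // lets spark.sql() reach Hive metastore tables
                    .getOrCreate();

            // Hive: interactive SQL over an existing warehouse table (name is a placeholder)
            Dataset<Row> hiveRows = spark.sql("SELECT * FROM db.events WHERE event_date = '2016-01-01'");

            // JSON and ORC files on HDFS (paths are placeholders)
            Dataset<Row> jsonRows = spark.read().json("hdfs:///data/raw/events.json");
            Dataset<Row> orcRows  = spark.read().orc("hdfs:///data/curated/events.orc");

            // JDBC source, e.g. a MySQL lookup table (URL and credentials are placeholders)
            Dataset<Row> jdbcRows = spark.read()
                    .format("jdbc")
                    .option("url", "jdbc:mysql://dbhost:3306/reference")
                    .option("dbtable", "country_codes")
                    .option("user", "etl_user")
                    .option("password", "*****")
                    .load();

            // Example join across sources before writing back to the lake
            hiveRows.join(jdbcRows, "country_code")
                    .write()
                    .mode("overwrite")
                    .orc("hdfs:///data/curated/events_enriched");

            spark.stop();
        }
    }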

Environment: HDFS, Hortonworks, Oozie, Sqoop, Pig, Hive, Flume, HBase, Hue, Spark, Shell scripting, MapReduce, Eclipse, Control-M, Git, Jenkins, Gerrit

Confidential - Englewood, CO

Hadoop Developer

Responsibilities:

  • Actively participated with the development team to meet specific customer requirements and proposed effective Hadoop solutions.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Responsible for collecting data required for testing various MapReduce applications from different sources.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Involved in creating Hive tables, loading data, and analyzing it by writing custom UDFs to process data in Hive queries (see the sketch after this list).
  • Created Impala tables for faster operations. Designed ETL jobs to identify and remove duplicate records using Sort and Remove Duplicates stages, and generated keys for unique records using the Surrogate Key Generator stage.
  • Worked with the Pentaho Data Integration tool to design job workflows that manage interdependent Hadoop jobs and automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
  • Worked on developing sample scripts to test Spark and Spark-SQL functionality.
  • Developed deployment packages using Jenkins with version control systems such as SVN and Git.
  • Effectively used Tidal Enterprise Scheduler to schedule the jobs.
  • Worked with the team on installing and configuring the Hadoop cluster with ecosystem tools such as Hive, YARN, Spark, Sqoop, HBase, HCatalog, and Pig.
  • Developed OpenStack Heat Templates in YAML to deploy and configure Hadoop cluster.
  • Configured and used Apache load balancers with mod_proxy to expose customer-facing Pentaho BI reports.
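
A minimal sketch of a custom Hive UDF in Java of the kind mentioned above, using the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name and the masking rule are hypothetical, not taken from the project.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public final class MaskAccountId extends UDF {
        private final Text result = new Text();

        // Hive resolves this evaluate() signature by reflection.
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            String s = input.toString();
            // Keep the last four characters, mask the rest (illustrative rule).
            int keep = Math.min(4, s.length());
            StringBuilder masked = new StringBuilder();
            for (int i = 0; i < s.length() - keep; i++) {
                masked.append('*');
            }
            masked.append(s.substring(s.length() - keep));
            result.set(masked.toString());
            return result;
        }
    }

Once packaged into a JAR, a function like this would typically be registered in a Hive session with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.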

Environment: HDFS, Oozie, Sqoop, Pig, Hive, Flume, Shell scripting, MapReduce, Eclipse, OpenStack, Pentaho, Tidal, Git, SVN

Confidential - Rockville, MD

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop Ecosystem.
  • Responsible for writing MapReduce jobs to handle files in multiple data formats (JSON, text, XML).
  • Worked on evaluation and analysis of the Hadoop cluster and different big data analytic tools, including Pig, the HBase database, and Sqoop.
  • Involved in loading data from the Linux file system to the Hadoop Distributed File System.
  • Maintained and supported MapReduce programs running on the cluster.
  • Implemented UDFs, UDAFs, and UDTFs in Java for Hive to handle processing that cannot be done with Hive's built-in functions.
  • Worked with multiple file formats: JSON, XML, SequenceFile, and RCFile.
  • Deployed and configured Flume agents to stream log events into HDFS for analysis.
  • Optimized Hive queries using partitioning and bucketing techniques to control data distribution.
  • Used Sqoop to efficiently transfer data between databases and HDFS.
  • Implemented complex MapReduce programs in Java to perform map-side joins using the Distributed Cache (see the sketch after this list).
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and Sqoop.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Backed up data on a regular basis to a remote cluster as well as to an Amazon cloud cluster using DistCp.
  • Installed Pentaho PDI/BA servers in the production environment.
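
A minimal Java sketch of the map-side join referenced above: the small lookup file travels to every mapper through the Distributed Cache, is loaded into memory in setup(), and is joined against the large input with no reduce phase. Paths, delimiters, and class names are illustrative assumptions.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.net.URI;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MapSideJoin {

        public static class JoinMapper extends Mapper<Object, Text, Text, NullWritable> {
            private final Map<String, String> lookup = new HashMap<>();
            private final Text out = new Text();

            @Override
            protected void setup(Context context) throws IOException {
                // The cache file was added with the symlink fragment "#lookup",
                // so it appears in the task's working directory under that name.
                try (BufferedReader reader = new BufferedReader(new FileReader("lookup"))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        String[] parts = line.split(",", 2);   // key,value
                        if (parts.length == 2) {
                            lookup.put(parts[0], parts[1]);
                        }
                    }
                }
            }

            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",");
                String joined = lookup.get(fields[0]);          // join on the first column
                if (joined != null) {                           // inner join: drop misses
                    out.set(value.toString() + "," + joined);
                    context.write(out, NullWritable.get());
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "map-side join");
            job.setJarByClass(MapSideJoin.class);
            job.setMapperClass(JoinMapper.class);
            job.setNumReduceTasks(0);                           // map-only job, no shuffle
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(NullWritable.class);
            job.addCacheFile(new URI(args[2] + "#lookup"));     // small side table on HDFS
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }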

Environment: Hadoop, HDFS, MapReduce, Java, Hive, Pig, Sqoop, Flume, Oozie, Hue, Cloudera Manager, HBase, Tableau, Pentaho, CentOS, Windows, PuTTY, MySQL

Confidential - Somerset, NJ

Java Developer

Responsibilities:

  • Actively participated in requirements gathering, analysis, design, and testing phases.
  • Designed use case diagrams, class diagrams, and sequence diagrams as a part of Design Phase.
  • Developed the entire application implementing MVC architecture, integrating JSF with the Hibernate and Spring frameworks.
  • Created and implemented stored procedures, functions, and triggers using SQL.
  • Set up client-side validations using JavaScript.
  • Developed the Enterprise Java Beans (Stateless Session beans) to handle different transactions such as online funds transfer, bill payments to the service providers.
  • Implemented Service Oriented Architecture (SOA) using JMS for sending and receiving messages while creating web services.
  • Developed XML documents and generated XSL files for Payment Transaction and Reserve Transaction systems.
  • Developed Web Services for data transfer from client to server and vice versa using Apache Axis and SOAP.
  • Implemented various J2EE design patterns such as Singleton, Service Locator, DAO, and SOA (see the DAO sketch after this list).
  • Worked with AJAX to develop an interactive web application and with JavaScript for data validations.
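
A minimal sketch of the DAO pattern mentioned above using plain JDBC; the table, columns, and class name are hypothetical, and it uses modern try-with-resources syntax for brevity (on the Java 1.4 stack listed below, the same structure would use explicit finally blocks).

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class AccountDao {

        private final String url;
        private final String user;
        private final String password;

        public AccountDao(String url, String user, String password) {
            this.url = url;
            this.user = user;
            this.password = password;
        }

        // Returns the current balance for an account, or -1 if the account is unknown.
        public double findBalance(long accountId) throws SQLException {
            String sql = "SELECT balance FROM accounts WHERE account_id = ?";
            try (Connection conn = DriverManager.getConnection(url, user, password);
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setLong(1, accountId);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getDouble("balance") : -1;
                }
            }
        }

        // Updates an account balance; callers handle transaction demarcation.
        public void updateBalance(long accountId, double newBalance) throws SQLException {
            String sql = "UPDATE accounts SET balance = ? WHERE account_id = ?";
            try (Connection conn = DriverManager.getConnection(url, user, password);
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setDouble(1, newBalance);
                ps.setLong(2, accountId);
                ps.executeUpdate();
            }
        }
    }

The point of the pattern is that callers depend only on the DAO's methods, so the persistence mechanism (raw JDBC here, Hibernate elsewhere in this resume) can be swapped without touching business logic.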

Environment: J2EE, JDBC, Java 1.4, Servlets, JSP, Struts, Hibernate, Web Services, SOAP, Design Patterns, MVC, HTML, JavaScript 1.2, WebLogic 8.0, XML, JUnit, Oracle 10g, WebSphere, Eclipse.

Confidential

Java Developer

Responsibilities:

  • Involved in analyzing user requirements and designing the functional and technical specifications of the project.
  • Designed and developed static and dynamic user interface web pages using JSP, HTML, and CSS.
  • Connected Java web applications to MySQL using JDBC connectors in the Eclipse IDE.
  • Created and implemented stored procedures, functions, and triggers using SQL.
  • Developed the application using the Spring MVC framework (see the sketch after this list).
  • Involved in the migration of independent parts of the system to use Hibernate for the implementation of DAO.
  • Involved in configuring and deploying the application on Tomcat Server.
  • Incorporated a custom logging mechanism for tracing errors and resolving all issues and bugs before deploying the application.
  • Performed data migration during product releases and software upgrades to the latest versions.
  • Performed UAT validation after releases and bug fixes.
  • Prepared technical reports & documentation manuals during the program development.
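
A minimal Spring MVC controller sketch in the spirit of the work described above; the URL, request parameter, and view name are illustrative, and it assumes a standard view resolver that maps the logical view name to a JSP.

    import org.springframework.stereotype.Controller;
    import org.springframework.ui.Model;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;
    import org.springframework.web.bind.annotation.RequestParam;

    @Controller
    public class GreetingController {

        // GET /greeting?name=... -> rendered by the "greeting" JSP view
        @RequestMapping(value = "/greeting", method = RequestMethod.GET)
        public String greeting(@RequestParam(value = "name", required = false) String name,
                               Model model) {
            model.addAttribute("name", name == null ? "guest" : name);
            return "greeting";   // logical view name resolved by the configured view resolver
        }
    }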

Environment: Java, J2EE, Spring, Hibernate, Eclipse, MVC, JUnit, JSP, DHTML, JavaScript, Ajax, Web Services, Tomcat, Rational Rose, SOAP, Windows, UNIX.
