
Hadoop Developer Resume


Atlanta, GA

SUMMARY

  • 8+ years of professional IT work experience in analysis, design, administration, development, deployment and maintenance of critical software and big data applications.
  • 3+ years of experience on Big Data platforms as both Developer and Administrator.
  • Hands-on experience in developing and deploying enterprise applications using major Hadoop ecosystem components such as MapReduce, YARN, Hive, Pig, HBase, Flume, Sqoop, Spark Streaming, Spark SQL, Storm, Kafka, Oozie and Cassandra.
  • Hands-on experience using the MapReduce programming model for batch processing of data stored in HDFS (see the sketch after this summary).
  • Exposure to administrative tasks such as installing Hadoop and ecosystem components like Hive and Pig.
  • Installed and configured multiple Hadoop clusters of different sizes with ecosystem components like Pig, Hive, Sqoop, Flume, HBase, Oozie and Zookeeper.
  • Worked on the major Hadoop distributions: Cloudera and Hortonworks.
  • Responsible for designing and building a Data Lake using Hadoop and its ecosystem components.
  • Handled data movement, transformation, analysis and visualization across the lake by integrating it with various tools.
  • Defined extract-transform-load (ETL) and extract-load-transform (ELT) processes for the Data Lake.
  • Good expertise in planning, installing and configuring Hadoop clusters based on business needs.
  • Good experience working with cloud environments such as Amazon Web Services (AWS) EC2 and S3.
  • Transformed and aggregated data for analysis by implementing workflow management of Sqoop, Hive and Pig scripts.
  • Experience working with different file formats like Avro, Parquet, ORC and SequenceFile, and compression techniques like Gzip, LZO and Snappy in Hadoop.
  • Experience in retrieving data from databases like MySQL, Teradata, Informix, DB2 and Oracle into HDFS using Sqoop and ingesting it into HBase and Cassandra.
  • Experience writing Oozie workflows and Job Controllers for job automation.
  • Integrated Oozie with Hue and scheduled workflows for multiple Hive, Pig and Spark Jobs.
  • In-depth knowledge of Scala and experience building Spark applications using Scala.
  • Good experience working with Tableau and Spotfire, and enabled JDBC/ODBC data connectivity from those tools to Hive tables.
  • Designed neat and insightful dashboards in Tableau.
  • Designed an array of reports including crosstab, chart, drill-down, drill-through, customer-segment and geodemographic-segmentation reports.
  • Deep understanding of Tableau features such as site and server administration, calculated fields, table calculations, parameters, filters (normal and quick), highlighting, level of detail, granularity, aggregation, reference lines and many more.
  • Adequate knowledge of Scrum, Agile and Waterfall methodologies.
  • Designed and developed multiple Model 2 (MVC-based) web applications using J2EE.
  • Worked with various tools and IDEs such as Eclipse, IBM Rational, Apache Ant, MS Office, PL/SQL Developer and SQL*Plus.
  • Highly motivated, with the ability to work independently or as an integral part of a team, and committed to the highest professional standards.
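
Below is a minimal, self-contained sketch of the MapReduce batch-processing pattern referenced in the summary. The job name, the HDFS input/output paths and the record layout (event type in the first tab-separated field) are hypothetical placeholders, not a specific production job.

  import java.io.IOException;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class EventTypeCount {

      // Mapper: emits (eventType, 1) for every input line; assumes the event
      // type is the first tab-separated field (hypothetical layout).
      public static class EventMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
          private static final IntWritable ONE = new IntWritable(1);
          private final Text eventType = new Text();

          @Override
          protected void map(LongWritable key, Text value, Context context)
                  throws IOException, InterruptedException {
              String[] fields = value.toString().split("\t");
              if (fields.length > 0 && !fields[0].isEmpty()) {
                  eventType.set(fields[0]);
                  context.write(eventType, ONE);
              }
          }
      }

      // Reducer: sums the counts for each event type.
      public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
          @Override
          protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                  throws IOException, InterruptedException {
              int sum = 0;
              for (IntWritable v : values) {
                  sum += v.get();
              }
              context.write(key, new IntWritable(sum));
          }
      }

      public static void main(String[] args) throws Exception {
          Job job = Job.getInstance(new Configuration(), "event-type-count");
          job.setJarByClass(EventTypeCount.class);
          job.setMapperClass(EventMapper.class);
          job.setCombinerClass(SumReducer.class);
          job.setReducerClass(SumReducer.class);
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
          FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
          System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
  }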

TECHNICAL SKILLS

Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Pig, Sqoop, Flume, Oozie, Hadoop distributions (Cloudera, Hortonworks), HBase, Spark

Programming Languages: Java (5, 6, 7), Python, Scala, C/C++, XML, Shell scripting, COBOL

Databases/RDBMS: MySQL, SQL/PL-SQL, MS-SQL Server 2005, Oracle …

Scripting/Web Languages: JavaScript, HTML5, CSS3, XML, SQL, Shell, jQuery, AJAX

NoSQL/Search and ETL Tools: Cassandra, HBase, Elasticsearch, Alteryx

Operating Systems: Linux, Windows XP/7/8

Software Life Cycles: SDLC, Waterfall and Agile models

Office Tools: MS Office, MS Project, Risk Analysis tools, Visio

Utilities/Tools: Eclipse, Tomcat, NetBeans, JUnit, SQL, SoapUI, Ant, Maven, automation tools and MRUnit

Cloud Platforms: Amazon EC2

Version Control: CVS, Tortoise SVN

Visualization Tools: Tableau.

Servers: IBM WebSphere, WebLogic, Tomcat and Red Hat Satellite Server

PROFESSIONAL EXPERIENCE

Confidential, Atlanta, GA

Hadoop Developer

Responsibilities:

  • Worked on Hadoop cluster scaling from 4 nodes in the development environment to 8 nodes in pre-production and up to 24 nodes in production.
  • Involved in complete Implementation lifecycle, specialized in writing custom MapReduce, Pig and Hive programs.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Extensively used Hive (HQL) queries to search for particular strings in Hive tables stored in HDFS.
  • Possess good Linux and Hadoop system administration skills, networking, shell scripting and familiarity with open-source configuration management and deployment tools such as Chef.
  • Worked with Puppet for application deployment.
  • Developed custom UDFs in Java to extend Hive and Pig Latin functionality.
  • Created HBase tables to store data in various formats coming from different sources.
  • Used Maven to build and deploy code onto the YARN cluster.
  • Good knowledge of building Apache Spark applications using Scala.
  • Developed several business services as Java RESTful web services using the Spring MVC framework.
  • Managed and scheduled jobs to remove duplicate log data files in HDFS using Oozie.
  • Used Apache Oozie for scheduling and managing Hadoop jobs; knowledge of HCatalog for Hadoop-based storage management.
  • Expert in designing and building data ingest pipelines using technologies such as Spring Integration and Apache Storm with Kafka.
  • Used Flume extensively to gather and move log data files from application servers to a central location in the Hadoop Distributed File System (HDFS).
  • Implemented test scripts to support test driven development and continuous integration.
  • Moved data from HDFS to a MySQL database and vice versa using Sqoop.
  • Responsible for managing data coming from different sources.
  • Analyzed the Cassandra database and compared it with other open-source NoSQL databases to find which one better suits the current requirements.
  • Used File System check (FSCK) to check the health of files in HDFS.
  • Developed UNIX shell scripts for creating reports from Hive data.
  • Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Applied Java/J2EE application development skills with object-oriented analysis, and was extensively involved throughout the Software Development Life Cycle (SDLC).
  • Involved in the pilot of a Hadoop cluster hosted on Amazon Web Services (AWS).
  • Extensively used Sqoop to get data from RDBMS sources like Teradata and Netezza.
  • Created a complete processing engine based on Cloudera's distribution.
  • Involved in collecting metrics for Hadoop clusters using Ganglia and Ambari.
  • Extracted files from CouchDB and MongoDB through Sqoop and placed them in HDFS for processing.
  • Used Spark Streaming to collect data from Kafka in near real time, perform the necessary transformations and aggregations on the fly to build the common learner data model, and persist the data in a NoSQL store (HBase); a sketch of this pattern follows this list.
  • Configured Kerberos for the clusters.
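
A condensed sketch of the Spark Streaming pipeline described in the last bullet above, in Java. The broker address, Kafka topic, HBase table and column family are assumptions, and the real transformations that build the learner data model are omitted for brevity.

  import java.util.Arrays;
  import java.util.Collection;
  import java.util.HashMap;
  import java.util.Map;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.kafka.clients.consumer.ConsumerRecord;
  import org.apache.kafka.common.serialization.StringDeserializer;
  import org.apache.spark.SparkConf;
  import org.apache.spark.streaming.Durations;
  import org.apache.spark.streaming.api.java.JavaInputDStream;
  import org.apache.spark.streaming.api.java.JavaStreamingContext;
  import org.apache.spark.streaming.kafka010.ConsumerStrategies;
  import org.apache.spark.streaming.kafka010.KafkaUtils;
  import org.apache.spark.streaming.kafka010.LocationStrategies;

  public class LearnerEventStream {
      public static void main(String[] args) throws InterruptedException {
          SparkConf conf = new SparkConf().setAppName("learner-event-stream");
          JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

          Map<String, Object> kafkaParams = new HashMap<>();
          kafkaParams.put("bootstrap.servers", "broker1:9092");       // assumed broker
          kafkaParams.put("key.deserializer", StringDeserializer.class);
          kafkaParams.put("value.deserializer", StringDeserializer.class);
          kafkaParams.put("group.id", "learner-stream");
          Collection<String> topics = Arrays.asList("learner-events"); // assumed topic

          JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                  jssc,
                  LocationStrategies.PreferConsistent(),
                  ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

          // For each micro-batch, write the (already transformed) records to HBase.
          stream.foreachRDD(rdd -> rdd.foreachPartition(records -> {
              Configuration hbaseConf = HBaseConfiguration.create();
              try (Connection connection = ConnectionFactory.createConnection(hbaseConf);
                   Table table = connection.getTable(TableName.valueOf("learner_model"))) {
                  while (records.hasNext()) {
                      ConsumerRecord<String, String> record = records.next();
                      String rowKey = record.key() == null
                              ? String.valueOf(record.offset()) : record.key();
                      Put put = new Put(Bytes.toBytes(rowKey));
                      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("event"),
                              Bytes.toBytes(record.value()));
                      table.put(put);
                  }
              }
          }));

          jssc.start();
          jssc.awaitTermination();
      }
  }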

Environment: Hadoop, MapReduce, HDFS, Ambari, Hive, Sqoop, Apache Kafka, Oozie, SQL, Alteryx, Flume, Spark, Cassandra, Scala, Java, AWS, GitHub.

Confidential, Dearborn, Michigan

Hadoop Data Analyst

Responsibilities:

  • Worked on a cloud platform built as a scalable distributed data solution using Hadoop on a 40-node AWS cluster to run analysis on 25+ terabytes of customer usage data.
  • Analyzed the Hadoop stack and different big data analytics tools including Pig, Hive, HBase and Sqoop.
  • Designed and implemented a semi-structured data analytics platform leveraging Hadoop.
  • Worked on performance analysis and improvements for Hive and Pig scripts at the MapReduce job-tuning level.
  • Involved in optimization of Hive queries.
  • Developed a framework to load and transform large sets of unstructured data from UNIX systems into Hive tables.
  • Involved in Data Ingestion to HDFS from various data sources.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Extensively used Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases.
  • Automated Sqoop, Hive and Pig jobs using Oozie scheduling.
  • Extensive knowledge of NoSQL databases like HBase.
  • Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
  • Responsible for continuous monitoring and managing Elastic MapReduce (EMR) cluster through AWS console.
  • Good knowledge of writing and using user-defined functions in Hive, Pig and MapReduce.
  • Helped the business team by installing and configuring Hadoop ecosystem components alongside the Hadoop admin.
  • Developed multiple Kafka producers and consumers from scratch as per the business requirements (a producer sketch follows this list).
  • Worked on loading log data into HDFS through Flume.
  • Created and maintained technical documentation for executing Hive queries and Pig scripts.
  • Worked on debugging and performance tuning of Hive and Pig jobs.
  • Used Oozie to schedule various jobs on the Hadoop cluster.
  • Used Hive to analyze partitioned and bucketed data.
  • Worked on establishing connectivity between Tableau and Hive.
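
A minimal sketch of one of the Kafka producers mentioned above; the broker address, topic name and message payload are placeholders rather than the actual business feed.

  import java.util.Properties;

  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.Producer;
  import org.apache.kafka.clients.producer.ProducerRecord;
  import org.apache.kafka.common.serialization.StringSerializer;

  public class UsageEventProducer {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put("bootstrap.servers", "broker1:9092");            // assumed broker
          props.put("acks", "all");                                  // wait for a full commit
          props.put("key.serializer", StringSerializer.class.getName());
          props.put("value.serializer", StringSerializer.class.getName());

          // Send a single illustrative record to a hypothetical topic.
          try (Producer<String, String> producer = new KafkaProducer<>(props)) {
              ProducerRecord<String, String> record =
                      new ProducerRecord<>("customer-usage", "customer-42", "{\"bytes\":1024}");
              producer.send(record, (metadata, exception) -> {
                  if (exception != null) {
                      exception.printStackTrace();                   // real code would log and retry
                  }
              });
              producer.flush();
          }
      }
  }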

Environment: Hortonworks 2.4, Hadoop, HDFS, MapReduce, MongoDB, Java, VMware, Hive, Eclipse, Pig, HBase, AWS, Tableau, Sqoop, Flume, Linux, UNIX

Confidential, IL

Hadoop Developer /Admin

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop components.
  • Solid understanding of Hadoop HDFS, MapReduce and other ecosystem projects.
  • Installed and configured the Hadoop cluster.
  • Worked with the Cloudera support team to fine-tune the cluster.
  • Worked closely with the SA team to make sure all hardware and software were properly set up for optimum use of resources.
  • Developed a custom file system plugin for Hadoop so it can access files on the Hitachi Data Platform.
  • The plugin allows Hadoop MapReduce programs, HBase, Pig and Hive to work unmodified and access files directly; it also provided data locality for Hadoop across host nodes and virtual machines.
  • Developed MapReduce programs in Java to parse raw data and populate staging tables.
  • Developed MapReduce jobs to analyze data and produce heuristic reports.
  • Good experience in writing data ingesters and complex MapReduce jobs in Java for data cleaning and preprocessing, and fine-tuning them per data set.
  • Performed extensive data validation using Hive and wrote Hive UDFs.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs; wrote extensive scripts (Python and shell) to provision and spin up virtualized Hadoop clusters.
  • Added, decommissioned and rebalanced nodes.
  • Created a POC to store server log data in Cassandra to identify system alert metrics (see the sketch after this list).
  • Set up rack-aware configuration.
  • Configured client machines.
  • Configured monitoring and management tools.
  • HDFS support and maintenance; cluster HA setup.
  • Applied patches and performed version upgrades.
  • Incident Management, Problem Management, Performance Management and Reporting
  • Recovered from NameNode failures.
  • Scheduled MapReduce jobs using FIFO and Fair schedulers.
  • Installed and configured other open-source software such as Pig, Hive, HBase, Flume and Sqoop.
  • Integration with RDBMS using Sqoop and JDBC Connectors
  • Worked with the dev team to tune jobs; knowledge of writing Hive jobs.
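
A rough sketch of the Cassandra proof of concept noted above, using the DataStax Java driver (3.x API). The contact point, keyspace, table and columns are hypothetical.

  import java.util.Date;

  import com.datastax.driver.core.Cluster;
  import com.datastax.driver.core.PreparedStatement;
  import com.datastax.driver.core.Row;
  import com.datastax.driver.core.Session;

  public class ServerLogPoc {
      public static void main(String[] args) {
          // Contact point, keyspace and table are assumptions for this sketch.
          try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
               Session session = cluster.connect()) {

              session.execute("CREATE KEYSPACE IF NOT EXISTS ops WITH replication = "
                      + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
              session.execute("CREATE TABLE IF NOT EXISTS ops.server_log ("
                      + "host text, ts timestamp, level text, message text, "
                      + "PRIMARY KEY (host, ts))");

              // Insert one illustrative log entry.
              PreparedStatement insert = session.prepare(
                      "INSERT INTO ops.server_log (host, ts, level, message) VALUES (?, ?, ?, ?)");
              session.execute(insert.bind("web-01", new Date(), "WARN", "disk usage above 85%"));

              // Read entries back per host to derive simple alert metrics.
              for (Row row : session.execute(
                      "SELECT ts, level, message FROM ops.server_log WHERE host = 'web-01'")) {
                  System.out.println(row.getString("level") + " " + row.getString("message"));
              }
          }
      }
  }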

Environment: Windows 7, UNIX, Linux, Java, Apache Hadoop (HDFS, MapReduce), Pig, Hive, HBase, Flume, Sqoop, Cassandra, NoSQL

Confidential, CA

Hadoop Developer

Responsibilities:

  • Loaded data from different data sources (Teradata and DB2) into HDFS using Sqoop and loaded it into partitioned Hive tables.
  • Developed Hive UDFs to bring all customer email IDs into a structured format (a UDF sketch follows this list).
  • Developed bash scripts to fetch the Tlog files from the FTP server and then process and load them into Hive tables.
  • Used Sqoop to load data from DB2 into the HBase environment.
  • Insert-overwrote the Hive data with HBase data daily to get fresh data every day.
  • All the bash scripts were scheduled using the Resource Manager scheduler.
  • Developed Oozie workflows for daily incremental loads, which get data from Teradata and import it into Hive tables.
  • Developed Pig scripts to transform the data into a structured format, automated through Oozie coordinators.
  • Worked on loading data from MySQL to HBase where necessary using Sqoop.
  • Developed Hive queries for analysis across different banners.
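
A minimal sketch of the kind of Hive UDF described above for normalizing a raw email field; the class name and the trim/lower-case rule are assumptions about what the structured format looked like.

  import org.apache.hadoop.hive.ql.exec.Description;
  import org.apache.hadoop.hive.ql.exec.UDF;
  import org.apache.hadoop.io.Text;

  @Description(name = "normalize_email",
               value = "_FUNC_(str) - trims and lower-cases a raw email address")
  public class NormalizeEmailUDF extends UDF {

      // Hive calls evaluate() once per row; returning null keeps the row but
      // leaves the column empty when the input is missing or malformed.
      public Text evaluate(Text rawEmail) {
          if (rawEmail == null) {
              return null;
          }
          String cleaned = rawEmail.toString().trim().toLowerCase();
          return cleaned.contains("@") ? new Text(cleaned) : null;
      }
  }

A UDF like this would be packaged as a JAR and registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.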

Environment: Windows 7, Hadoop, HDFS, MapReduce, Sqoop, Hive, Pig, HBase, Teradata, DB2, Oozie, MySQL, Eclipse

Confidential

Hadoop/Java Developer

Responsibilities:

  • Involved in analysis, design and development of the Expense Processing system.
  • Designed use case, class, sequence and object diagrams to model the detailed design of the application using UML.
  • Installed, configured and administered Hadoop clusters of major Hadoop distributions.
  • Wrote MapReduce jobs in Java, Pig and Python.
  • Extensively worked with workflow schedulers like Oozie and scripting using UNIX shell, Python and Perl.
  • Worked with SQL and NoSQL (MongoDB, Cassandra, Hadoop) data structures.
  • Managed and reviewed Hadoop log files.
  • Ran Hadoop Streaming jobs to process terabytes of XML-format data.
  • Worked on Hadoop cluster migrations and upgrades.
  • Extensively worked with Cloudera Hadoop distribution components and custom packages
  • Built reporting using Tableau.
  • Applied ETL principles and best practices
  • Developed the application using the Spring MVC framework; performed client-side validations using AngularJS and Node.js.
  • Developed the user interface using JSP, HTML, CSS and JavaScript to simplify the complexities of the application.
  • Used AJAX Framework for Dynamic Searching of Bill Expense Information.
  • Created a dynamic end-to-end REST API with the LoopBack Node.js framework.
  • Configured the Spring framework for the entire business logic layer.
  • Developed code using various patterns such as Singleton, Front Controller, Adapter, DAO, MVC, Template, Builder and Factory.
  • Used table-per-hierarchy inheritance in Hibernate and mapped polymorphic associations.
  • Developed one-to-many, many-to-one and one-to-one annotation-based mappings in Hibernate (see the sketch after this list).
  • Developed DAO service methods to populate the domain model objects using Hibernate.
  • Used the Spring Framework's BeanFactory for initializing services.
  • Used the Java Collections API extensively, including Lists, Sets and Maps.
  • Wrote DAO classes using Spring and Hibernate to interact with the database for persistence.
  • Used Apache Log4J for logging and debugging.
  • Used Hibernate in data access layer to access and update information in the database.
  • Followed TDD and developed test cases using JUnit for all the modules developed.
  • Used Log4J to capture the log that includes runtime exceptions, monitored error logs and fixed the problems.
  • Created a Maven build file to build the application and deployed it on WebSphere Application Server.
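
An illustrative sketch of the annotation-based one-to-many / many-to-one mappings mentioned above; the ExpenseReport/Expense entities are hypothetical stand-ins for the actual domain model (in a real project each entity would live in its own file).

  import java.util.ArrayList;
  import java.util.List;

  import javax.persistence.CascadeType;
  import javax.persistence.Entity;
  import javax.persistence.GeneratedValue;
  import javax.persistence.Id;
  import javax.persistence.JoinColumn;
  import javax.persistence.ManyToOne;
  import javax.persistence.OneToMany;

  @Entity
  public class ExpenseReport {

      @Id
      @GeneratedValue
      private Long id;

      // One report owns many expense lines; mappedBy points at the owning side.
      @OneToMany(mappedBy = "report", cascade = CascadeType.ALL, orphanRemoval = true)
      private List<Expense> expenses = new ArrayList<>();

      public void addExpense(Expense expense) {
          expenses.add(expense);
          expense.setReport(this);   // keep both sides of the association in sync
      }
      // getters/setters omitted for brevity
  }

  @Entity
  class Expense {

      @Id
      @GeneratedValue
      private Long id;

      private String description;

      // Many expense lines belong to one report; the foreign key column lives here.
      @ManyToOne
      @JoinColumn(name = "report_id")
      private ExpenseReport report;

      public void setReport(ExpenseReport report) {
          this.report = report;
      }
  }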

Environment: Java, Struts, Hibernate ORM, LoopBack framework, Spring Application Framework, EJB, JSP, Servlets, JMS, XML, SOAP, WSDL, JDBC, JavaScript, UML, HTML, AngularJS, Node.js, JNDI, Subversion (SVN), Maven, Log4J, SpringSource Tool Suite (STS), Windows XP, WebSphere Application Server, Oracle.

Confidential 

JAVA Developer

Responsibilities:

  • Collected and understood the user requirements and functional specifications.
  • Developed the GUI using HTML, CSS, JSP and JavaScript.
  • Created components for isolated business logic.
  • Deployed the application in a J2EE architecture.
  • Implemented the Session Facade pattern using session and entity beans.
  • Developed message-driven beans to listen to JMS.
  • Developed the Web Interface using Servlets, Java Server Pages, HTML and CSS.
  • Used WebLogic to deploy applications on local and development environments of the application.
  • Extensively used JDBC PreparedStatements to embed SQL queries in the Java code (a sketch follows this list).
  • Developed DAO (Data Access Objects) using Spring Framework 3.
  • Developed web applications as rich internet applications using Java applets and Silverlight.
  • Used JavaScript to perform client side validations and Struts-Validator Framework for server-side validation.
  • Provided on call support based on the priority of the issues.
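
A small sketch of the JDBC PreparedStatement usage noted above; the connection URL, credentials, table and column names are placeholders.

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.PreparedStatement;
  import java.sql.ResultSet;
  import java.sql.SQLException;

  public class AccountDao {

      // Looks up an account name by id using a parameterized query, so the
      // value is bound safely instead of being concatenated into the SQL.
      public String findAccountName(long accountId) throws SQLException {
          String sql = "SELECT name FROM account WHERE id = ?";
          try (Connection conn = DriverManager.getConnection(
                       "jdbc:mysql://localhost:3306/appdb", "appuser", "secret"); // placeholder URL/credentials
               PreparedStatement ps = conn.prepareStatement(sql)) {
              ps.setLong(1, accountId);
              try (ResultSet rs = ps.executeQuery()) {
                  return rs.next() ? rs.getString("name") : null;
              }
          }
      }
  }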

Environment: Java, J2EE, JDBC, JSP, Struts, JMS, Spring, SQL, MS Access, JavaScript, HTML
