
Sr. Hadoop Developer Resume


Charlotte, NC

SUMMARY:

  • 13+ years of hands-on IT development and BI solution architecture experience, including:
  • 4+ years of experience handling big data sources (hybrid, unstructured, columnar and array-based data sources) and the associated Hadoop ecosystem, including MapReduce, HDFS, Pig, Hive, NoSQL databases (HBase, Cassandra), Sqoop, Oozie, Kafka, Flume, Spark and ZooKeeper.
  • 6+ years of experience in the Data Warehouse/Business Intelligence technology stack, including SAP Business Objects 3.x/4.x, IBM Cognos 10.2, QlikView and Informatica.
  • 3+ years of experience in the design and development of enterprise applications using Java technologies.
  • Experience in installing, configuring and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Ranger, Oozie, Hive, Sqoop, Pig, ZooKeeper, Spark, Kafka, Spark Streaming and Flume.
  • Hands-on experience working with Hadoop clusters using Hortonworks and Cloudera.
  • Experience in Hadoop cluster capacity planning, performance tuning, cluster monitoring and troubleshooting using Cloudera Manager and Nagios.
  • Developed data ingestion framework using Sqoop/Scala/NDM/custom scripts.
  • Good working experience with various SDLC methodologies, including both Agile and Waterfall models.
  • Experience in developing Pig Latin and HiveQL scripts for data analysis and ETL purposes, extending the default functionality by writing User Defined Functions (UDFs) and UDTFs for data-specific processing.
  • Developed and implemented MapReduce jobs in Java and Spark applications in Java to process and perform various analytics on large datasets.
  • Good experience in job scheduling and monitoring using Oozie workflows, Autosys and Control-M.
  • Optimized the performance of the existing ETL scripts using Spark and Spark SQL.
  • Experienced in streaming data to HDFS using Spark Streaming from Flume/Kafka sources.
  • Functional working knowledge of IBM MQ and Apache Kafka.
  • Migrated existing MapReduce/HQL queries into Spark transformations using Spark RDDs, DataFrames and Scala (see the sketch after this summary).
  • Experienced in optimizing existing Hadoop algorithms with Spark using SparkContext, Spark SQL, DataFrames, pair RDDs and YARN.
  • Experienced with the Spark framework for both batch and real-time data processing.
  • Experienced in writing Spark programs/applications in Scala using the Spark APIs for data extraction, transformation and aggregation.
  • Experienced in automating Sqoop, Hive, Java MapReduce, shell scripting, etc. using Oozie workflows.
  • Worked on platform migration: Mainframe and Teradata system decommissioning, bringing the entire dataset to HDFS.
  • Experience in building and maintaining multiple Hadoop clusters (prod, dev, etc.) of different sizes and configurations, and in setting up rack topology for large clusters.
  • Experienced in automating most of the manual tasks using Shell and Python scripts.
  • Experienced in working with different file formats: Avro, Parquet, fixed-length, EBCDIC, text, XML, JSON and CSV.
  • Strong knowledge of NoSQL column-oriented databases such as HBase and Cassandra and their integration with the Hadoop cluster.
  • Hands on experience on Amazon Web Services (AWS) components like Amazon EC2 instances, S3 buckets.
  • Extensive experience with Data Modeling in designing analytical / OLAP and transactional / OLTP databases using ERwin to design backend data models and entity relationship diagrams (ERDs) for star schemas, snowflake dimensions and fact tables.
  • Experience in ETL: extracting, transforming and loading data from heterogeneous source systems using Hadoop.
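
As a concrete illustration of the MapReduce/HQL-to-Spark migration work listed above, here is a minimal Scala sketch, assuming Spark 2.x with Hive support; the table and column names (claims, claim_dt, status, paid_amt) are hypothetical placeholders rather than actual project objects.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    // Minimal sketch: an aggregation that previously ran as a Hive/MapReduce job,
    // re-expressed with the Spark DataFrame API. Names are illustrative only.
    object ClaimsDailyRollup {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ClaimsDailyRollup")
          .enableHiveSupport()               // read and write Hive tables directly
          .getOrCreate()

        // Equivalent of: SELECT claim_dt, status, COUNT(*), SUM(paid_amt)
        //                FROM claims GROUP BY claim_dt, status
        val rollup = spark.table("claims")
          .groupBy(col("claim_dt"), col("status"))
          .agg(count(lit(1)).as("claim_cnt"), sum("paid_amt").as("total_paid"))

        rollup.write.mode("overwrite").saveAsTable("claims_daily_rollup")
        spark.stop()
      }
    }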

TECHNICAL SKILLS:

Hadoop/Big Data: MapReduce, HDFS, HBase, Hive, Impala, Pig, Sqoop, Flume, ZooKeeper, YARN, Oozie, HCatalog, AWS, Spark, Shark, CDH4, CDH5 and Hortonworks HDP 2.3/2.2/2.1/2.0

BI Tools: SAP BI 4.x/BO XI 3.x/XI R2, Crystal Reports XI R2/XI and Cognos 10.1.2

ETL Tools: Informatica 8.x/7.x

Databases: IBM DB2, Oracle 8i/9i/10g, MS SQL Server 2005/2008, MySQL, Cassandra, HBase, MongoDB, Teradata and Netezza

Query Tools: DB2 Optim Development Tool, Toad 9.x, Oracle SQL Developer 1.x and MS SQL Query Analyzer 8.x

Languages: C, C++, Java/J2EE, HTML and SQL

IDE: Eclipse, IntelliJ IDEA and Scala IDE

Scripting Languages: Shell, Perl and Python

Operating Systems: Windows 2003/2008/XP/Vista, Unix, Linux (Various Versions)

Version Control: Visual Source Safe and SVN

Scheduling Tools: Autosys and Control-M

Build & Deploy Tools: Ant, Jenkins, Maven and SBT

PROFESSIONAL EXPERIENCE:

Confidential

Charlotte, NC

Sr. Hadoop Developer

Responsibilities:

  • Involved in requirement gathering and analysis with upstream and downstream applications and prepared the mapping document.
  • Transferred mainframe copybooks and source files into Hadoop edge nodes using NDM.
  • Created interface to convert mainframe data (EBCDIC) into ASCII.
  • Received kudos from the business team for reducing query time by 60% using Impala, restructuring the application to convert Avro tables to Parquet and saving 30 to 40% through compression.
  • Received appreciation from the technical leadership team for converting the existing DQ reports from Hive/Pig to Spark SQL, resulting in an 80% improvement in report generation time.
  • Worked with Sqoop historical and incremental import functionality to handle large dataset transfers between the Teradata database and HDFS.
  • Worked on loading data into Hive tables and writing ad-hoc Hive queries that run internally on MapReduce and on other execution engines such as Spark (Hive on Spark).
  • Ensured data quality and accuracy by implementing business and technical reconciliations via scripts and data analysis.
  • Created common Oozie templates, passing dynamic parameters in Oozie properties to work with various actions such as Hive, Pig, Java MapReduce, Shell and Spark.
  • Coordinated with multiple teams when working on common portfolio related tasks to avoid conflicts.
  • Parameterized the common properties as part of the code optimization and making it production ready.
  • Created a data ingestion pipeline for real-time streaming data using Kafka, Flume and Spark Streaming, publishing the data to HDFS (see the sketch after this list).
  • Published the HDFS/Hive table data to external system using the custom Kafka Producer for continuous updates.
  • As part of security compliance, masked/scrubbed the NPI data in the historical data.
  • Populated historical data fields by joining the different source tables.
  • Worked with different compression techniques such as Gzip, LZO, Snappy and Bzip2.
  • Created Java UDFs, UDTFs and UDAFs to handle derived fields while inserting into Hive tables, and Pig scripts to format the data.
  • Worked with NDM and sFTP connection set-ups (passwordless SSH using RSA keys) and file transfers using them.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
  • Used DistCp to move data from one Hadoop cluster to a new cluster for data processing and testing.
  • Involved in Unit testing and Performance Testing. Worked with TQMS team and resolved the defects.
  • Created Autosys JILs to view upstream and downstream dependencies, and also created JILs for the ETL processing.
  • Created reusable components, managed the environments, code migration & maintenance using SVN etc., as part of System Engineering team.
  • Created project design documents and performance metrics reports.
  • Worked on data validation and coordinated pre & post production deployment.
  • Served as on-call support to resolve business users' issues and queries.
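
The streaming ingestion work above can be sketched in Scala roughly as follows; this is a minimal outline assuming the spark-streaming-kafka-0-10 integration, and the broker address, topic name, consumer group and HDFS path are hypothetical placeholders.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    // Minimal sketch: consume a Kafka topic with Spark Streaming and land each
    // micro-batch on HDFS. Broker, topic and path values are placeholders.
    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHdfs"), Seconds(60))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "hdfs-ingest",
          "auto.offset.reset"  -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        // Write the message payloads as time-suffixed text files under the raw zone
        stream.map(_.value()).saveAsTextFiles("hdfs:///data/raw/events/batch")

        ssc.start()
        ssc.awaitTermination()
      }
    }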

Reusable Components Created:

  • Dustc: data comparison and DI tool used between any RDBMS and HDFS, built in Python.
  • Code Deployment: automated code deployment tool used to deploy the code in UAT.
  • PMC Data Copy: tool used to copy data from prod to lower lanes for regression testing.
  • Test Data Creation: tool used to create test data for unit testing.
  • HDPF components: developed plug-and-play components in Spark, as part of an internal open-source software program, for Data Sourcing, Data Integrity, Data Standardization, Merge & Flatten, Transformation and Publish.
  • UDF & UDTF: created multiple UDFs and UDTFs to handle complex transformations (see the sketch after this list).
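
For context on the UDF work above, a minimal Scala sketch of registering a Spark SQL UDF for a derived field is shown below; the masking rule, the mask_acct function name and the accounts table are hypothetical examples, not the actual reusable components.

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: register a Spark SQL UDF for a derived/masked field.
    // The masking rule and table/column names are illustrative only.
    object MaskUdfExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("MaskUdfExample")
          .enableHiveSupport()
          .getOrCreate()

        // Mask all but the last four characters of an account number
        val maskAcct = (acct: String) =>
          if (acct == null || acct.length <= 4) acct
          else "*" * (acct.length - 4) + acct.takeRight(4)

        spark.udf.register("mask_acct", maskAcct)

        spark.sql("SELECT mask_acct(acct_nbr) AS acct_nbr_masked FROM accounts").show(5)
        spark.stop()
      }
    }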

System Engineering:

  • Enforced coding standards across the teams.
  • Validated the correctness of the code using the Code Review Tool before submitting to the TSnC team for release approvals.
  • Worked on a POC to convert existing workflows to use the newly created component (HDPF).
  • Architected the successful cluster migration from the older system (HSP) to the newer environment (HaaS).
  • Enabled access control on views and tables using Sentry mappings.
  • Worked on a POC to track data lineage using the ANTLR tool and machine learning.
  • Point of contact for code deployment in UAT lanes; provided performance metrics to the SA team.

Environment: CDH 5.x, HDFS, Hive, Pig, Sqoop, Oozie, Spark, Kafka, Flume, Kerberos, UNIX shell scripting, SuperPuTTY, Eclipse, UNIX, VersionOne, Autosys, NDM, Python, Maven and Jenkins

Confidential

Chicago, IL

Sr. Hadoop Developer/Admin

Responsibilities:

  • Evaluated business requirements and prepared detailed specifications, following project guidelines, for the programs to be developed.
  • Experience in installing, configuring and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, ZooKeeper, Spark and Flume.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms (see the sketch after this list).
  • Handled importing of data from different data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from Oracle into HDFS using Apache Sqoop.
  • Developed Pig scripts for data analytics and wrote Java UDFs employed in Pig and Hive scripts.
  • Developed scripts implementing the distributed cache technique to improve Hive loads.
  • Extensively worked on performance tuning of Hive SQL and cluster performance.
  • Involved in Design, Development, Testing and Implementation of the ETL using Hadoop ecosystem.
  • Defined the job flow sequence and developed UNIX shell scripts to automate the execution of hive and pig scripts.
  • Hands-on experience upgrading the cluster to the latest HDP 2.2.4.2 stack from HDP 2.0 and HDP 2.2.
  • Installed and configured the secured cluster using Kerberos.
  • Installed and configured the cluster to integrate with LDAP users for Hue access.
  • Worked with Pivotal HAWQ for better performance when connecting to external BI tools.
  • Performed data summarization and ad-hoc querying with Hive.
  • Used Sqoop to load data from RDBMS (Teradata) to HDFS.
  • Loaded the dataset into Hive for ETL (Extract, Transform and Load) operations.
  • Worked closely with data analysts to construct creative solutions for their analysis tasks.
  • Defined Oozie workflows to run DAGs of various Hive and Pig jobs, triggered by time and data availability.
  • Experience in creating the metastore on external databases such as Oracle.
  • Monitored system health and logs and responded accordingly to any warning or failure conditions.
  • Created several scripts in Python and Shell to automate the daily flow process for jobs.
  • Attended daily scrum meetings, reporting what was done, raising any blockers/issues and stating what would be done that day.
  • Exported the business-required information to an RDBMS using Sqoop to make the data available for the BI team to generate reports.
  • Experience in creating cluster queues to run jobs in the cluster with smooth sharing of resources.
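
To make the compression-tuning bullet above concrete, here is a minimal Scala sketch of a MapReduce driver that enables map-output and job-output compression; the property names are standard Hadoop MRv2 settings, while the job name, input/output paths and codec choices are hypothetical.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.compress.{CompressionCodec, GzipCodec, SnappyCodec}
    import org.apache.hadoop.mapreduce.Job
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

    // Minimal sketch: enable intermediate (shuffle) and final output compression
    // on a MapReduce job. Mapper/Reducer classes are omitted; Hadoop's identity
    // implementations are used by default. Paths are taken from the command line.
    object CompressedJobDriver {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()
        // Compress map output to cut shuffle I/O
        conf.setBoolean("mapreduce.map.output.compress", true)
        conf.setClass("mapreduce.map.output.compress.codec",
          classOf[SnappyCodec], classOf[CompressionCodec])

        val job = Job.getInstance(conf, "compressed-aggregation")
        job.setJarByClass(CompressedJobDriver.getClass)

        FileInputFormat.addInputPath(job, new Path(args(0)))
        FileOutputFormat.setOutputPath(job, new Path(args(1)))
        // Compress the final reducer output as well
        FileOutputFormat.setCompressOutput(job, true)
        FileOutputFormat.setOutputCompressorClass(job, classOf[GzipCodec])

        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }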

Environment: Hadoop, MapReduce, HiveQL, Pig, HBase, Sqoop, Falcon, Oozie, Knox, Ranger, ZooKeeper, Spark, Nagios, Ganglia, HAWQ, HDP 2.3/2.2/2.1, Cognos 10.2, SVN, Teradata, UNIX, SUSE Linux and Windows.

Confidential

Bloomington, IL

Sr. BI Developer/Hadoop Developer

Responsibilities:

  • Installed and configured a live Hadoop cluster running on the Hortonworks distribution (HDP 2.1).
  • Developed Pig UDFs to preprocess the data for analysis.
  • Developed Java MapReduce programs for the analysis of claims log files stored in the cluster.
  • Extensively used Amazon S3 storage for logging mechanism and archiving of the data.
  • Installed HDP 2.1 on commodity servers and Amazon AWS servers.
  • Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
  • Wrote UNIX shell scripts and Python scripts to automate processes.
  • Implemented partitioning, dynamic partitions and buckets in Hive (see the sketch after this list).
  • Ingested transactional data into HBase.
  • Created and implemented a highly scalable and reliable distributed data design using NoSQL HBase.
  • Experience with Big Data analytics implementations using Cloudera.
  • Installed Hue, Oozie and ZooKeeper on the Hadoop cluster.
  • Developed and published packages using Framework Manager: connected and tested the data source; created the database, physical and presentation layers; defined appropriate relationships between query subjects; and created filters, prompts, calculations, summaries and functions in Framework Manager and the universe.
  • Constantly monitored Hadoop jobs, scheduling them in Oozie.
  • Interacted with users to understand their business views while gathering report requirements, and provided several report mock-ups to finalize the requirements.
  • Developed standard templates, dashboard reports, standard reports, charts, drill-through reports, and master-detail and conditional-formatting reports using Report Studio and Query Studio; working experience with different date functions in Report Studio.
  • Used Framework Manager to build models, packages and publish packages to Cognos Connection.
  • Developed daily, weekly and monthly reporting using the Cognos reporting tool over the Hive JDBC connection.
  • Created Hive tables per requirements as internal or external tables, defined with proper static and dynamic partitions for efficiency.
  • Used core Java technologies to manipulate data from HDFS and obtain the desired results.
  • Responsible for cluster maintenance, commissioning and decommissioning data nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
  • Used Amazon Web Services (AWS) where more computing power was necessary.
  • Gained very good business knowledge of insurance, claim processing and fraud suspect identification.
  • Used Oozie to schedule various jobs on the Hadoop cluster.
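
As a companion to the Hive partitioning bullet above, a minimal Scala/Spark sketch of loading a dynamically partitioned Hive table is shown here, assuming Spark 2.x with Hive support; the staging path, database, table and partition column names are hypothetical placeholders.

    import org.apache.spark.sql.{SaveMode, SparkSession}

    // Minimal sketch: write a DataFrame into a Hive table with dynamic partitions.
    // Paths, database/table names and partition columns are illustrative only.
    object PartitionedClaimsLoad {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("PartitionedClaimsLoad")
          .enableHiveSupport()
          .getOrCreate()

        // Allow Hive to resolve partition values at run time
        spark.sql("SET hive.exec.dynamic.partition=true")
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

        val claims = spark.read
          .option("header", "true")
          .csv("hdfs:///data/staging/claims/")          // placeholder staging path

        claims.write
          .mode(SaveMode.Append)
          .partitionBy("claim_year", "claim_month")     // become Hive partition columns
          .saveAsTable("claims_db.claims_partitioned")

        spark.stop()
      }
    }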

Environment: Hadoop, MapReduce, Hive, Impala, Pig, HBase, Sqoop, Flume, Oozie, ZooKeeper, Spark, Nagios, CDH3/4, Amazon EC2, HDP 2.2, SAP Business Objects 3.1/4.1, Cognos 10.2, SVN, DB2, SQL Server, UNIX, Linux CentOS/Ubuntu and Windows.

Confidential

Baltimore, MD

Sr. Business Intelligence Consultant

Responsibilities:

  • Developed functionality and services with technologies such as the Hibernate framework.
  • Involved in gathering, analyzing and documenting business requirements and data specifications for data marts, universes and reports.
  • Designed, developed and managed Datamarts and Business Objects Universes.
  • Analyzed Application/ Warehouse databases and developed Universes using the Designer module.
  • Created New Classes, Objects and Predefined Conditions as part of Universe development.
  • Defined Aliases and Contexts to resolve Join Problems (Loops, Chasm Traps and Fan Traps) in Universes.
  • Extensively used Functions, Calculations, Variables, Breaks, Sorts, Drill, Formulas, Cross Tab and Slice Dice for generating complex Reports.
  • Involved in creating workflows using the workflow framework, setting up milestones, statuses and states, and assigning them to the management team to track.
  • Developed the bursting and archiving solution for BI reporting using JDK.
  • Development of java web services using SOAP to enable sharing of data.
  • Maintained a Java multi-threaded environment to make the best use of the available resources.
  • Maintained source code versions and merged code changes with ClearCase.
  • Extensively involved in JUNIT testing and coordinated with the testing team and fixed defects and tracked them using QC.
  • Developed projects using Ant to build and deploy.
  • Involved in installing Business Objects; creating the repository, users and user groups; and prioritizing security levels for users and user groups.
  • Deployed and tested the application in UNIX-based test server environments.

Environment: Business Objects 3.1, Java, Eclipse 3.5, Apache Tomcat, Hibernate, JUnit, EditPlus, QC, Log4J, Putty, UNIX, ClearCase

Environment: Java/J2EE, OOAD, Maven, Shell AJAX, Agile, WebServices, SOAP, WSDL, JSF, PL/SQL, XML, JDBC, JavaScript, HTML, Oracle 9i, UNIX, JUnit

Confidential

Sr. Software Engineer

Responsibilities:

  • Wrote and maintained the Software Requirement Specification (SRS) for the project
  • Designed UML diagrams using IBM Rational Rose 2001 EE, Borland Together and MagicDraw.
  • Installed and Configured WebSphere Portal Server 5.1, WSAD 5.1 and Portal ToolKit 5.0.2 plug-in.
  • Provides work direction, tracks progress, and manages workload to other application developers as required.
  • Involved in implementation of the presentation layer (GUI) for the application using JSF, HTML, XHTML, CSS and JavaScript.
  • Configured faces-config.xml and applicationContext.xml for JSF and Spring AOP integration.
  • Created backing bean to define methods associated with components, input validations, event handling and navigation processing.
  • Designed and developed authentication and authorization framework using LDAP.
  • Developed build and deployment scripts using Maven to generate WAR, EAR and EJB JAR files, store them in the repository, and publish and deploy on WebSphere.
  • Developed WebServices to communicate to other modules using XML based SOAP and WSDL protocols.
  • Modified the company's WebSphere Portal themes and skins according to the portal requirements.
  • Developed SQL queries to implement the Struts framework.
  • Successfully implemented the MVC architecture; object-relational mapping was done using Hibernate.
  • Developed the startup service interfaces required and run time service implementation classes.
  • Extensively used Struts tag libraries and jar files and Custom tags.
  • Involved in Unit & Integration Testing for all Facultative, Treaty & Wire Modules.
  • Wrote stored procedures using PL/SQL for data retrieval from different tables.
  • Configured Transactional and Security attributes for application wide Session Beans.
  • Developed logging service using Log4J.
  • Implemented the caching mechanism in Hibernate to load data from Oracle database.
  • Involved in fixing defects and troubleshooting issues on UNIX environment and wrote Shell scripts to automate jobs.
  • Accountable for the successful execution of all application development activities.
  • Wrote and maintained the Ant build script for the project

Environment: Java, JSP, Servlets, JavaScript, JDBC, IBM WebSphere 5.1 Application Server, WSAD, TOAD, ChangeMan, MS Windows 2000, LDAP, Oracle, JTA, JMS and JNDI.

Confidential

Software Engineer

Responsibilities:

  • Analyzed and prepared the requirement analysis document.
  • Deployed the application to the JBoss application server.
  • Gathered requirements from the various parties involved in the project.
  • Estimated timelines for development tasks.
  • Used J2EE and EJB to handle the business flow and functionality.
  • Interacted with the client to get confirmation on the functionalities.
  • Involved in the complete SDLC of the development with full system dependency.
  • Actively coordinated with deployment manager for application production launch.
  • Provided support and updates during the warranty period.
  • Performed functional, user interface and regression testing.
  • Carried out regression testing to support problem tracking.

Environment: Java, J2EE, EJB, UNIX, XML, Work Flow, JMS, JIRA, Oracle, JBOSS
