
Sr Hadoop Developer Resume


Basking Ridge, NJ

SUMMARY

  • A Senior Big Data Engineer with a can-do attitude. Seven years of IT experience spanning large-scale Big Data implementations using cutting-edge technologies in the healthcare, finance, and insurance domains.
  • Hands-on experience installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce, YARN, HDFS, HBase, Hive, Sqoop, Pig, Oozie and Flume.
  • Good knowledge of Hadoop cluster architecture and experience monitoring clusters using Ambari.
  • Experience handling real-time streaming data using Spark Streaming, Apache Storm and Flume (a minimal streaming sketch follows this summary).
  • Experience in ingesting data into HDFS using Sqoop and IBM CDC component.
  • Worked on converting existing Hive QL code to run using Spark SQL and HiveContext.
  • Hands-on experience with application development in Scala, Java, Python and Unix shell scripting.
  • Completed one major implementation migrating mainframe tables into a MapR Hadoop data lake using an IBM CDC, Pig, HBase and Hive ORC solution.
  • Excellent knowledge of NoSQL databases such as HBase, MongoDB, MarkLogic, Cassandra and Elasticsearch.
  • Worked on converting files into XML and JSON documents for ingestion using Pig scripts.
  • Experience working with the Elasticsearch, Logstash and Kibana stack to generate dashboards.
  • Extensive experience working with Oracle, Teradata, MS SQL Server and MySQL databases.
  • Experience in Data Migration from RDBMS to Hadoop components and vice versa.
  • Experience setting up Splunk forwarders, building indexes, and building logging and monitoring dashboards in Splunk.
  • Wrote custom UDFs in Java to extend core Hive functionality.
  • Good experience writing Hive queries, along with optimization and performance tuning.
  • Good experience with Hive managed, external, partitioned and ORC-format tables.
  • Worked with the BI reporting tools Tableau and Kibana to present dashboards of business-critical data in reports.
  • Hands-on experience integrating Tableau with the Hadoop stack and performing analytics.
  • Experience with scheduling automated jobs using Talend, Tivoli Workload Scheduler and Oozie.
  • Experience in managing code versions using GitHub.
  • Experience in administration, installation, configuration, troubleshooting, security, backup, performance monitoring and fine-tuning of Linux OS.
  • Experience in managing and reviewing Hadoop log files.
  • Integrated Kafka and Storm by using Avro for serializing and deserializing the data.
  • Experience generating native MapReduce code using the Talend Enterprise tool.
  • Progressive experience in all phases of the iterative Software Development Life Cycle (SDLC).
  • Experience working in scrum teams as a part of Agile development environment.
  • Worked on Onsite-offshore model handling business and technical team communication.
  • Experience in preparing technical specification and design documents.
  • Ability to adapt to evolving technology and a strong sense of responsibility.
  • Strong communication and interpersonal skills.
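
Below is a minimal, illustrative Scala sketch of the kind of Spark Streaming consumption mentioned in this summary; the broker list, topic name, and batch interval are placeholders rather than details from any specific engagement.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object ClaimsStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("claims-stream-sketch")
    val ssc  = new StreamingContext(conf, Seconds(30))   // 30-second micro-batches

    // Placeholder broker list and topic; not taken from any specific project.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val topics      = Set("claims-events")

    // Receiver-less "direct" Kafka stream; each record is a (key, value) pair.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Simple sanity check: count the events arriving in each micro-batch.
    stream.map(_._2).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```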

TECHNICAL SKILLS

Big Data/Hadoop: Spark SQL, Spark Streaming, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Kafka, Storm, Talend, Splunk, YARN.

Language: Java, SQL/PLSQL, Pig, Python, Scala, Unix Shell Scripting

Hadoop Distributions: MapR, Hortonworks, Cloudera

Methodologies: Agile, V-model, Waterfall model

RDBMS: Oracle, MySQL, MS SQL, DB2, Teradata, Netezza

NoSQL Databases: HBase, MongoDB, MarkLogic, Cassandra.

Web Tools/Frameworks: Elasticsearch, Kibana, Logstash, HTML, JavaScript, XML, ODBC, JDBC, Java Beans, MVC, Ajax, JSP, Servlets, JUnit, J2EE 1.4, EJB 3.0, JSP 2.0, JavaMail 1.2, Spring, Hibernate 3.2, Struts 2.2, UML, WebLogic 10.

Operating System: Linux RHEL, Ubuntu, CentOS, Fedora, Windows, Mac

BI Tool: Cognos, Tableau, Datameer, Kibana.

ETL Tool: Ab-Initio

PROFESSIONAL EXPERIENCE

Confidential, Basking Ridge, NJ

Sr Hadoop Developer

Responsibilities:

  • Work on gathering requirements from the business and prepare rules that can be used to predict which patients have a particular ailment.
  • Work with external sources on receiving claims files and the data dictionary.
  • Reformat the received files into a standardized schema that the rules can run against; files were loaded into Hive external tables and all manipulation was done in Hive.
  • Write rules in Hive associated with various ailments and generate summary tables.
  • Ingest the reports from Hive into Elasticsearch as JSON documents using Pig scripts.
  • Provision and configure Elasticsearch nodes and create indexes and types.
  • Prepare Kibana dashboards for the business to view reports at the member level.
  • Prototype the solution and architecture for ingesting data into HBase with CDC components for deltas.
  • Set up IBM CDC with the help of IBM and generate test CDC extracts from tables.
  • Map the extracts to the table schemas and test error scenarios.
  • Develop Pig scripts to read the CDC files and ingest them into HBase (see the HBase sketch after this list).
  • Set up HBase versions and column families based on requirements.
  • Develop a shell-script-based framework with detailed logging and alerts to automate the entire ingestion process.
  • Set up groups and access management for HBase tables using maprcli.
  • Create Hive external tables on top of HBase with the capability to handle deltas.
  • To further improve performance, create Hive ORC tables from the external tables; these ORC tables are used to generate extracts that were earlier handled on the mainframe.
  • Set up automated schedules in Talend TAC.
  • Design and develop the Elasticsearch index (parent/child/routing).
  • Load data using Pig and Spark (see the Spark-to-Elasticsearch sketch after this list).
  • Update/modify mappings and support user testing.
  • Develop automated loading of 160+ tables.
  • Manage and deploy the Elasticsearch cluster.
  • Build Kibana-based reports/dashboards.
  • Provision MongoDB and set up users/collections.
  • Prepare Sqoop queries and automate schedules to extract data on a daily basis.
  • Write custom UDFs to extend Hive functionality.
  • Develop views and HQLs to prepare the summary reports capturing coverage changes made by members.
  • Prepare Tableau dashboards with drill-down capabilities on various worksheets.
  • Publish reports to Tableau Server and set up Tableau extract refresh schedules.
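
A minimal Scala sketch of how one CDC delta record could be written into HBase with the column-family/version layout described above. The production loads were driven by Pig scripts; the table name, column family, columns, and row key here are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object CdcDeltaToHBaseSketch {
  def main(args: Array[String]): Unit = {
    val conf       = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    // Hypothetical table name and column family.
    val table = connection.getTable(TableName.valueOf("member_claims"))

    // One CDC delta record keyed by member id. Because the table keeps
    // multiple cell versions, the newest version always reflects the
    // current state while older versions preserve history.
    val put = new Put(Bytes.toBytes("member-00123"))
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("coverage_code"), Bytes.toBytes("HMO"))
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("op_type"),       Bytes.toBytes("U"))
    table.put(put)

    table.close()
    connection.close()
  }
}
```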
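A minimal Scala sketch, using the elasticsearch-hadoop connector, of loading a Hive summary table into Elasticsearch as JSON documents. The node address, table, columns, and index/type names are placeholders, and the actual pipeline also used Pig.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.elasticsearch.spark.sql._   // elasticsearch-hadoop connector

object HiveToEsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("hive-to-es-sketch")
      .set("es.nodes", "es-node1:9200")    // placeholder Elasticsearch node
      .set("es.mapping.id", "member_id")   // use member_id as the document _id

    val sc          = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)

    // Placeholder summary table standing in for the Hive rule output.
    val summary = hiveContext.sql(
      "SELECT member_id, ailment_code, score FROM reports.member_summary")

    // Each row becomes a JSON document in the members index / summary type.
    summary.saveToEs("members/summary")
  }
}
```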

Environment: Hive, Sqoop, Tableau, Java, HDFS, Talend, UNIX shell scripting, Putty, WinSCP.

Confidential

Responsibilities:

  • Migrate the Hive queries to run in Spark using HiveContext (see the sketch after this list).
  • Perform performance tuning of existing HQLs to reduce run time.
  • Generate reports as CSV files that are sent via SFTP to business NAS drives.
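
A minimal Scala sketch of the HiveContext migration pattern described above; the query, database, table, and output path are placeholders, and the CSV write assumes the spark-csv package available on Spark 1.x.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveToSparkSqlSketch {
  def main(args: Array[String]): Unit = {
    val sc          = new SparkContext(new SparkConf().setAppName("hive-to-sparksql-sketch"))
    val hiveContext = new HiveContext(sc)

    // The existing HQL runs unchanged against the Hive metastore;
    // database, table, and column names here are placeholders.
    val report = hiveContext.sql(
      """SELECT member_id, plan_code, SUM(paid_amount) AS total_paid
        |FROM claims.member_claims
        |GROUP BY member_id, plan_code""".stripMargin)

    // Write a single CSV extract for downstream SFTP delivery
    // (assumes the spark-csv package on Spark 1.x).
    report.coalesce(1).write
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .save("/data/extracts/member_claims_report")
  }
}
```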

Environment: Spark SQL, HiveContext, Hive, HDFS, Talend, UNIX shell scripting, Putty, WinSCP.

Confidential, Rochester NY

Hadoop Developer

Responsibilities:

  • Worked on setting up a Hortonworks HDP 2 multi-node cluster.
  • Handled administration of a 150-plus-node cluster using Ambari.
  • Worked on installing cluster nodes, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
  • Consulted and onboarded new tenants on the Big Data environment.
  • Set up users, permissions and directory structures.
  • Managed snapshots and backups.
  • Set up the Hortonworks client to connect with Talend.
  • Scheduled and automated jobs using the Talend admin console.
  • Set up a staging/landing node to receive data.
  • Involved in benchmark testing of new features.
  • Worked with Talend to generate native MapReduce code using the Talend Enterprise tool.
  • Worked on setting up a 6-node MarkLogic cluster.
  • Worked on the database/forest/server setup required for the project within MarkLogic.
  • Loaded documents into MarkLogic using MLCP with custom URIs and collections.
  • Performed transformations of data into JSON files using XQuery and JavaScript.
  • Built logging and monitoring dashboards using Splunk.
  • Wrote searches using the Splunk search language.
  • Worked on building indexes and summary indexes in Splunk.
  • Worked with the Splunk server and lookup tables.
  • Worked with data modeling in Splunk.
  • Involved in a POC for processing real-time streaming data by implementing Apache Storm (see the Storm sketch after this list).
  • Set up Splunk forwarders and performed troubleshooting.
  • Performed ingestion of data into HDFS and Hive tables.
  • Prepared RabbitMQ message queues for receiving data.
  • Worked on a Flume setup with a RabbitMQ server as the source and HDFS as the sink.
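
A minimal Scala sketch of a Storm topology of the sort used in the POC. The bolt logic is a placeholder, and Storm's built-in TestWordSpout stands in for the RabbitMQ-backed spout actually used; component names are hypothetical.

```scala
import backtype.storm.{Config, LocalCluster}
import backtype.storm.testing.TestWordSpout
import backtype.storm.topology.{BasicOutputCollector, OutputFieldsDeclarer, TopologyBuilder}
import backtype.storm.topology.base.BaseBasicBolt
import backtype.storm.tuple.{Fields, Tuple, Values}

// Placeholder bolt: tags each incoming event instead of the real parsing logic.
class TagBolt extends BaseBasicBolt {
  override def execute(input: Tuple, collector: BasicOutputCollector): Unit = {
    val event = input.getString(0)
    collector.emit(new Values(event, "processed"))
  }
  override def declareOutputFields(declarer: OutputFieldsDeclarer): Unit =
    declarer.declare(new Fields("event", "status"))
}

object StormPocSketch {
  def main(args: Array[String]): Unit = {
    val builder = new TopologyBuilder
    // TestWordSpout stands in for the RabbitMQ-backed spout used in the POC.
    builder.setSpout("events", new TestWordSpout)
    builder.setBolt("tagger", new TagBolt).shuffleGrouping("events")

    val conf = new Config
    conf.setDebug(true)
    // Local-mode submission for the proof of concept.
    new LocalCluster().submitTopology("streaming-poc", conf, builder.createTopology())
  }
}
```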

Environment: Apache Hadoop, Pig, Hive, HDFS, MapReduce, Sqoop, Flume, Splunk, MarkLogic, Talend, RabbitMQ, Ambari, UNIX shell scripting, ZooKeeper, Oozie, Apache Storm.

Confidential, Washington DC

Data Analyst

Responsibilities:

  • Gathered business requirements by conducting meetings with the business and preparing technical specifications.
  • Used a variety of tools including SQL, PL/SQL, Python scripts and Cognos to develop statistical models to predict customer churn. It included substantial data mining and extraction.
  • Provided ad hoc profiling using SQL queries on DB2.
  • Created data lineage and data mapping documents.
  • Created a data dictionary for the existing data models.
  • Worked on use cases and data mapping documents for front-end developers to map the fields.
  • Involved in dimensional modeling and data modeling using Ab Initio.
  • Developed Ab Initio graphs as per the business requirements to load data into the Teradata and DB2 database.
  • Wrote test plans to validate the graph results.
  • Worked on maintaining Data Quality, Source Systems Analysis, Business Rules Validation, Source Target Mapping Design, Performance Tuning and High Volume Data Loads.
  • Developed SQL Queries, Stored Procedures, Triggers, User Defined Functions (UDF), Views, Indexes as required for the ETL processes and Data Integration activities.
  • Worked on Job scheduling using Tivoli Workload Scheduler.
  • Extensively involved in writing UNIX shell scripts.
  • Involved in tuning of SQL, PLSQL code for optimizing performance.

Environment: Python, R, Excel, SQL, PLSQL, Cognos, TWS, Unix, Ab Initio GDE 1.14/1.15, Co-Operating System 2.14, DB2, Teradata, Crystal.

Confidential, Saint Petersburg, FL

Programmer Analyst

Responsibilities:

  • Gathered user requirements followed by analysis & design and evaluated various technologies for the Client.
  • Developed HTML, CSS and JSP to present the client-side GUI.
  • Developed JavaScript code and used jQuery libraries for validations.
  • Extensively used Spring and Hibernate framework to implement J2EE design patterns to develop the business modules based on the required functionality.
  • Used Java collections API extensively such as Lists, Sets and Maps.
  • Used Spring Framework for Dependency Injection and integrated with Hibernate.
  • Developed the application for automating internal activities of the organization using Spring MVC architecture and custom tags to support custom user interfaces.
  • Extensively used JavaMail for automatic emailing and JNDI to interact with the Knowledge Server.
  • Developed, Tested, and Debugged the Java, JSP, and EJB components using Eclipse.
  • Developed JSP as the view, Servlets as controller, and EJB as model in the Struts framework.
  • Coordinated with QA and business team to resolve system defects generated during testing.
  • Involved in fixing bugs and unit testing with test cases using JUnit.
  • Worked on WebLogic application server to deploy JSP and EJB applications.
  • Involved in designing the system with the help of the Software Requirement Specification using MVC (model-view-controller) architecture and designed use case and class diagrams using UML.
  • Developed the front end for Emulator using thorough implementation of AJAX, GWT, Struts Tiles Plugin, Struts Validation Plugin, HTML and CSS. Added key functions and navigations on various clicks of Emulator using Struts actions and action forms.
  • Use of SH3 JVM for VHDL (VHSIC Hardware Description Language) mapping on Nokia’s Symbian OS. Used the Log4J logging framework to write log messages at various levels. Also used the LogBack Eclipse plugin to change the color of various log levels.

Environment: Java 1.4, J2EE 1.4, EJB 3.0, JSP 2.0, Servlets 2.4, JNDI 1.2, JavaMail 1.2, JDBC 3.0, HTML, XML, JavaScript, Eclipse 3.2, MySQL, JDK 1.5, JSP, Struts 2.2, UML, WebLogic 10, Spring, Hibernate 3.2, JUnit 4.1, UNIX.
