
Sr. Hadoop Developer/Big Data Engineer Resume


Durham, NC

SUMMARY:

  • IT professional with 8+ years of hands-on experience in Big Data/Hadoop ecosystem tools, the Hadoop 2.0 YARN architecture, and developing YARN applications on it.
  • Good experience in processing unstructured, semi-structured, and structured data.
  • Thorough understanding of HDFS and the MapReduce framework, and extensive experience in developing MapReduce jobs.
  • Experience in building highly scalable Big Data solutions using Hadoop, multiple distributions (Cloudera, Hortonworks), and NoSQL platforms.
  • Good exposure to MapReduce programming using Java, Pig Latin scripting, distributed applications, and HDFS.
  • Good working experience in PySpark and Spark SQL.
  • Experience in installing, configuring, supporting, and managing Hadoop clusters using Cloudera and Hortonworks distributions and Amazon Web Services (AWS) EMR and EC2.
  • Hands-on experience with major components of the Hadoop ecosystem, including Hive, HBase, HBase-Hive integration, Pig, Sqoop, Flume, Scala, and Kafka, plus knowledge of the MapReduce/HDFS framework.
  • Experience in loading tuple-shaped data into Pig and generating tuples from flat data. Ability to build User-Defined Functions (UDFs) for functionality not available in core Hadoop.
  • Ability to build deployments on AWS, write build scripts (Boto3 & AWS CLI), and automate solutions using shell and Python, as sketched after this list.
  • Ability to move data in and out of Hadoop from RDBMS, NoSQL, UNIX, and mainframe systems using Sqoop and other traditional data movement technologies.
  • Good experience with HBase schema design.
  • Good knowledge of TDD and Jenkins.
  • Experience with Hadoop distributions such as Cloudera, Hortonworks, BigInsights, MapR, Windows Azure, and Impala. Hands-on experience with Hadoop operations (administration, configuration management, monitoring, debugging, and performance tuning).
  • Experience in using NiFi process groups, processors, and process flow management concepts.
  • Hands-on experience in data warehousing, designing and loading tables with large data volumes, and developing enterprise-level data sets.
  • Good understanding of Scrum methodologies, Test-Driven Development, and Continuous Integration.
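A minimal sketch of the AWS automation mentioned above, using Boto3 from Python; the bucket name, artifact path, and EMR cluster ID are hypothetical placeholders rather than details from any specific engagement.

```python
import boto3

# Hypothetical deployment helper: push a build artifact to S3 and tag the
# target EMR cluster so downstream jobs can locate the new release.
s3 = boto3.client("s3")
emr = boto3.client("emr", region_name="us-east-1")

# Upload the packaged job JAR to a release prefix in S3.
s3.upload_file("target/etl-jobs.jar", "example-deploy-bucket", "releases/etl-jobs.jar")

# Tag the cluster with the release label for traceability.
emr.add_tags(
    ResourceId="j-EXAMPLECLUSTER",  # placeholder EMR cluster ID
    Tags=[{"Key": "etl-release", "Value": "release-candidate"}],
)
```

The same steps could equally be scripted from a shell wrapper with the AWS CLI (aws s3 cp, aws emr add-tags).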

TECHNICAL SKILLS:

Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Spark, Scala, Storm, Kafka, Oozie, MongoDB, Cassandra

Languages: C, Core Java, UNIX, SQL, Python, R, C#, Haskell, Scala

J2EE Technologies: Servlets, JSP, JDBC, Java Beans, Jenkins, Git

Methodologies: Agile, UML, Design Patterns (Core Java and J2EE)

Monitoring and Reporting: Ganglia, Nagios, Custom Shell Scripts

NoSQL Technologies: Cassandra, MongoDB, Neo4j, HBase

Frameworks: MVC, Struts, Hibernate, Spring

Operating Systems: Windows XP/Vista/7, UNIX

Web Servers: WebLogic, WebSphere, Apache Tomcat

PROFESSIONAL EXPERIENCE:

Confidential, Durham, NC

Sr. Hadoop Developer/Big Data Engineer

Roles & Responsibilities:

  • As lead of the Data Services team, built a Hadoop cluster on the Azure HDInsight platform and deployed data analytics solutions using Spark and BI reporting tools.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Worked on SQL using the Zeppelin notebook against the various systems within AAP to understand the data and harmonization issues.
  • Worked on installation, configuration, management, and deployment of Big Data solutions and the underlying Hadoop cluster infrastructure using Cloudera and Hortonworks distributions.
  • Involved in creating Hive tables, loading data, and writing Hive queries.
  • Wrote and implemented custom Hive UDFs in Python per business requirements, as sketched after this list.
  • Worked on Hive performance tuning and Hadoop MapReduce job optimization.
  • Worked on MapR platform administration, including cluster installation, commissioning and decommissioning of data nodes, capacity planning, and slot configuration.
  • Worked on troubleshooting data harmonization issues across the Enterprise Service Bus (ESB), cloud solutions, and on-premise data stores (e.g., Oracle RDBMS).
  • Worked on data quality and data risk management, including the standards, methods, processes, tools, and controls used to manage data quality enterprise-wide.
  • Converted data quality issues into solutions, resolving them through the appropriate choice of error detection and correction, process control and improvement, or process design strategies, collaborating with subject matter experts and data stewards.
  • Monitored scorecard processes and execution and provided feedback to the Data Governance Office (DGO).
  • Implemented and deployed SQL Server Parallel Data Warehouse (PDW) and HDInsight.
  • Designed and provided database tools to assist with database management, transaction handling, and processing environments.
  • Provided technical support for the SQL database environment by overseeing database development and organization.
  • Monitored database response to user queries and made the necessary scripting changes.
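A minimal sketch of the kind of Python Hive UDF used here, implemented as a script for Hive's TRANSFORM streaming interface; the column names and cleanup rules are hypothetical.

```python
#!/usr/bin/env python
# clean_parts.py - hypothetical Hive TRANSFORM script: Hive streams tab-separated
# rows to stdin and reads tab-separated rows back from stdout.
import sys

for line in sys.stdin:
    part_no, cost = line.rstrip("\n").split("\t")   # columns from the Hive SELECT
    part_no = part_no.strip().upper()               # normalize the key
    if cost in ("", "\\N"):                         # Hive serializes NULL as \N
        cost = "0.00"
    print("\t".join([part_no, cost]))
```

In Hive, the script would be registered and invoked roughly as: ADD FILE clean_parts.py; SELECT TRANSFORM(part_no, cost) USING 'python clean_parts.py' AS (part_no, cost) FROM parts;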

Confidential, Basking Ridge, NJ

Sr. Hadoop, Spark Developer/Analyst

Roles & Responsibilities:

  • Worked on loading disparate data sets from different sources into the BDPaaS (Hadoop) environment using Spark.
  • Developed UNIX scripts to create batch loads that bring large volumes of data from relational databases into the Big Data platform.
  • Involved in analyzing data from various sources and creating meta-files and control files to ingest the data into the Data Lake.
  • Involved in configuring batch jobs to ingest the source files into the Data Lake.
  • Developed Pig queries to load data into HBase.
  • Leveraged Hive queries to create ORC tables.
  • Developed Hive scripts to meet analysts' requirements for analysis.
  • Worked extensively on Hive to create, alter, and drop tables, and was involved in writing Hive queries.
  • Created and altered HBase tables on top of data residing in the Data Lake.
  • Created views from Hive tables on top of data residing in the Data Lake.
  • Involved in the requirements and design phases to implement a real-time streaming architecture using Spark and Kafka.
  • Created reports with different selection criteria from Hive tables on the data residing in the Data Lake.
  • Worked closely with the scrum master and the team to gather information and perform daily activities.
  • Deployed Hadoop components on the cluster, such as Hive, HBase, Spark, and Scala, as required.
  • Uploaded and processed terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop.
  • Implemented the business rules in Spark/Scala to put the business logic in place for the Rating Engine.
  • Used the Spark UI to observe a submitted Spark job running at the node level.
  • Used Spark to perform property-bag parsing of the data to extract the required fields.
  • Created external Hive tables over the blobs to expose the data through the Hive metastore.
  • Used both the Hive context and the SQL context of Spark for initial testing of Spark jobs, as sketched after this section.
  • Used Microsoft Visio to diagram the complex working structure.
  • Used WinSCP and FTP to view the data storage structure on the server and to upload the JARs used for spark-submit.
  • Developed code from scratch in Spark using Scala according to the technical requirements.

Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Sqoop, Spark, Scala, MapR, Core Java, R, SQL, Python, Eclipse, Linux, UNIX
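A minimal PySpark sketch of the kind of initial smoke test described above using Spark's Hive and SQL contexts (the production jobs themselves were written in Scala); the database, table, and column names are hypothetical.

```python
from pyspark import SparkContext
from pyspark.sql import HiveContext

# Spark 1.x-style contexts, matching the Hive/SQL context usage described above.
sc = SparkContext(appName="rating-engine-smoke-test")
hive_ctx = HiveContext(sc)

# Hypothetical external Hive table registered over blob storage.
claims = hive_ctx.sql("SELECT policy_id, premium FROM staging.claims LIMIT 100")
claims.show()

# Quick sanity check of a business rule before wiring it into the full job.
rated = claims.filter(claims.premium > 0)
print(rated.count())

sc.stop()
```

Submitted with spark-submit, a check like this verifies table access and a simple rule before the full Scala job runs.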

Confidential, NYC, NY

Sr. Hadoop Developer / BigData Admin

Roles & Responsibilities:

  • Worked on importing and exporting data into HDFS and Hive using Sqoop and Kafka.
  • Involved in developing different components of the system, such as Hadoop processes involving MapReduce and Hive.
  • Defined and built job flows and data pipelines.
  • Developed an interface validation process that validates incoming data arriving in HDFS before kicking off the Hadoop process.
  • Responsible for cluster maintenance, adding and decommissioning data nodes.
  • Worked extensively with data migration, data cleansing, data profiling, and ETL processes for data warehouses.
  • Used the Scala collections framework to store and process complex consumer information.
  • Implemented AWS computing and networking services to meet the needs of the applications.
  • Worked on Elasticsearch 2.x, which supports Spark integration.
  • Wrote optimized Hive queries using window functions, customized Hadoop shuffle and sort parameters, and the ORC file format, as sketched after this section.
  • Worked with and learned a great deal from Amazon Web Services (AWS) cloud services such as EC2, S3, and EMR.
  • Tuned Hive and Pig to improve performance and solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
  • Developed MapReduce programs using combiners, sequence files, compression techniques, chained jobs, and the multiple-input/output APIs.
  • Performed cluster monitoring and troubleshooting, reviewed and managed data backups, and managed and reviewed Hadoop log files.
  • Worked with nodetool, which offers a number of commands for returning Cassandra metrics pertaining to disk usage.
  • Worked on Amazon EMR, which processes data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2).
  • Worked in the AWS Management Console, the graphical user interface (GUI) for Amazon Web Services (AWS).
  • Preprocessed the logs and semi-structured content stored on HDFS using Pig and imported the processed data into the Hive warehouse, enabling business analysts to write Hive queries.
  • Worked on UNIX shell scripts for business processes and for loading data from different interfaces into HDFS.
  • Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using an MR testing library.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries.
  • Extensively used Eclipse, VPN, PuTTY, WinSCP, VNC Viewer, etc.

Environment: Red Hat Enterprise Linux 5, Jenkins, Hadoop 1.0.4, MapReduce, Hive 0.10, Pig, shell scripting, Sqoop 1.4.3, Eclipse, Java SDK 1.6
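A hedged illustration of the optimized Hive queries described above, driven from Python through the Hive CLI so all examples stay in one language; the table, columns, and ranking rule are hypothetical.

```python
import subprocess

# Hypothetical window-function query over an ORC-backed table, run non-interactively
# through the Hive CLI. Shuffle/sort knobs can be passed with -hiveconf as shown.
query = """
SELECT region,
       order_id,
       amount,
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank
FROM sales_orc
"""

subprocess.run(
    ["hive", "-hiveconf", "hive.exec.reducers.bytes.per.reducer=268435456", "-e", query],
    check=True,
)
```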

Confidential, Mahwah, NJ

Hadoop Developer/BigData Engineer

Roles & Responsibilities:

  • Developed a MapReduce program to convert mainframe fixed-length data to delimited data, as sketched after this list.
  • Used Pig Latin to apply transformations to systems of record.
  • Worked on Hadoop cluster monitoring tools like Nagios, Ganglia, and Cloudera Manager.
  • Extensively worked with the Cloudera Distribution of Hadoop (CDH), versions 5.x and 4.x.
  • Worked on harmonizing Elasticsearch with Spark RDDs and DataFrames across ES and HDFS with Hive.
  • Developed Pig scripts and UDFs extensively for Value-Added Processing (VAP).
  • Designed and developed a custom Avro storage function for use in Pig Latin to load and store data.
  • Worked on Cassandra, which provides Hadoop integration with MapReduce support.
  • Actively involved in design analysis, coding, and strategy development.
  • Developed Sqoop commands to pull data from Teradata and push it to HDFS.
  • Developed Hive scripts implementing dynamic partitions and buckets for retail history data.
  • Streamlined Hadoop jobs and workflow operations using Oozie workflows scheduled through AutoSys on a monthly basis.
  • Developed a MapReduce job to generate sequence IDs in Hadoop.
  • Worked on the HBase shell, CQL, the HBase API, and the Cassandra Hector API as part of the proof of concept.
  • Developed Pig scripts to convert data from Avro to text file format.
  • Developed Pig scripts and UDFs per the business rules.
  • Developed Oozie workflows that are scheduled through AutoSys on a monthly basis.
  • Designed and developed read-lock capability in HDFS.
  • Developed Hive scripts implementing control-table logic in HDFS.
  • Developed NDM scripts to pull data from the mainframe.
  • Performed end-to-end implementation with Avro and Snappy.
  • Provided production support during my initial period for a product that was already developed.
  • Created POC for Flume implementation.
  • Helped other teams to get started with the Hadoop ecosystem.
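A minimal sketch of the fixed-length-to-delimited conversion described in the first bullet of this role, written here as a Hadoop Streaming mapper in Python rather than the original MapReduce program; the field offsets and delimiter are hypothetical.

```python
#!/usr/bin/env python
# fixedwidth_mapper.py - hypothetical Hadoop Streaming mapper that converts
# mainframe fixed-length records into pipe-delimited records.
import sys

# (start, end) character offsets for each field; placeholder layout for illustration.
FIELDS = [(0, 10), (10, 35), (35, 43), (43, 51)]

for line in sys.stdin:
    record = line.rstrip("\n")
    values = [record[start:end].strip() for start, end in FIELDS]
    print("|".join(values))
```

It would be run map-only with something like: hadoop jar hadoop-streaming.jar -D mapred.reduce.tasks=0 -input fixed/ -output delimited/ -mapper fixedwidth_mapper.py -file fixedwidth_mapper.py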

Confidential, Raleigh, NC

Java/J2EE Developer

Roles & Responsibilities:

  • Understood the business requirements and developed code for modules of the application.
  • Developed the application based on MVC architecture and implemented Action classes.
  • Implemented model classes and Struts 2 tags as views.
  • Implemented mapping files for the corresponding tables using Hibernate 3.0 while developing the project.
  • Involved in adding server-side validations.
  • Created unit test case documents.
  • Developed business components to process requests from the user and used the Hibernate to retrieve and update patient information.
  • Worked with database Objects using TOAD and SQL Navigator for development and administration of various relational databases.
  • Wrote and used Java Bean classes, JSP, Stored Procedures and JSP custom tags in the web tier to dynamically generate web pages.

Environment: Java 5, Struts 2.x, Hibernate 3.x, Oracle, JSP, JBoss, SVN, Jenkins, Eclipse, HTML

Confidential

Java/J2EE Developer

Roles & Responsibilities:

  • Utilized Agile Methodologies to manage full life-cycle development of the project.
  • Implemented MVC design pattern using Struts Framework.
  • Used the Form classes of the Struts framework to write the routing logic and to call different services.
  • Created tile definitions, Struts-config files, validation files and resource bundles for all modules using Struts framework.
  • Developed the web application using JSP custom tag libraries and Struts Action classes.
  • Designed Java Servlets and Objects using J2EE standards.
  • Used JSP for presentation layer, developed high performance object/relational persistence and query service for entire application utilizing Hibernate.
  • Developed the XML Schema and Web services for the data maintenance and structures.
  • Developed the application using Java Beans, Servlets, and EJBs.
  • Created stateless session EJBs for retrieving data and entity beans for maintaining user profiles.
  • Used WebSphere Application Server and RAD to develop and deploy the application.
  • Worked with various style sheets like Cascading Style Sheets (CSS).
  • Designed the database, created tables, and wrote complex SQL queries and stored procedures per the requirements.
  • Involved in coding JUnit test cases and used Ant to build the application.

Environment: Java/J2EE, Oracle 10g, SQL, PL/SQL, JSP, EJB, Struts, Hibernate, WebLogic, HTML, AJAX, JavaScript, Jenkins, Git, JDBC, XML, JMS, XSLT, UML, JUnit

Confidential

Java Developer

Roles & Responsibilities:

  • Primary responsibilities included the development of the code using core Java and web development skills.
  • Used Struts and JavaScript for web page development and front-end validations.
  • Fetched and processed customer-related data using Mercator (IBM WTX) as the interface between the Confidential workstation and the mainframes.
  • Created Servlets and JSPs and used the JUnit framework for unit testing.
  • Developed EJBs, DAOs, stored procedures, and SQL queries to support system functionality.
  • Designed and documented the application with UML system use cases and class and sequence diagrams developed using MS Visio.
  • Used Ant scripts to automate application build and deployment processes.
  • Supported production/stage application defects, tracking and documenting them using Quality Center.
  • Implemented various Unix Shell Scripts as per the internal standards.

Environment: Java 1.4.2, Struts 1.2, JavaScript, JDBC, CVS, Eclipse, WebLogic Server 9.1, Oracle 9i, TOAD, Linux
