
Hadoop Data Engineer Resume

NC

PROFESSIONAL SUMMARY:

  • Senior Software Engineer with 7+ years of professional IT experience, including 3.5 years of expertise in the Big Data ecosystem: ingestion, storage, querying, processing and analysis of big data.
  • Extensive experience working in various key industry verticals including Finance, Insurance, Healthcare and Retail.
  • Excellent exposure to understanding Big Data business requirements and providing Hadoop-based solutions.
  • Strong interest in developing and implementing new concepts in distributed computing technologies: Hadoop, MapReduce and NoSQL databases.
  • Well versed in installation, configuration, supporting and managing of Big Data and underlying infrastructure of Hadoop Cluster.
  • Significant expertise in implementing real-time Big Data systems using Hadoop ecosystem tools such as Hadoop MapReduce, Spark, HDFS, Hive, Pig, HBase, Pentaho, ZooKeeper, Sqoop, Kafka, Scala, Oozie and Flume.
  • Exposure to different Hadoop distributions, including Cloudera, Hortonworks and Apache.
  • Experience using the Amazon Web Services (AWS) cloud; performed export and import of data into S3 and the Amazon Redshift database.
  • Expertise in designing scalable data stores using the Apache Cassandra NoSQL database.
  • Extensive experience in data processing and analysis using HiveQL, Pig Latin, custom MapReduce programs in Java, and Python scripts in Spark and Spark SQL.
  • Experience importing and exporting data between HDFS and databases such as MySQL, Oracle, Teradata and DB2 using Sqoop.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Experience in managing and reviewing Hadoop Log files.
  • Experience with Oozie Workflow Engine in running workflow jobs with actions that run Hadoop Map/Reduce and Pig jobs.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
  • Experience includes requirements gathering/analysis, design, development, versioning, integration, documentation, testing, build and deployment.
  • Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data warehouse tools for reporting and data analysis.
  • Successfully loaded files to Hive and HDFS from HBase
  • Experience in supporting analysts by administering and configuring HIVE
  • Experience in providing support to data analyst in running Pig and Hive queries.
  • Good knowledge of Hadoop cluster architecture and experience monitoring the cluster.
  • In-depth understanding of Data Structures and Algorithms.
  • Excellent shell scripts skills in Unix/Linux.
  • Excellent Java development skills using J2EE, J2SE, Junit, JSP, JDBC
  • Implemented Unit Testing using JUNIT testing and system testing during the projects.
  • Good experience in designing jobs and transformations and loading data sequentially and in parallel for initial and incremental loads.
  • Consistently worked in teams; recognized for collaboration and problem-solving skills.
  • Excellent verbal and written communication skills.
  • Well versed with project lifecycle documentation with specialization in development of robust reusable designs and patterns.
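The Hive partitioning and bucketing concepts mentioned above can be sketched in miniature as follows. This is a hypothetical illustration, not project code: the column names and bucket count are assumptions, and real Hive uses its own hash function rather than Python's.

```python
# Illustrative sketch: how Hive-style partitioning and bucketing map a row
# to a storage location. A partition is a directory per column value; a
# bucket is chosen by hashing a key modulo the bucket count.

def hive_location(partition_col, partition_val, bucket_key, num_buckets):
    """Return the (partition directory, bucket number) a row would land in."""
    partition_dir = f"{partition_col}={partition_val}"  # e.g. load_date=2017-01-15
    bucket = hash(bucket_key) % num_buckets             # Python's hash stands in for Hive's
    return partition_dir, bucket
```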

TECHNICAL SKILLS:

Big Data Technologies: Hadoop, MapReduce, Spark, HDFS, HBase, Cassandra, Sqoop, Pig, Hive, Oozie, Zookeeper, Flume, Kafka, AVRO, Parquet, Cloudera

Operating Systems: Windows, Linux, Unix

Programming/Scripting Languages: C, Java, Python, Shell, R, SQL

Cloud: AWS EC2, S3, Redshift

Databases: AWS Redshift, MongoDB, Teradata, Oracle, DB2

Business Intelligence: Business Objects, Tableau

Tools & Utilities: Eclipse, NetBeans, Git, SVN, WinSCP, PuTTY, Pentaho

Web Technologies: HTML5, CSS, XML, JavaScript

PROFESSIONAL EXPERIENCE:

Confidential, NC

Hadoop Data Engineer

Responsibilities:

  • Designed the process and the required environment to load data into the on-premises data lake and a business-specified subset into the AWS Redshift database
  • Worked on the Data Ingestion framework to load data from the Edge Node to the appropriate directories in HDFS
  • Built Python scripts with Spark at the core to perform end-to-end data load and transformation; key components included reconciliation for file validity, data quality checks, and custom rules such as filtering and transformation of data per business requirements
  • Developed a master wrapper shell script to wrap the project's different code sets, reducing multiple jobs to a single job schedule in Control-M
  • Designed the shell script to be configurable to plug in and plug out reusable code components based on configuration parameters resulting in a reusable packaged framework
  • Incorporated logical exit points, error logging and restartability in the shell script for a robust support and troubleshooting framework
  • Created Hive tables dynamically partitioned on data date to expose the raw data to business users as a structured interface, storing files as Parquet with Snappy compression
  • Loaded a subset of the data to AWS S3 and then to the AWS Redshift database to provide an RDBMS-like business user experience
  • Worked with the infrastructure team to understand AWS and stand up an EC2 instance to push data from S3 to Redshift
  • Developed a historical load strategy to load history data from the source system, gaining experience with Sqoop for loads from Teradata to HDFS and AWS Snowball for loads to Redshift
  • Used Flume to stream the log data from servers.
  • Played flexible roles in the spirit of Agile, primarily as developer and secondarily as tester, working on system testing and UAT
  • Leveraged the company's DevOps process for CI/CD (Continuous Integration/Continuous Deployment), storing code in the GitHub repository and using Jenkins and uDeploy utilities to promote to QA and Prod environments
  • Active participant in all other housekeeping activities, including metadata registry, design documentation, test cases, user stories for the Agile story board, and all team activities/sessions
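The file-validity reconciliation described above can be sketched in a minimal, hypothetical form. It assumes the control file simply declares the expected record count for its data file; the real formats are not given in the resume.

```python
# Minimal sketch of a file-validity reconciliation check: compare the number
# of non-empty records in a data file against the count declared in its
# control file before loading. The file/record format is an assumption.

def reconcile(data_lines, declared_count):
    """Return True when the actual record count matches the declared count."""
    actual = sum(1 for line in data_lines if line.strip())  # skip blank lines
    return actual == declared_count
```

A file failing this check would be quarantined for investigation rather than loaded into the lake.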

Environment: Hadoop, Spark, HDFS, Hive, CDH, Python, Sqoop, UNIX, AWS EC2, S3, Redshift, GitHub
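The restartability idea from the wrapper script above can be sketched like this. It is an assumed, simplified version in Python rather than shell; the actual framework's checkpointing details are not described in the resume.

```python
import os

# Sketch of restartability: record each completed step in a checkpoint file,
# and on a rerun skip any step that already succeeded, so a failed job can
# resume from its last logical exit point instead of starting over.

def run_pipeline(steps, checkpoint_file):
    """steps is a list of (name, callable) pairs, executed in order."""
    done = set()
    if os.path.exists(checkpoint_file):
        with open(checkpoint_file) as f:
            done = set(f.read().split())
    for name, func in steps:
        if name in done:
            continue                      # completed on a prior run; skip
        func()                            # an exception stops here; checkpoint survives
        with open(checkpoint_file, "a") as f:
            f.write(name + "\n")
```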

Confidential, NJ

Hadoop Data Engineer

Responsibilities:

  • In addition to the developer role, served as a Systems Integrator: captured requirements for the project and was involved in infrastructure setup of EC2 instances, installing Mongo drivers to extract data from MongoDB
  • Gained a detailed understanding of NoSQL databases through MongoDB and its document-oriented storage model
  • Researched different ways to parse JSON data, the MongoDB extract output.
  • Developed the process to extract from Mongo on an EC2 instance and SFTP data to Hadoop Data Lake
  • Created a Pig script to parse JSON data and added a custom UDF to handle conversion of hexadecimal fields to ASCII
  • Developed MapReduce programs to cleanse the data in HDFS to make it suitable for ingestion into the Hive schema for analysis, and to perform business-specific transformations such as conversion of data fields, validation of data, and other business logic
  • Loaded the final parsed and transformed data into Hive structures creating Avro files partitioned on load date time stamp
  • Implemented Partitioning, Bucketing in Hive for better organization of the data.
  • Implemented a view based strategy to segregate sensitive data and assign different access roles to different views
  • Created a parallel branch to load the same data to Teradata using Sqoop utilities
  • Created a Tableau report on the Teradata solution to provide business with their day to day audit reporting needs
  • Worked with the System testers to manually query the data and do a source to target comparison to ensure data integrity
  • Worked with the business team to get them access to Hive and the Tableau reports along with UAT for the project
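The hexadecimal-to-ASCII conversion performed by the custom Pig UDF above can be illustrated in plain Python. The production version was a Pig UDF; this standalone function only shows the conversion logic, and the sample value is made up.

```python
# Sketch of the hex-to-ASCII conversion logic a custom UDF might apply
# to hex-encoded string fields coming out of the MongoDB extract.

def hex_to_ascii(hex_field):
    """Decode a hex-encoded string field, e.g. '48656c6c6f' -> 'Hello'."""
    return bytes.fromhex(hex_field).decode("ascii")
```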

Environment: Hadoop, Hive, Pig, MongoDB, UNIX, MapReduce, CDH, SQL, Teradata, Tableau, AWS EC2

Confidential, TX

Hadoop/ Java developer

Responsibilities:

  • Imported all fact and dimension tables from SQL Server into Hadoop using Sqoop.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
  • Involved in extracting customers' Big Data from various sources into Hadoop HDFS, including data from mainframes, databases and log data from servers.
  • Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
  • Created Hive tables per requirement as managed or external tables, defined with appropriate static and dynamic partitions for efficiency.
  • Implemented Partitioning, Bucketing in Hive for better organization of the data.
  • Developed python UDFs in Pig and Hive
  • Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop as well as system specific jobs.
  • Installed and configured various components of Hadoop ecosystem and maintained their integrity.
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
  • Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
  • Developed Java MapReduce programs to encapsulate transformations.
  • Developed Oozie workflows and sub-workflows to orchestrate the Sqoop scripts, Pig scripts and Hive queries; the Oozie workflows were scheduled through Autosys.
  • Worked with DevOps team in Hadoop cluster planning and installation.
  • Used the data integration tool Pentaho to design ETL jobs in the process of building data warehouses and data marts.
  • Participated in performance tuning at the database, transformation and job levels.
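A Python UDF of the kind developed for Pig and Hive above might look like the sketch below. In Pig, such a function would be registered through Jython with an @outputSchema decorator (omitted here so it runs standalone), and the date formats shown are assumed examples, not from the project.

```python
from datetime import datetime

# Sketch of a cleansing UDF: normalize mixed date formats to ISO before the
# data lands in Hive. In Pig this would carry an @outputSchema("d:chararray")
# decorator; plain Python is shown so the function runs on its own.

def normalize_date(raw):
    """Normalize 'MM/DD/YYYY' or 'YYYY-MM-DD' inputs to canonical 'YYYY-MM-DD'."""
    for fmt in ("%m/%d/%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(raw, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None  # unparseable values are left for downstream filtering
```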

Environment: Hadoop, HDFS, MapReduce, Sqoop, Hive, Pig, Oozie, NDM, Cassandra, SVN, CDH4, Cloudera Manager, MySQL, PDI/Kettle (Pentaho), Eclipse

Confidential, OK

Java/J2EE Developer

Responsibilities:

  • Involved in various Software Development Life Cycle (SDLC) phases of the project like Requirement gathering, development, enhancements using agile methodologies.
  • Developed the user interface using Spring MVC, JSP, JSTL, JavaScript, custom tags, jQuery, HTML and CSS.
  • Used Spring MVC for implementing the Web layer of the application. This includes developing Controllers, Views and Validators.
  • Developed the service and domain layers using Spring Framework modules such as Core (IoC) and AOP.
  • Developed the Application Framework using Java, Spring, Hibernate and Log4J.
  • Created DB tables, functions and joins, and wrote prepared statements using SQL and PL/SQL.
  • Configured the Hibernate session factory in applicationContext.xml to integrate Hibernate with Spring.
  • Configured applicationContext.xml in Spring to enable communication between operations and their corresponding handlers.
  • Developed Spring REST controllers to handle JSON data, and wrote DAOs and services to handle the data.
  • Consumed and created REST web services using Spring and Apache CXF.
  • Used Apache Camel as the integration framework.
  • Developed MySQL stored procedures and triggers in SQL to calculate and update tables implementing business logic.
  • Used Maven to build the application and deployed on JBoss Application Server.
  • Used IntelliJ IDEA for development and JBoss Application Server for deploying the web application.
  • Monitored the error logs using log4j.
  • Implemented Spring JMS message listeners with JMS queues for consumption of Asynchronous requests.
  • Used AOP concepts such as aspects, join points, advice, pointcuts, target objects and AOP proxies.

Environment: JDK 1.6, HTML, JSP, Spring, JBoss, Log4j, Tortoise SVN, Hibernate, SOAP web services, Maven, SOAP UI, Eclipse Kepler, JavaScript, XML, MySQL v5

Confidential

Java Developer

Responsibilities:

  • Involved in interaction with the client for designing of Function Specification document.
  • Developed Use Cases, Class Diagrams, Activity Diagrams and Sequence Diagrams
  • Developed JSPs for the front end and Servlets for handling HTTP requests.
  • Developed GUI using JSP, HTML and JavaScript.
  • Developed a search engine in web page with HTML and JavaScript to get the funds details of the organization.
  • Developed an API to write XML documents from a database
  • Used JDBC to interact with MySQL database.
  • Wrote SQL code within Java classes.
  • Involved in creating the SQL and PL/SQL queries and procedures.
  • Used Java collections API extensively for fetching the data from the result sets
  • Performed unit testing for the application using JUnit tests.
  • Tested the application, troubleshot and fixed bugs.
  • Developed Servlets and back-end Java classes, deployed on the Apache Tomcat server.
  • Prepared User manual, deployment guide for doing required configuration.

Environment: Java 1.4, J2EE 4, JSP 2.0, Servlets 2.4, JavaScript, JDBC 3.0, MySQL, JUnit, Eclipse 3.2, HTML, XML, MS Access, Apache Tomcat 5.5
