Hadoop Engineer Resume
Round Rock, Texas
SUMMARY
- Over 11 years of IT experience in analyzing, designing, developing, and maintaining critical web-based business applications, including applications built on data warehousing methodologies.
- Over 4 years of extensive experience in Big Data Ecosystem.
- Good exposure to consolidating, validating, and cleansing data from a wide range of sources, from applications and databases to files and web services.
- Good exposure in building RESTful APIs in front of different types of NoSQL storage engines.
- Good experience in developing applications using Java and J2EE (Servlets, JSP, Struts, Spring, JMS).
- Developed web-based applications using Java, J2EE, web services (both SOAP/WSDL and REST), MVC frameworks (Spring, Hibernate, Struts), Oracle, and SQL.
- Good knowledge of Hadoop architecture and various Hadoop Stack elements.
- Hands on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop Map Reduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, and Flume.
- Cutting-edge experience with Splunk (a log-based performance monitoring tool).
- Exposure to Spark, Solr, Kafka, and Scala programming.
- Extensively used ETL methodology to support data extraction, transformation, and load.
- Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java and Python.
- Experience working with, as well as hosting, Hadoop in cloud environments such as Amazon AWS (EC2 and EMR).
- Worked on developing ETL processes (DataStage and Talend Open Studio) to load data from multiple data sources to HDFS using Flume and Sqoop, and performed structural modifications using MapReduce and Hive.
- Experience in developing pipelines that ingest data from various sources and process it with Hive and Pig.
- Extended Hive and Pig core functionality by writing custom UDFs (a brief illustrative sketch follows this summary).
- Experienced in building sophisticated distributed systems using REST/hypermedia web APIs (SOA).
- Good Exposure in providing solutions using SOA, Distributed Computing & Enterprise Service Bus.
- Domain experience in leasing, financing, telecom, retail, and healthcare, gained through work on different systems.
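A minimal sketch of the kind of custom Hive UDF mentioned above; the package, class, and behavior (trimming and upper-casing a string column) are illustrative assumptions, not taken from a specific project.

    // Illustrative custom Hive UDF: normalizes a string column by trimming and upper-casing it.
    package com.example.hive.udf;

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public final class NormalizeText extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().trim().toUpperCase());
        }
    }

A UDF like this would typically be packaged as a JAR and registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in HiveQL queries.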
TECHNICAL SKILLS
Programming Languages: Core Java, J2EE, Scala, XML, DB2, CICS, SQL, PL/SQL, HiveQL, Pig Latin
Hadoop Ecosystem: HDFS, YARN, MapReduce, Pig, Hive, Sqoop, Flume and ZooKeeper
Hadoop Distributions: Cloudera, IBM BigInsights
Operating Systems: Linux, Unix, MVS, Windows
Non-Relational Databases: MongoDB, Cassandra
Relational Databases: DB2 V 9.0, MySQL, Microsoft SQL Server
Scripting Languages: Python, Shell Scripting
Application/Web Servers: Apache Tomcat, JBoss, WebSphere, MQ Series, DataPower, Web services
Tools: Endeavor, DataPower XI150 Appliance, SoapUI, JMeter, XML Harness, Labs testing tool
QA Tools: Quality Center
IDE: Eclipse
Versioning Tools: GIT, SVN, Librarian
Processes: IBM’s QMS (Quality Management System), OPAL, Hadoop Core Competencies
PROFESSIONAL EXPERIENCE
Confidential - Round Rock, Texas
Hadoop Engineer
Responsibilities:
- Understood the client's DW application, its interfaces, and the business processes involved.
- Involved in creating a data lake by extracting the customer's Big Data from various data sources into Hadoop HDFS. This included data from Excel files and databases as well as log data from web servers.
- Worked on data loads from various sources (Oracle, MySQL, DB2, MS SQL Server, Cassandra, MongoDB) into Hadoop using Sqoop and Python scripts.
- Responsible for production Hadoop-cluster set up, administration, maintenance, monitoring and support.
- Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources and make it suitable for ingestion into the Hive schema for analysis (see the mapper sketch following this section).
- Back-end Java developer for the Data Management Platform (DMP), building RESTful APIs in front of different types of NoSQL storage engines so that other groups could quickly meet their Big Data needs.
- Worked closely with architects and clients to define and prioritize their use cases and iteratively develop APIs and architecture.
- Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
- Created Hive tables, internal or external as required, defined with appropriate static and dynamic partitions for query efficiency.
- Worked with Business Developer team in generating customized reports and ETL workflows in Data Stage.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
Environment: CDH 4.5, which includes Apache Hadoop 2.0, Hive 0.10, Hue 2.1, Pig 0.10, Sqoop 1.4, Oozie 3.2.0, Cassandra, MapReduce, HDFS, HBase, Splunk, Storm, Kafka.
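A minimal sketch of the kind of map-only cleansing job referenced above; the pipe-delimited input, expected field count, and tab-delimited output layout are assumptions for illustration.

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Drops malformed records and re-emits clean rows in the layout expected by the Hive schema.
    public class CleanseMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] fields = line.split("\\|", -1);
            // Skip records that do not have the expected number of columns.
            if (fields.length != 8) {
                return;
            }
            // Re-emit as tab-delimited so records line up with the Hive table definition.
            context.write(NullWritable.get(), new Text(line.replace('|', '\t')));
        }
    }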
Confidential - Temple Terrace, FL
Big Data/Hadoop Engineer
Responsibilities:
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL (Cassandra) and a variety of portfolios (see the table-creation sketch following this section).
- Loaded the customer profiles data, customer usage information, billing information etc. onto HDFS using Sqoop and Flume.
- Integrated Hadoop into traditional ETL, accelerating the extraction, transformation, and loading of massive structured and unstructured data.
- Installed and configured Apache Hadoop, Hive and Pig environment on the prototype server.
- Created data models in CQL for customer data.
- Used the machine learning libraries of Mahout to perform advanced statistical procedures like clustering and classification to determine the usage trends.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Administrator for Pig, Hive and HBase.
- Performed various configurations, including networking and iptables, hostname resolution, user accounts and file permissions, HTTP, FTP, and passwordless SSH login.
Environment: CDH4, Flume, Hive, Sqoop, Pig, Oozie, Cassandra, JDK 1.6, MapReduce, HDFS, HBase, Storm.
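A brief sketch of creating an HBase table along the lines described in the first bullet above; the table name and column families are illustrative, and the older HBaseAdmin API is assumed to match a CDH4-era cluster.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CreateCustomerTable {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            // One column family per broad data category keeps related cells stored together.
            HTableDescriptor table = new HTableDescriptor("customer_data");
            table.addFamily(new HColumnDescriptor("profile"));  // structured profile attributes
            table.addFamily(new HColumnDescriptor("usage"));    // semi-structured usage feeds
            if (!admin.tableExists("customer_data")) {
                admin.createTable(table);
            }
            admin.close();
        }
    }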
Confidential, Franklin Lakes, NJ
Hadoop Developer / Data Power Developer
Responsibilities:
- Worked on Hadoop cluster set up, administration, maintenance, monitoring and support.
- Analyzed optimization requirements and tuned the Hadoop environment to meet the business requirements.
- Designed and documented REST/HTTP APIs, including JSON data formats and an API versioning strategy (see the endpoint sketch following this section).
- Loaded the customer profiles data, customer claims information, billing information etc. onto HDFS using Sqoop and Flume.
- Gathered requirements from Engineering (Statistical analysis) and Finance (Financial Reporting) teams to design solutions on the Hadoop ecosystem.
- Used Pig as ETL tool to do transformations, event joins, filter bot traffic and some pre-aggregations before storing the data onto HDFS.
- Used Oozie to orchestrate the MapReduce jobs and developed Pig scripts to handle analysis.
- Used Pattern matching algorithms to recognize the abnormal patterns across different sources and built risk profiles for each customer using Hive and stored the results in HBase.
Environment: Web services, DataPower, Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Linux, Big Data, Mainframe, DB2, CICS.
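A minimal JAX-RS sketch of the kind of versioned, JSON-producing endpoint described above; the resource path, the Customer type, and the URI-based versioning choice are illustrative assumptions.

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;

    // The version lives in the URI (/api/v1/...), so breaking changes can ship under /api/v2
    // without disturbing existing clients.
    @Path("/api/v1/customers")
    public class CustomerResource {

        @GET
        @Path("/{id}")
        @Produces(MediaType.APPLICATION_JSON)
        public Customer getCustomer(@PathParam("id") String id) {
            return new Customer(id);  // lookup against the backing store omitted for brevity
        }

        // Simple POJO serialized to JSON by the JAX-RS provider (e.g., Jackson).
        public static class Customer {
            public String id;
            public Customer(String id) {
                this.id = id;
            }
        }
    }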
Confidential
Java/J2EE Developer
Responsibilities:
- Involved in Design, Development and Support phases of Software Development Life Cycle (SDLC). Used agile methodology and participated in Scrum meetings.
- Developed the application using the Spring Framework, which leverages Model-View-Controller (MVC) architecture. UML diagrams such as use cases, class diagrams, interaction diagrams (sequence and collaboration), and activity diagrams were used.
- Data from the UI layer was sent through JMS to the middle layer, where an MDB retrieved the messages and forwarded them to MQSeries (see the MDB sketch following this section).
- Used JSON as response type in REST services.
- Used a RESTful client to interact with the services by providing the RESTful URL mapping.
- Performed unit testing using JUnit.
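A hedged sketch of the middle-layer flow described above: a message-driven bean consumes the JMS message sent from the UI layer and forwards it to an MQSeries queue. Destination names and resource references are illustrative.

    import javax.annotation.Resource;
    import javax.ejb.ActivationConfigProperty;
    import javax.ejb.MessageDriven;
    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageListener;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;

    @MessageDriven(activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
        @ActivationConfigProperty(propertyName = "destination", propertyValue = "jms/uiRequestQueue")
    })
    public class UiRequestListener implements MessageListener {

        @Resource(name = "jms/mqConnectionFactory")
        private ConnectionFactory mqConnectionFactory;

        @Resource(name = "jms/mqSeriesQueue")
        private Queue mqSeriesQueue;

        @Override
        public void onMessage(Message message) {
            Connection connection = null;
            try {
                connection = mqConnectionFactory.createConnection();
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(mqSeriesQueue);
                producer.send(message);  // forward the UI message on to MQSeries
            } catch (JMSException e) {
                throw new RuntimeException("Failed to forward message to MQSeries", e);
            } finally {
                if (connection != null) {
                    try { connection.close(); } catch (JMSException ignored) { }
                }
            }
        }
    }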
Confidential
SQL Developer
Responsibilities:
- Created the database and multiple tables with many-to-many relationships, and performed Data Manipulation Language (DML) operations such as insert, update, and delete on the data using Eclipse.
- Developed a custom web application for reporting service-entitlement information, with a Java frontend and MySQL backend.
- Developed MySQL queries to process information from various tables using multi-table joins (see the query sketch following this list).
- Involved in test cases preparation, mapping them with the requirements, test execution, validation and documentation.
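A hedged sketch of the kind of multi-table MySQL join issued from the Java side for the entitlement reports described above; the schema, connection URL, and credentials are illustrative only.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class EntitlementReportQuery {
        public static void main(String[] args) throws Exception {
            // The join table (customer_service) resolves the many-to-many relationship
            // between customers and services.
            String sql = "SELECT c.name, s.service_name "
                       + "FROM customer c "
                       + "JOIN customer_service cs ON cs.customer_id = c.id "
                       + "JOIN service s ON s.id = cs.service_id "
                       + "WHERE c.id = ?";
            Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/entitlements", "app_user", "secret");
            PreparedStatement ps = conn.prepareStatement(sql);
            ps.setLong(1, 42L);
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                System.out.println(rs.getString("name") + " -> " + rs.getString("service_name"));
            }
            rs.close();
            ps.close();
            conn.close();
        }
    }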