Hadoop Lead/Architect Resume
Addison, TX
PROFESSIONAL SUMMARY:
- Around 10 years of experience in analysis, design and development of software applications using various technologies.
- 4+ years of strong experience with Big Data and Hadoop Ecosystems.
- Hands-on experience with Apache Hadoop ecosystem components such as HDFS, MapReduce, Oozie, Zookeeper, Hive, Sqoop, HBase, Flume, Pig, Spark, Kafka, Scala, Hue and Impala.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems (RDBMS) and vice versa.
- Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
- Extended Hive and Pig core functionality using custom UDFs.
- Experience in NoSQL databases such as HBase and Cassandra.
- Experience in coding MapReduce programs, knowledge of job workflow scheduling and monitoring tools like Oozie and Zookeeper.
- Developed Pig Latin scripts for handling business transformations.
- Experience in using Flume and Kafka to load the log data from multiple sources into HDFS.
- Hands-on experience in virtualization, including work with VMware Virtual Center.
- Good knowledge of Python and R.
- Extensive experience in Requirements gathering, Analysis, Design, Reviews, Coding and Code Reviews, Unit and Integration Testing.
- Adequate knowledge and working experience with Agile methodology.
- Good knowledge of single-node and multi-node cluster configurations.
- In-depth understanding of Hadoop architecture and its components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and MapReduce.
- Experience in setting up Hive, Pig, HBase and Sqoop on Ubuntu Operating system.
- Proficiency in object-oriented programming using Java technologies and web technologies such as HTML, XML, JSP and JavaScript.
- Good experience with and knowledge of SQL queries for manipulating data.
- Good experience in developing Pig scripts, Pig UDFs, Hive scripts and Hive UDFs to load data files.
- Experience with UNIX commands and deployment of applications on servers.
- Experienced in interacting with business users and technical consultants to analyze requirements and business processes, transform requirements into technical specifications, design databases, write documentation and roll out deliverables.
- Good experience developing Java and mainframe applications.
- Design, development and testing of mainframe applications.
- Able to work effectively both independently and as a team member on group projects.
PROFESSIONAL EXPERIENCE:
Confidential, Addison, TX
Hadoop Lead/Architect
Responsibilities:
- Developed managed, external and partitioned tables as per the requirements.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Ingested structured data into appropriate schemas and tables to support rules and analytics.
- Developed custom user-defined functions (UDFs) in Hive to transform large volumes of data according to business requirements.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in loading data from edge node to HDFS using shell scripting.
- Implemented scripts for loading data from UNIX file system to HDFS.
- Implemented a script to transfer sysprin information from Oracle to HBase using Sqoop.
- Automated workflow using Shell Scripts.
- Good experience in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as Regex, JSON and Avro.
- Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to load data files.
- Used Kafka for messaging services in place of a traditional message broker.
- Experience in Hadoop 2.x with Spark and Scala.
- Managed Hadoop jobs using the Oozie workflow scheduler for MapReduce, Hive, Pig and Sqoop actions.
- Good knowledge of data ingestion and data processing.
- Sound knowledge of Python and R.
- Experience in managing and reviewing Hadoop log files.
- Used the Oozie workflow engine to run multiple Hive and Pig jobs.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Responsible for managing test data coming from different sources.
- Responsible for developing batch processes using UNIX shell scripting.
Environment: Apache Hadoop, HDFS, Hive, Pig, Sqoop, HBase, Unix, Shell Scripting, Spark, Scala, Kafka, Oozie, Zookeeper, CDH5.
Confidential, Somerset, NJ
Hadoop Developer / Analyst
Responsibilities:
- Set up scripts to fetch data from various FTP server locations and copy it into the HDFS folder corresponding to each client.
- Defined client-agnostic formats for the different kinds of data received from clients.
- Wrote Pig UDFs to pre-process the data received from various clients and transform it into the required formats.
- Specified numerous Pig relations to map various fields in the data set.
- Developed various Pig Latin scripts to join and group different kinds of data to construct relevant records according to the functional requirements.
- Developed MapReduce programs for analyzing the data in cases where Pig script performance was not satisfactory.
- Utilized HCatalog to access Hive table metadata from Pig scripts and MapReduce jobs.
- Implemented test scripts to support test driven development and continuous integration.
- Automated the jobs that pull data from FTP servers to HDFS using Oozie workflows, and enabled email alerts to report any failures.
- Performed unit testing of MapReduce jobs using MRUnit.
- Worked closely with the Data Analyst to identify the business aspects for analysis.
- Took part in managing and reviewing log files.
- Involved in setting up Oracle R Connector for Hadoop so that data analysts could use data in HDFS to perform analytics.
- Actively took part in scrum meetings to discuss the progress of the deliverables.
Environment: CDH4, HDFS, Cloudera Manager, MapReduce, Linux, Putty, Pig, Hive, Oozie, MRUnit, Shell scripting, Eclipse Luna, Java, VersionOne.
Confidential, MI
Hadoop Developer
Responsibilities:
- Analyzed the Hadoop stack and different big data analytics tools, including Pig, Hive, HBase and Sqoop.
- Involved in requirement gathering, architecture development, design, development and deployment of solutions built on the Hadoop platform.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data, and analyzed them by running Hive queries and Pig scripts.
- Imported data from various sources, performed transformations using Pig, loaded the data into HDFS, and extracted data from Teradata into HDFS using Sqoop.
- Used different file formats such as text files, sequence files and Avro.
- Developed MapReduce programs for applying business rules to the data.
- Played a key role in mentoring the team on developing MR jobs and custom UDFs.
- Created Hive tables and worked with them using HiveQL.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Developed scripts to schedule the batch jobs.
- Helped the team in optimizing Hive queries.
- Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
- Held weekly meetings with technical collaborators and actively participated in code review sessions with junior developers.
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Linux, XML, MySQL, HBase, Ubuntu.
Confidential, Webster, NY
Hadoop Developer
Responsibilities:
- Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Set up and benchmarked Hadoop/HBase clusters for internal use.
- Developed Java MapReduce programs for the analysis of sample log files stored in the cluster.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Developed MapReduce programs for data analysis and data cleaning.
- Developed Pig Latin scripts for the analysis of semi-structured data.
- Developed industry-specific UDFs (user-defined functions).
- Used Hive to create Hive tables, load data and write Hive UDFs.
- Used Sqoop to import data into HDFS and Hive from other data systems.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Migrated ETL processes from Oracle to Hive to test the ease of data manipulation.
- Developed Hive queries to process the data for visualization.
Environment: Apache Hadoop, HDFS, Cloudera Manager, Java, MapReduce, Eclipse Indigo, Hive, PIG, Sqoop, Oozie and SQL.
Confidential
JAVA/J2EE Consultant
Responsibilities:
- Development using Struts MVC model with J2EE standards.
- Design and development of the front end using JSPs, Struts, XML, JavaScript and HTML.
- Design and development of Action and Form objects as part of the Struts framework.
- Involved in the Development and Deployment of Stateless Session beans.
- Generated deployment descriptors for EJBs using XML.
- Worked with JSP and JavaScript libraries such as AngularJS and jQuery to develop the application.
- Developed shell scripts for Inventory Management.
- Assisted in troubleshooting JSP and Java code (EJBs and Servlets).
- Ported the application to WebSphere.
Environment: JDK 1.4, IBM WebLogic 7.1, WSAD 5.0, Oracle 9i, Ant, CVS, JUnit, Struts 2.0, JavaScript 1.1, HTML, Log4j, Rational Rose, Unix.
Confidential
Junior Programmer
Responsibilities:
- Gathered requirements and worked according to the change request (CR).
- Data validation/Reconciliation report generation.
- Code Development as per the client requirements.
- Involved in developing backend code; altered tables to add new columns, constraints, sequences and indexes as per business requirements.
- Performed DML and DDL operations as per business requirements.
- Created views and prepared business reports.
- Resolved production issues by modifying backend code as and when required.
- Used different joins, subqueries and nested queries in SQL.
- Involved in creation of sequences for automatic generation of Product ID.
- Created database objects such as tables, views, sequences, synonyms, stored procedures, functions, packages, cursors, ref cursors and triggers.
- Tested code functionality in a testing environment.
- Worked under senior-level guidance.
Environment: MySQL, Windows, MS Excel, Reports, Java.
