Lead Hadoop Analyst Resume
Richmond, VA
PROFESSIONAL SUMMARY:
- Over 7 years of progressive experience in analysis, design, development, and testing of Big Data, Core Java/J2EE, and Scala applications.
- 3+ years of hands-on experience building Big Data applications using Hadoop.
- In-depth understanding of Hadoop architecture and core components such as HDFS and the MapReduce programming paradigm
- Extensive knowledge of Hadoop ecosystem components including MapReduce, Spark, Hive, Impala, Sqoop, HBase, Oozie, and HUE
- Excellent understanding and knowledge of the NoSQL database HBase
- Good experience in scheduling jobs using Oozie
- Experience in using different file formats (text, Parquet, JSON, XML)
- Worked with different version control tools such as Git, Bitbucket, SVN, and ClearCase
- Responsible for managing data coming from various sources.
- Involved in loading data from UNIX file system to HDFS
- Developed scripts to schedule various Hadoop jobs
- Experience with Cloudera Distribution Hadoop (CDH) clusters
- Sound exposure to the Health Care and Retail domains and to ETL and licensing frameworks, with strong analytical, design, and problem-solving skills as an added advantage.
- Worked in Agile/Scrum and Waterfall based delivery environments
- Experience in managing and reviewing Hadoop log files.
- Created MRUnit and JUnit test cases to test MapReduce and Java applications
- Good hands-on experience with Apache Tomcat, JBoss, WebSphere, Data Junction, FileZilla FTP Client, WinSCP, PuTTY, SQuirreL SQL Client, Microsoft SQL Server, and IBM DB2
- Excellent communication and presentation skills.
- Basic knowledge of Python, Scala, StreamSets, and AWS
- Worked as Onsite and Offshore team lead
TECHNICAL SKILLS:
Big Data tools: Hadoop, MapReduce, Spark, Hive, Impala, HBase, Oozie, Sqoop, HUE
Languages: Core Java/J2EE, UNIX scripting, SQL, PL/SQL
Databases: Microsoft SQL Server, IBM DB2
Frameworks: MVC, Spring MVC, ETL, Pega PRPC Certification and Licensing Framework (CLF)
BPM Tools: PEGA PRPC 7.1, PEGA PRPC 6.3
Development Tools: IBM Rational Application Developer, Rational Software Architect, Eclipse
Servers: Apache Tomcat, JBoss, WebSphere
Build Tools: Maven, Ant
Version management tools: Git, IBM Rational ClearCase, SVN, Bitbucket
Tools: Data Junction, FileZilla FTP Client, WinSCP, PuTTY, SQuirreL SQL Client, Control-M
Software Development Methodology: Agile/Scrum, Waterfall
Unit Testing Frameworks: MRUnit, JUnit
PROFESSIONAL EXPERIENCE:
Confidential, Richmond/VA
Lead Hadoop Analyst
Responsibilities:
- Responsible for data loading and data processing
- Leading onsite team in an Agile development model
- Developed Spark programs to process and transform data
- Created and joined multiple data frames to process data and wrote the results to HDFS (illustrated in the sketch below)
- Developed bash scripts to create DDLs and import data from Teradata
- Involved in automation using Control-M
- Scheduled and monitored cyclic and ad hoc jobs through Control-M
- Involved in loading data from Unix file system to HDFS
- Created external and internal Hive tables.
- Involved in analyzing and validating data using Hive, Impala and HUE
- Handled Parquet file formats
- Managed and reviewed Hadoop/Spark log files
- Developed bash scripts to call Spark jobs
- Developed Impala/Hive queries to fetch data from Data Lake.
- Built and deployed application using Maven
- Involved in Agile/Scrum methodology
- Used Git/Bitbucket for version control
- Used Bamboo for CI/CD
Environment: Java 1.8, Eclipse, Scala, HDFS, Spark, Hive, Impala, Maven, HUE, Control-M, UNIX scripting, PuTTY, FileZilla, Teradata SQL Assistant, Bamboo, Jira
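Below is a minimal, illustrative sketch of the kind of Spark data-frame join and HDFS write performed in this role; the input/output paths, data set names, and the join column (account_id) are hypothetical placeholders rather than actual project details.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JoinAndWrite {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("join-and-write")
                .getOrCreate();

        // Source data frames already landed on HDFS as Parquet (hypothetical paths)
        Dataset<Row> accounts = spark.read().parquet("hdfs:///data/raw/accounts");
        Dataset<Row> transactions = spark.read().parquet("hdfs:///data/raw/transactions");

        // Join the two data frames on a shared key and drop the duplicated key column
        Dataset<Row> joined = accounts
                .join(transactions, accounts.col("account_id").equalTo(transactions.col("account_id")))
                .drop(transactions.col("account_id"));

        // Write the transformed result back to HDFS for Hive/Impala consumption
        joined.write().mode("overwrite").parquet("hdfs:///data/curated/account_transactions");

        spark.stop();
    }
}
```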
Confidential
Lead Hadoop Developer
Responsibilities:
- Responsible for ETL (Extract Transform and Load) data to Data Lake
- Led the offshore team in an Agile development cycle
- Developed MapReduce programs to clean and transform data
- Developed Spark jobs for cleaning and loading data to HDFS for better performance
- Developed bash scripts to load data
- Involved in automation using automation console and JSON
- Involved in loading data from Unix file system to HDFS
- Created external and internal Hive tables with dynamic partitions on the Data Lake.
- Created external Hive stage tables on top of loaded data in HDFS as part of ETL
- Finally loaded data from stage tables to final tables
- Developed MapReduce programs with map-side and reduce-side joins (illustrated in the sketch below)
- Used the distributed cache for better performance
- Involved in analyzing and validating data using Hive, Impala and HUE
- Involved in optimization of Hive/Impala queries
- Handled Text and Parquet file formats
- Managed and reviewed Hadoop log files
- Developed bash scripts to call Impala/Hive queries and to transfer the results to TIBCO middleware using SCP
- Solely responsible for fetching data from a common Data Lake based on various user requests submitted through the UI
- Developed Impala/Hive queries to fetch data from Data Lake.
- Built and deployed application using Maven
- Involved in Agile/Scrum methodology
- Used Git/Bitbucket for version control
- Supported AWS migration
Environment: Java 1.7, Eclipse, MapReduce, HDFS, Spark, Hive, Impala, Maven, HUE, UNIX scripting, PuTTY, FileZilla
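The sketch below illustrates the map-side join pattern with the distributed cache referenced in this role. It is not project code: the cache file path, symlink name, and comma-separated record layout (join key in the first field) are assumptions.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapSideJoinJob {

    public static class JoinMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Map<String, String> lookup = new HashMap<>();

        @Override
        protected void setup(Context context) throws IOException {
            // Read the small reference file shipped via the distributed cache;
            // "lookup.txt" matches the symlink fragment in the cache URI below.
            try (BufferedReader reader = new BufferedReader(new FileReader("lookup.txt"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split(",", 2);
                    if (parts.length == 2) {
                        lookup.put(parts[0], parts[1]);
                    }
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", 2);
            String match = fields.length == 2 ? lookup.get(fields[0]) : null;
            if (match != null) {
                // Emit the enriched record; the join happens entirely on the map side
                context.write(new Text(fields[0]), new Text(fields[1] + "," + match));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-side-join");
        job.setJarByClass(MapSideJoinJob.class);
        job.setMapperClass(JoinMapper.class);
        job.setNumReduceTasks(0);                  // map-only: no reducer needed for this join
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.addCacheFile(new URI("/data/reference/lookup.txt#lookup.txt")); // hypothetical HDFS path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```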
Confidential, New York
Lead Hadoop Developer
Responsibilities:
- Developed various map-only and MapReduce jobs for various rules.
- Created sequences of MapReduce jobs in which each subsequent job consumes the output of the previous job as its input (illustrated in the sketch below).
- Imported and exported data between SQL databases and HDFS using Sqoop at regular intervals
- Led the offshore team in an Agile development process
- Developed Hive and Impala queries for the application
- Code optimization and performance enhancements
- Involved in analysis, design, enhancement and maintenance of the application.
- Involved in dynamic Oozie workflow generation and executed multiple jobs in parallel using fork and join
- Prepared MRUnit test cases and involved in unit testing of the application
- Created custom partitions for better performance
- Used the distributed cache for Hive/Impala query results
- Involved in creating REST web services using JSON
- Managed and reviewed Hadoop log files
- Built and deployed application using Maven
- Involved in Agile/Scrum methodology
- Used Git/Bitbucket for version control
Environment: Hive, Impala, Sqoop, MapReduce, Oozie, Java 1.7, Maven, FileZilla, PuTTY
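The driver sketch below illustrates chaining MapReduce jobs so that the output directory of one job feeds the next, as described in this role. The HDFS paths are placeholders and Hadoop's identity Mapper/Reducer stand in for the actual rule logic.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RuleChainDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path(args[0]);
        Path intermediate = new Path(args[1]); // output of job 1 and input of job 2
        Path output = new Path(args[2]);

        // Job 1: map-only pass (identity Mapper stands in for the first rule's logic)
        Job first = Job.getInstance(conf, "rule-1");
        first.setJarByClass(RuleChainDriver.class);
        first.setMapperClass(Mapper.class);
        first.setNumReduceTasks(0);
        first.setOutputKeyClass(LongWritable.class);
        first.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(first, input);
        FileOutputFormat.setOutputPath(first, intermediate);
        if (!first.waitForCompletion(true)) {
            System.exit(1); // stop the chain if the first rule fails
        }

        // Job 2: reads the intermediate directory produced by job 1
        // (identity Mapper/Reducer stand in for the second rule's logic)
        Job second = Job.getInstance(conf, "rule-2");
        second.setJarByClass(RuleChainDriver.class);
        second.setMapperClass(Mapper.class);
        second.setReducerClass(Reducer.class);
        second.setOutputKeyClass(LongWritable.class);
        second.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(second, intermediate);
        FileOutputFormat.setOutputPath(second, output);
        System.exit(second.waitForCompletion(true) ? 0 : 1);
    }
}
```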
Confidential
Java/Hadoop/Pega Developer
Responsibilities:
- Responsible for ETL (Extract, Transform, and Load) of data
- Developed various MapReduce jobs for loading data.
- Created HBase tables.
- Inserted data into HBase tables
- Fetched data from HBase tables (illustrated in the sketch below)
- Prepared MRUnit test cases and involved in unit testing of the application
- Managed and reviewed Hadoop log files
- Created and used static Oozie workflow
- Built and deployed application using Maven
- Used SVN for version control
Environment: MapReduce, HBase, Oozie, Java 1.7, Maven, FileZilla, PuTTY
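A short, illustrative sketch of the HBase table writes and reads listed in this role, using the standard HBase client API; the table name, column family, row key, and values are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseReadWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("customer_events"))) {

            // Insert one row: row key -> column family "d", qualifier "status"
            Put put = new Put(Bytes.toBytes("row-001"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("ACTIVE"));
            table.put(put);

            // Fetch the same row back and read the stored value
            Result result = table.get(new Get(Bytes.toBytes("row-001")));
            byte[] status = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("status"));
            System.out.println("status = " + Bytes.toString(status));
        }
    }
}
```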
Confidential, Sacramento/CA
Pega System Architect
Responsibilities:
- Participated in all phases of PERL (Pega Enterprise Licensing)
- Responsible for enabling and configuring all modules, business scenarios, and use cases for online license and certification applications across Public Health.
- Built out business rules, configured routing, and designed and implemented UIs and harnesses to support the business requirements.
- Participated in unit testing and all other phases of testing to deploy within agreed timelines
- Involved in PERL Detailed Resource Task plan creation
- Prepared the PERL application profile document.
- Prepared the test strategy for PERL.
- Actively involved in the creation of the below documents for PERL:
- Kick-off presentation
- High-level development solution document
Environment: PRPC 7.1, PostgreSQL
Confidential, Indianapolis/IN
Java Developer
Responsibilities:
- Involved in planning, estimation, task allocation, technical support and team management.
- Prepared necessary documents such as estimation, schedule, and design documents.
- Prepared test plans and involved in testing of the application.
- Constructed Java code for the enhancements of the application
- Developed PL/SQL queries and stored procedures for the application
- Analyzed requirements directly from the client.
- Wrote and modified DB2 procedures for database manipulation.
- Involved in Unit testing, System Testing, and Integration Testing.
- Used the Log4j framework for application logging (illustrated in the sketch below)
- Used Data Junction to translate data using XML mapping
- Prepared JUnit test cases and involved in unit testing of the application
- Designed application using MVC architecture
- Involved in analysis, design, enhancement and maintenance of the application.
- Prepared WPSR, metrics, and weekly task trackers.
- Code optimization and performance enhancements
- Knowledge Transfer
- Built and deployed application using Ant
- Involved in Waterfall methodology
- Used SVN for version control
Environment: Java 1.5, Ant, FileZilla, PuTTY, DB2, Data Junction, JBoss, PL/SQL, JSP, JS, HTML, SQL, WAS
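A minimal sketch of the Log4j-based application logging used in this role; the class and method names are illustrative only.

```java
import org.apache.log4j.Logger;

public class EnhancementService {
    // Class-level Log4j logger (log4j.properties supplies appenders and levels)
    private static final Logger LOG = Logger.getLogger(EnhancementService.class);

    public void process(String requestId) {
        LOG.info("Processing request " + requestId);
        try {
            // ... enhancement logic ...
        } catch (Exception e) {
            LOG.error("Failed to process request " + requestId, e);
        }
    }
}
```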