Big Data Developer/Senior Analyst Resume
SUMMARY:
- 8+ years of professional experience in implementation and application support projects on Hadoop, Java, IBM DataStage and Teradata.
- 5 years of strong experience in Big Data technologies using Cloudera Apache Hadoop (CDH 4 and CDH 5) and ecosystem components such as HDFS, MapReduce (MRv1, MRv2/YARN), Apache Pig, Apache Spark with Scala, Apache HBase, Apache Hive, Apache Sqoop, Apache ZooKeeper, Apache Flume, Apache Oozie, Apache Cassandra and Cloudera Hue.
- In-depth understanding of the Hadoop architecture and its components such as HDFS, ResourceManager, NodeManager, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode and the MapReduce programming model.
- Experienced in importing and exporting data between relational databases and HDFS using Sqoop.
- Extensively worked on creating complex MapReduce (MR) batch programs to perform Big Data processing and analysis using Pig Latin and custom core Java UDFs (see the UDF sketch after this summary).
- Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION and SPLIT to perform complex Big Data processing and analysis on HDFS.
- Experience in implementing partitioning and bucketing techniques in Hive (see the Hive sketch after this summary).
- Experience in writing HiveQL queries to store processed data into Hive tables for Big Data analysis.
- Developed projects in Apache Spark using Scala, leveraging its in-memory processing features.
- Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Experienced in NoSQL column-oriented databases such as HBase and their integration with HDFS.
- Worked on a cluster of 600+ nodes with 4+ petabytes of storage.
- Experienced in loading log files from multiple sources directly into HDFS using Flume
- Knowledge of job workflow scheduling with Oozie and cluster coordination with ZooKeeper
- Experienced in migrating ETL projects on to the Hadoop Platform
- Over 3 years of core Java experience developing various custom Java UDFs.
- Good knowledge of installing, configuring and monitoring HDFS clusters on AWS.
- 4 years of data warehousing experience designing and developing complex Extract-Transform-Load (ETL) processes using IBM InfoSphere DataStage 8.1.
- Extensively used DataStage Designer, DataStage Manager, DataStage Administrator and DataStage Director for designing, developing and testing complex ETL jobs.
- Expert knowledge of developing Server jobs, Parallel jobs and Shared Containers using ETL best practices, standard naming conventions and code management.
- Extensively used ETL methodology for data warehousing, performing data analysis, extraction, cleansing, validation, transformation, integration and loading in corporate-wide ETL solutions.
- Excellent knowledge of data mapping, extract, transform and load from different data sources
- Expert knowledge of DataStage processing stages such as Aggregator, Funnel, Filter, Merge, Copy, Sort, Change Capture, Lookup, Join, Pivot, Switch, Remove Duplicates and Transformer.
- 4+ years of experience as a Teradata developer with strong expertise in SQL queries, stored procedures and Teradata macros.
- Teradata Certified Professional.
- Extensive experience with Teradata utilities such as BTEQ, MultiLoad, FastLoad and FastExport.
- Passionate towards working in Big Data and Analytics environment
- Analytical thinker who consistently resolves ongoing issues and defects, is often called upon to consult on problems, and is a fast learner.
- An individual with excellent interpersonal and communication skills, strong business acumen and work ethic, technical competence, team spirit and leadership skills.
- Highly motivated self-starter with creative problem-solving skills, a positive attitude and a willingness to learn new concepts and take on challenges.
- Experienced in Team management and Project management
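Illustrative UDF sketch, referenced from the summary above: a minimal custom Pig UDF in core Java of the kind described in the MapReduce/Pig Latin bullets. The package, class name and field semantics (normalizing a free-text code before grouping) are hypothetical and shown only to indicate the pattern, not actual project code.

```java
package com.example.pig.udf; // hypothetical package

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Custom Pig EvalFunc: trims and upper-cases a code field so that a later
// GROUP BY sees a single key per logical code value.
public class NormalizeCode extends EvalFunc<String> {

    @Override
    public String exec(Tuple input) throws IOException {
        // Pig passes null/empty tuples for missing fields; guard against them.
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}
```

Such a UDF would be packaged into a jar, registered in the Pig Latin script with REGISTER, and invoked inside a FOREACH ... GENERATE statement.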
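Illustrative Hive sketch, referenced from the partitioning/bucketing and HiveQL bullets above: creating a partitioned, bucketed Hive table and loading it with a HiveQL insert, submitted through the Hive JDBC driver so the example stays in Java. The host, table and column names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HivePartitionDemo {

    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver2.example.com:10000/default", "etl_user", "");
             Statement stmt = conn.createStatement()) {

            // Partition by load date and bucket by customer_id so that
            // partition pruning and bucketed joins cut down the scanned data.
            stmt.execute(
                "CREATE TABLE IF NOT EXISTS sales_curated ("
              + " customer_id BIGINT, product_code STRING, amount DOUBLE) "
              + "PARTITIONED BY (load_date STRING) "
              + "CLUSTERED BY (customer_id) INTO 32 BUCKETS "
              + "STORED AS ORC");

            // Dynamic-partition insert from a hypothetical staging table.
            stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict");
            stmt.execute("SET hive.enforce.bucketing=true");
            stmt.execute(
                "INSERT INTO TABLE sales_curated PARTITION (load_date) "
              + "SELECT customer_id, product_code, amount, load_date FROM sales_raw");
        }
    }
}
```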
TECHNICAL SKILLS:
Big Data: Cloudera Apache Hadoop (CDH 4 and CDH 5)
Relational Database: Teradata, SQL, IBM DB2
NoSQL Database: Apache HBase, Apache Cassandra
Scripting Languages: UNIX Shell Scripts, Perl Scripts
Teradata Utilities: BTEQ, MultiLoad, FastLoad and FastExport
Operating Systems: Windows and UNIX
Workload Automation / Batch Job workflow Scheduling Software: BMC Control-M
Source Control: Tortoise SVN
Methodologies: Waterfall and Agile SCRUM
PROFESSIONAL EXPERIENCE:
Confidential, Chicago
Big Data Developer/Senior Analyst
Technology: Cloudera Apache Hadoop (CDH 5), HDFS, MapReduce (MR), Apache Pig, Apache Sqoop, Apache ZooKeeper, Apache Flume, Apache Hive, BMC Control-M, Core Java
Relational Database: Teradata, DB2 UDB
NoSQL Database: Apache HBase, Apache Cassandra
Scripting Languages: UNIX Shell Scripts
Source Control: Tortoise SVN
Methodologies: Agile SCRUM
Roles and Responsibilities:
- Working on a live Big Data Hadoop production environment with 600 nodes.
- Understanding the client requirements and creating formal Business Requirement specifications.
- Performed Impact Analysis of existing Legacy systems
- Transforming data according to business logic in Hive and Pig.
- Writing MapReduce programs in Java in the MRv2/YARN environment (see the sketch after this list).
- Created High Level and Low Level Design Specifications
- Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION, SPLIT etc. to perform complex Big Data processing and analysis on HDFS
- Extracted structured data from Teradata, DB2 UDB and DB2 z/OS Relational Database onto HDFS using Sqoop
- Created complex Pig Latin scripts to process the extracted data as per the Business Requirement specifications
- Developed Pig Scripts to store unstructured data in HDFS
- Created Managed tables and External tables in Hive and loaded data from HDFS
- Designed and implemented complex MapReduce (MR) jobs to support distributed processing using Pig Latin and custom Java UDFs
- Extracted and processed structured, semi-structured and unstructured data from HBase tables onto HDFS
- Resolved performance issues in Hive and Pig scripts by understanding how joins, grouping and aggregation translate into MapReduce (MR) jobs
- Created multiple UNIX shell scripts and Perl scripts
- Created HBase tables and loaded data into them using Pig scripts
- Scheduled the various scripts on BMC Control-M
- Unit testing and Integration testing
- Production Deployment and maintenance
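Illustrative MapReduce sketch, referenced from the MapReduce bullet above: a minimal MRv2/YARN job in Java that counts records per product code read from HDFS. The input layout (pipe-delimited, code in the second column) and all names are hypothetical.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ProductCodeCount {

    public static class CodeMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text code = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assume pipe-delimited records with the product code in column 2.
            String[] fields = value.toString().split("\\|");
            if (fields.length > 1) {
                code.set(fields[1]);
                context.write(code, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "product-code-count");
        job.setJarByClass(ProductCodeCount.class);
        job.setMapperClass(CodeMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```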
ETL Developer
Technology: Cloudera Apache Hadoop (CDH 4), HDFS, MapReduce (MR), Apache Pig, Apache Sqoop, Apache ZooKeeper, BMC Control-M, Core Java
Relational Database: Teradata, DB2 UDB (LUW)
NoSQL Database: Apache HBase
Scripting Languages: UNIX Shell Scripts
Source Control: Tortoise SVN
Methodologies: Agile SCRUM
Roles and Responsibilities:
- Understanding the client requirements and creating formal Business Requirement specifications
- Performed Impact Analysis of existing Legacy systems
- Created High Level and Low Level Design Specifications
- Extracted structured data from Teradata, DB2 UDB and DB2 z/OS Relational Database onto HDFS using Sqoop
- Extracted and processed structured, semi-structured and unstructured data from HBase tables onto HDFS
- Created complex Pig Latin scripts to process the extracted data as per the Business Requirement specifications
- Developed Pig Scripts to store unstructured data in HDFS
- Designed and implemented complex MapReduce (MR) jobs to support distributed processing using Pig Latin and custom Java UDFs
- Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION and SPLIT to perform complex Big Data processing and analysis on HDFS
- Researched, evaluated and utilized new technologies / tools / frameworks around the Hadoop eco-system
- Created HBase tables and loaded data into them using Pig scripts (see the sketch after this list)
- Scheduled the various scripts on BMC Control-M
- Unit testing and Integration testing
- Production Deployment and maintenance
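Illustrative HBase sketch, referenced from the HBase bullet above: the loads themselves were done through Pig's HBaseStorage, so this shows only an equivalent write using the CDH-era HBase Java client API to keep the examples in one language. The table name, column family and row-key scheme are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseLoadDemo {

    public static void main(String[] args) throws Exception {
        // Picks up hbase-site.xml (ZooKeeper quorum, etc.) from the classpath.
        Configuration conf = HBaseConfiguration.create();

        HTable table = new HTable(conf, "customer_profile");
        try {
            // Row key: customer id; one column family "cf" with a few qualifiers.
            Put put = new Put(Bytes.toBytes("CUST00042"));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes("Jane Doe"));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("segment"), Bytes.toBytes("RETAIL"));
            table.put(put);
        } finally {
            table.close();
        }
    }
}
```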
Software Developer
Technology: IBM DataStage 8.1, Teradata
Relational Database: Teradata, SQL, IBM DB2
NoSQL Database: Apache HBase
Scripting Languages: UNIX Shell Scripts and Perl Scripts
Source Control: Tortoise SVN
Methodologies: Waterfall
Roles and Responsibilities:
- Involved in understanding requirements from the business and designing an implementation plan.
- Designed the technical ETL solution to meet the requirements of the different entities involved in the model.
- Built DataStage jobs to pull data and apply business transformations using processing stages such as Join, Lookup, Sort, Copy, Transformer, Funnel, CFF and database stages.
- Identified overall job/sequence dependencies and created Control-M jobs to schedule the code in production; created job sequences and job parameters for scheduling.
- Analyzed possibilities for solution enhancement.
- Responsible for performance tuning of SQL queries to resolve spool space issues and to reduce the time taken to generate extracts.
- Analyzed change requests and their impact on other modules.
- Responsible for the technical delivery of code for the module assigned to me and my team.
- Provided technical/functional help to the team.
- Onboarded new team members by providing technical/functional knowledge transfers (KTs).
- Developed BTEQ scripts and Teradata SQL queries (see the sketch after this list).
- Responsible for creating various documents involved in the implementation process.
- Reviewed code to ensure adherence to Teradata coding standards.
- Responsible for post-production implementation support.
- Worked on daily production issues and resolved them within the stipulated time frame.
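Illustrative Teradata sketch, referenced from the BTEQ/Teradata SQL bullet above: BTEQ itself is Teradata's command-line utility, so to keep the code examples in Java this shows the same kind of Teradata SQL query run through the Teradata JDBC driver. The host, credentials and table/column names are hypothetical.

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class TeradataQueryDemo {

    public static void main(String[] args) throws Exception {
        Class.forName("com.teradata.jdbc.TeraDriver");

        String url = "jdbc:teradata://tdprod.example.com/DATABASE=EDW,CHARSET=UTF8";
        try (Connection conn = DriverManager.getConnection(url, "etl_user", "secret");
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT region, SUM(sales_amt) FROM daily_sales "
               + "WHERE sale_date = ? GROUP BY region")) {

            // sale_date is assumed to be a DATE column in this hypothetical table.
            ps.setDate(1, Date.valueOf("2015-06-30"));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getBigDecimal(2));
                }
            }
        }
    }
}
```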
Software Developer
Technology: Java, J2EE, JSP, Struts, HTML, CSS, JavaScript, Tomcat, Servlets, JDBC
Relational Database: Oracle, SQL, DB2 UDB (LUW), DB2 z/OS (Mainframe)
Scripting Languages: UNIX Shell Scripts and Perl Scripts
Source Control: Tortoise SVN
Methodologies: Waterfall
Roles and Responsibilities:
- Responsible for understanding requirements from business users.
- Participated in discussions with teammates and in designing the system.
- Developed web pages using JSP and handled requests using Java servlets (see the sketch after this list).
- Developed client-side validations using JavaScript.
- Performed server-side validation based on supported file formats.
- Developed Java code following the MVC architecture.
- Used Apache Log4j for logging and error handling.
- Fixed priority-one bugs.
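Illustrative servlet sketch, referenced from the servlet and Log4j bullets above: a hypothetical servlet acting as the controller in an MVC flow, performing a simple server-side file-format check and using Log4j for logging and error handling. All names and paths are made up.

```java
import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.log4j.Logger;

public class UploadServlet extends HttpServlet {

    private static final Logger LOG = Logger.getLogger(UploadServlet.class);

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String fileName = request.getParameter("fileName");

        // Server-side validation based on supported file formats.
        if (fileName == null || !(fileName.endsWith(".csv") || fileName.endsWith(".txt"))) {
            LOG.warn("Rejected upload with unsupported file name: " + fileName);
            response.sendError(HttpServletResponse.SC_BAD_REQUEST, "Unsupported file format");
            return;
        }

        LOG.info("Accepted upload request for " + fileName);
        // MVC: servlet as controller, JSP as view.
        request.setAttribute("fileName", fileName);
        request.getRequestDispatcher("/WEB-INF/jsp/uploadResult.jsp").forward(request, response);
    }
}
```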