Big Data Developer/Senior Analyst Resume
SUMMARY:
- 8+ years of professional experience in implementation and application support projects on Hadoop, Java, IBM DataStage and Teradata.
- 5 years of strong experience in Big Data technologies using Cloudera Apache Hadoop (CDH 4 and CDH 5) and ecosystem components such as HDFS, MapReduce (MRv1, MRv2/YARN), Apache Pig, Apache Spark with Scala, Apache HBase, Apache Hive, Apache Sqoop, Apache ZooKeeper, Apache Flume, Apache Oozie, Apache Cassandra and Cloudera Hue.
- In-depth understanding of the Hadoop architecture and its components such as HDFS, ResourceManager, NodeManager, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode and the MapReduce programming model.
- Experienced in importing and exporting data between relational databases and HDFS using Sqoop.
- Extensively worked on creating complex MapReduce (MR) batch programs to perform Big Data processing and analysis using Pig Latin and custom core Java UDFs (see the UDF sketch after this summary).
- Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION and SPLIT to perform complex Big Data processing and analysis on HDFS.
- Experience in implementing partitioning and bucketing techniques in Hive (see the Hive sketch after this summary).
- Experience in writing HiveQL queries to store processed data into Hive tables for Big Data analysis.
- Developed projects in Apache Spark using Scala, leveraging its in-memory processing features.
- Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Experienced in NoSQL column-oriented databases such as HBase and their integration with HDFS.
- Worked on a cluster of 600+ nodes with 4+ petabytes of storage.
- Experienced in loading log files from multiple sources directly into HDFS using Flume
- Knowledge of job workflow scheduling with Oozie and cluster coordination with ZooKeeper
- Experienced in migrating ETL projects on to the Hadoop Platform
- Over 3 years of core Java experience developing various custom Java UDFs.
- Good knowledge of installing, configuring and monitoring HDFS clusters on AWS.
- 4 years of data warehousing experience designing and developing complex Extract-Transform-Load (ETL) processes using IBM InfoSphere DataStage 8.1.
- Extensively used DataStage Designer, DataStage Manager, DataStage Administrator and DataStage Director for designing, developing and testing complex ETL jobs.
- Expert knowledge of developing Server jobs, Parallel jobs and Shared Containers using ETL best practices, standard naming conventions and code management.
- Extensively used ETL methodology for data warehousing, performing data analysis, extraction, cleansing, validation, transformation, integration and loading in corporate-wide ETL solutions.
- Excellent knowledge of data mapping, extract, transform and load from different data sources
- Expert knowledge of DataStage processing stages such as Aggregator, Funnel, Filter, Merge, Copy, Sort, Change Capture, Lookup, Join, Pivot, Switch, Remove Duplicates and Transformer.
- 4+ years of experience as a Teradata developer with strong expertise in SQL queries, stored procedures and Teradata macros.
- Teradata Certified Professional.
- Extensive experience with Teradata utilities such as BTEQ, MultiLoad, FastLoad and FastExport.
- Passionate towards working in Big Data and Analytics environment
- Analytical thinker who consistently resolves ongoing issues and defects, is often called upon to consult on problems, and is a fast learner.
- An individual with excellent interpersonal and communication skills, strong business acumen and work ethic, technical competence, team spirit and leadership skills.
- Highly motivated self-starter with creative problem-solving skills, a positive attitude and a willingness to learn new concepts and take on challenges.
- Experienced in Team management and Project management
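Illustrative UDF sketch, referenced from the summary above: a minimal custom Pig UDF in core Java of the kind described in the MapReduce/Pig Latin bullets. The package, class name and field semantics (normalizing a free-text code before grouping) are hypothetical and shown only to indicate the pattern, not actual project code.

```java
package com.example.pig.udf; // hypothetical package

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Custom Pig EvalFunc: trims and upper-cases a code field so that a later
// GROUP BY sees a single key per logical code value.
public class NormalizeCode extends EvalFunc<String> {

    @Override
    public String exec(Tuple input) throws IOException {
        // Pig passes null/empty tuples for missing fields; guard against them.
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}
```

Such a UDF would be packaged into a jar, registered in the Pig Latin script with REGISTER, and invoked inside a FOREACH ... GENERATE statement.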
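Illustrative Hive sketch, referenced from the partitioning/bucketing and HiveQL bullets above: creating a partitioned, bucketed Hive table and loading it with a HiveQL insert, submitted through the Hive JDBC driver so the example stays in Java. The host, table and column names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HivePartitionDemo {

    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver2.example.com:10000/default", "etl_user", "");
             Statement stmt = conn.createStatement()) {

            // Partition by load date and bucket by customer_id so that
            // partition pruning and bucketed joins cut down the scanned data.
            stmt.execute(
                "CREATE TABLE IF NOT EXISTS sales_curated ("
              + " customer_id BIGINT, product_code STRING, amount DOUBLE) "
              + "PARTITIONED BY (load_date STRING) "
              + "CLUSTERED BY (customer_id) INTO 32 BUCKETS "
              + "STORED AS ORC");

            // Dynamic-partition insert from a hypothetical staging table.
            stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict");
            stmt.execute("SET hive.enforce.bucketing=true");
            stmt.execute(
                "INSERT INTO TABLE sales_curated PARTITION (load_date) "
              + "SELECT customer_id, product_code, amount, load_date FROM sales_raw");
        }
    }
}
```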
TECHNICAL SKILLS:
Big Data: Cloudera Apache Hadoop (CDH 4 and CDH 5)
Relational Database: Teradata, SQL, IBM DB2
NoSQL Database: Apache HBase, Apache Cassandra
Scripting Languages: UNIX Shell Scripts, Perl Scripts
Teradata Utilities: BTEQ, MultiLoad, FastLoad and FastExport
Operating Systems: Windows and UNIX
Workload Automation / Batch Job workflow Scheduling Software: BMC Control-M
Source Control: Tortoise SVN
Methodologies: Waterfall and Agile SCRUM
PROFESSIONAL EXPERIENCE:
Confidential, Chicago
Big Data Developer/Senior Analyst
Technology: Cloudera Apache Hadoop (CDH 5), HDFS, MapReduce (MR), Apache Pig, Apache Sqoop, Apache ZooKeeper, Apache Flume, Apache Hive, BMC Control-M, Core Java
Relational Database: Teradata, DB2 UDB
NoSQL Database: Apache HBase, Apache Cassandra
Scripting Languages: UNIX Shell Scripts
Source Control: Tortoise SVN
Methodologies: Agile SCRUM
Roles and Responsibilities:
- Working on a live Big Data Hadoop production environment with 600 nodes.
- Understanding the client requirements and creating formal Business Requirement specifications.
- Performed Impact Analysis of existing Legacy systems
- Transforming data according to business logic in Hive and Pig.
- Writing MapReduce programs in Java in the MRv2/YARN environment (see the sketch after this list).
- Created High Level and Low Level Design Specifications
- Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION, SPLIT etc. to perform complex Big Data processing and analysis on HDFS
- Extracted structured data from Teradata, DB2 UDB and DB2 z/OS Relational Database onto HDFS using Sqoop
- Created complex Pig Latin scripts to process the extracted data as per the Business Requirement specifications
- Developed Pig Scripts to store unstructured data in HDFS
- Created Managed tables and External tables in Hive and loaded data from HDFS
- Designed and implemented complex MapReduce (MR) jobs to support distributed processing using Pig Latin and custom Java UDFs
- Extracted and processed structured, semi-structured and unstructured data from HBase tables onto HDFS
- Resolved performance issues in Hive and Pig scripts by understanding how joins, grouping and aggregation translate into MapReduce (MR) jobs
- Created multiple UNIX shell scripts and Perl scripts
- Created HBase tables and loaded data into them using Pig scripts
- Scheduled the various scripts on BMC Control-M
- Unit testing and Integration testing
- Production Deployment and maintenance
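Illustrative MapReduce sketch, referenced from the MapReduce bullet above: a minimal MRv2/YARN job in Java that counts records per product code read from HDFS. The input layout (pipe-delimited, code in the second column) and all names are hypothetical.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ProductCodeCount {

    public static class CodeMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text code = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assume pipe-delimited records with the product code in column 2.
            String[] fields = value.toString().split("\\|");
            if (fields.length > 1) {
                code.set(fields[1]);
                context.write(code, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "product-code-count");
        job.setJarByClass(ProductCodeCount.class);
        job.setMapperClass(CodeMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```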
ETL Developer
Technology: Cloudera Apache Hadoop (CDH 4), HDFS, MapReduce (MR), Apache Pig, Apache Sqoop, Apache ZooKeeper, BMC Control-M, Core Java
Relational Database: Teradata, DB2 UDB (LUW)
NoSQL Database: Apache HBase
Scripting Languages: UNIX Shell Scripts
Source Control: Tortoise SVN
Methodologies: Agile SCRUM
Roles and Responsibilities:
- Understanding the client requirements and creating formal Business Requirement specifications
- Performed Impact Analysis of existing Legacy systems
- Created High Level and Low Level Design Specifications
- Extracted structured data from Teradata, DB2 UDB and DB2 z/OS Relational Database onto HDFS using Sqoop
- Extracted and processed structured, semi-structured and unstructured data from HBase tables onto HDFS
- Created complex Pig Latin scripts to process the extracted data as per the Business Requirement specifications
- Developed Pig Scripts to store unstructured data in HDFS
- Designed and implemented complex MapReduce (MR) jobs to support distributed processing using Pig Latin and custom Java UDFs
- Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION and SPLIT to perform complex Big Data processing and analysis on HDFS
- Researched, evaluated and utilized new technologies / tools / frameworks around the Hadoop eco-system
- Created HBase tables and loaded data into them using Pig scripts (see the sketch after this list)
- Scheduled the various scripts on BMC Control-M
- Unit testing and Integration testing
- Production Deployment and maintenance
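Illustrative HBase sketch, referenced from the HBase bullet above: the loads themselves were done through Pig's HBaseStorage, so this shows only an equivalent write using the CDH-era HBase Java client API to keep the examples in one language. The table name, column family and row-key scheme are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseLoadDemo {

    public static void main(String[] args) throws Exception {
        // Picks up hbase-site.xml (ZooKeeper quorum, etc.) from the classpath.
        Configuration conf = HBaseConfiguration.create();

        HTable table = new HTable(conf, "customer_profile");
        try {
            // Row key: customer id; one column family "cf" with a few qualifiers.
            Put put = new Put(Bytes.toBytes("CUST00042"));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes("Jane Doe"));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("segment"), Bytes.toBytes("RETAIL"));
            table.put(put);
        } finally {
            table.close();
        }
    }
}
```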
Software Developer
Technology: IBM DataStage 8.1, Teradata
Relational Database: Teradata, SQL, IBM DB2
NoSQL Database: Apache HBase
Scripting Languages: UNIX Shell Scripts and Perl Scripts
Source Control: Tortoise SVN
Methodologies: Waterfall
Roles and Responsibilities:
- Involved in understanding requirements from the business and designing an implementation plan.
- Designed the technical ETL solution to meet the requirements of the different entities involved in the model.
- Built DataStage jobs to pull data and apply business transformations using processing stages such as Join, Lookup, Sort, Copy, Transformer, Funnel, CFF and database stages.
- Identified overall job/sequence dependencies and created Control-M jobs to schedule the code in production; created job sequences and job parameters for scheduling.
- Analyzed possibilities for solution enhancement.
- Responsible for performance tuning of SQL queries to resolve spool space issues and to reduce the time taken to generate extracts.
- Analyzed change requests and their impact on other modules.
- Responsible for the technical delivery of code for the module assigned to me and my team.
- Provided technical/functional help to the team.
- Onboarded new team members by providing technical/functional knowledge transfers (KTs).
- Developed BTEQ scripts and Teradata SQL queries (see the sketch after this list).
- Responsible for creating various documents involved in the implementation process.
- Reviewed code to ensure adherence to Teradata coding standards.
- Responsible for post-production implementation support.
- Worked on daily production issues and resolved them within the stipulated time frame.
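Illustrative Teradata sketch, referenced from the BTEQ/Teradata SQL bullet above: BTEQ itself is Teradata's command-line utility, so to keep the code examples in Java this shows the same kind of Teradata SQL query run through the Teradata JDBC driver. The host, credentials and table/column names are hypothetical.

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class TeradataQueryDemo {

    public static void main(String[] args) throws Exception {
        Class.forName("com.teradata.jdbc.TeraDriver");

        String url = "jdbc:teradata://tdprod.example.com/DATABASE=EDW,CHARSET=UTF8";
        try (Connection conn = DriverManager.getConnection(url, "etl_user", "secret");
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT region, SUM(sales_amt) FROM daily_sales "
               + "WHERE sale_date = ? GROUP BY region")) {

            // sale_date is assumed to be a DATE column in this hypothetical table.
            ps.setDate(1, Date.valueOf("2015-06-30"));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getBigDecimal(2));
                }
            }
        }
    }
}
```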
Software Developer
Technology: Java, J2EE, JSP, Struts, HTML, CSS, JavaScript, Tomcat, Servlets, JDBC
Relational Database: Oracle, SQL, DB2 UDB (LUW), DB2 z/OS (Mainframe)
Scripting Languages: UNIX Shell Scripts and Perl Scripts
Source Control: Tortoise SVN
Methodologies: Waterfall
Roles and Responsibilities:
- Responsible for understanding requirements from business users.
- Participated in discussions with teammates and in designing the system.
- Developed web pages using JSP and handled requests using Java servlets (see the sketch after this list).
- Developed client-side validations using JavaScript.
- Performed server-side validation based on supported file formats.
- Developed Java code following the MVC architecture.
- Used Apache Log4j for logging and error handling.
- Fixed priority-one bugs.
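Illustrative servlet sketch, referenced from the servlet and Log4j bullets above: a hypothetical servlet acting as the controller in an MVC flow, performing a simple server-side file-format check and using Log4j for logging and error handling. All names and paths are made up.

```java
import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.log4j.Logger;

public class UploadServlet extends HttpServlet {

    private static final Logger LOG = Logger.getLogger(UploadServlet.class);

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String fileName = request.getParameter("fileName");

        // Server-side validation based on supported file formats.
        if (fileName == null || !(fileName.endsWith(".csv") || fileName.endsWith(".txt"))) {
            LOG.warn("Rejected upload with unsupported file name: " + fileName);
            response.sendError(HttpServletResponse.SC_BAD_REQUEST, "Unsupported file format");
            return;
        }

        LOG.info("Accepted upload request for " + fileName);
        // MVC: servlet as controller, JSP as view.
        request.setAttribute("fileName", fileName);
        request.getRequestDispatcher("/WEB-INF/jsp/uploadResult.jsp").forward(request, response);
    }
}
```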