Data Engineer / Hadoop Developer / Spark Developer Resume
SUMMARY:
- 14 years of experience in architecture, design and development for various data warehousing and distributed data processing projects using Hadoop, Spark, Python, Informatica 9.1/8.6/7.1/6.2, PL/SQL, Oracle, web services, DB2, Teradata and UNIX shell scripting.
- Around 2 years of experience in Spark programming using the Python API (PySpark) and Python modules.
- 3 years of hands-on experience with the Hadoop framework and its ecosystem components: HDFS, MapReduce, HBase, Sqoop, MongoDB, Hive, Pig, Oozie, Python and Kafka.
- Experienced in the Financial, HCM, Retail and Food industries.
- Proficient in creating ETL design and technical documentation from functional specifications.
- Strong SQL query writing and query tuning skills in VLDB environments.
- Experienced in analysis, ETL design and development for various data warehousing projects.
- Expertise in global integration projects, scheduling plans and master data management.
- Expertise in constructing proper administrative procedures through well-described system operational procedures, ETL data flow diagrams and design documents.
- Experience in enhancing existing database designs through reverse engineering, forward engineering, and logical and physical data modeling using ERwin.
- Proficient in creating high-level conceptual data models as part of initial requirements envisioning.
- Experience in creating PL/SQL stored procedures, functions, packages, SQL*Loader scripts, BTEQ scripts, TPT, MultiLoad and FastLoad as part of ETL development.
- Expertise in data movement (data migration) from disparate sources to integrated data warehouses and data marts.
- Proficient in data cleansing, including standardization, de-duplication, matching and repair.
- Strong data warehousing conceptual knowledge, including star schema and snowflake schema dimensional modeling and the Kimball and Inmon methodologies.
- Experience in writing UNIX scripts and JCL scripts.
- Extensively worked on performance tuning of the ETL process and SQL optimization.
- Experience in redesigning ETL and Oracle processes to improve performance.
- Experience in developing test processes; created unit and integration test plans.
- Experience with real-time integration using Kafka, messaging and CDC.
- Experience in working with cross-functional teams. Team player and self-starter.
- Experienced in performing Level 3 support (developer support for production issues) for 24/7 ETL operations, including production support and troubleshooting of Informatica and database-related load issues.
- Experienced in customizing data models and data modeling.
TECHNICAL SKILLS:
Tools: Informatica Power Center 9.1/8.6/7.x/6.x, Power Exchange, Power Connect, Informatica Data Explorer, Business Objects 5/6/XI, ERwin 4.0, GoldenGate 9, Hue, Ambari, Puppet, Foreman, PigPen
ERP: SAP, PeopleSoft
Database: Oracle 8i/9i/10g, DB2 UDB, SQL Server 2005, Teradata, MS Access 97/2000, VSAM
OS & Interface: Solaris, NT/2000, VB 6.0, Kafka, MQ Series, web services, XML, MVS, JMS, Mainframe, z/OS
Programming: Python, Pig, C, C++, Spark, COBOL, SQL, PL/SQL, SQL*Loader, JCL, Shell, Pro*C
Scheduling: AppWorx, Tivoli Workload Scheduler (TWS), Control-M, Unicenter and Oozie
PROFESSIONAL EXPERIENCE:
Confidential
Data Engineer / Hadoop Developer / Spark Developer
Responsibilities:
- Designed and developed Sqoop jobs and Linux shell scripts for data ingestion from various Credit Suisse data sources into the HDFS data lake.
- Created Datasets/DataFrames from RDDs using reflection-based and programmatic schema inference (see the PySpark sketch after this list).
- Developed Python Spark programs for processing HDFS files using RDDs, pair RDDs, Spark SQL, Spark Streaming, DataFrames, accumulators and broadcast variables.
- Developed PySpark Kafka streaming programs to integrate various Credit Suisse source systems with Hadoop (see the streaming sketch at the end of this section).
- Developed PySpark programs using various transformations and actions.
- Converted existing Pig and MapReduce jobs to Spark programs.
- Comprehensive knowledge and experience in process improvement, normalization/de-normalization, data extraction, data cleansing and data manipulation.
- Performed data transformations in Hive, using static and dynamic partitioning and bucketing for performance improvements.
- Established and followed Spark programming best practices.
- Performed performance tuning of Pig, Hive and Spark jobs using caching/persistence, partitioning and best practices.
- Worked with support teams to resolve operational and performance issues.
- Researched, evaluated and utilized new technologies, tools and frameworks in the Hadoop ecosystem.
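A minimal PySpark sketch of the RDD-to-DataFrame, Spark SQL and partitioned-write pattern described in the bullets above; the paths, column names and aggregation are hypothetical placeholders rather than the actual project logic, and a Spark 2.x SparkSession with Hive support is assumed.

    # Illustrative PySpark batch job: paths, column names and the aggregation
    # are hypothetical placeholders, not the actual project logic.
    from pyspark.sql import Row, SparkSession

    spark = SparkSession.builder \
        .appName("ingest_sketch") \
        .enableHiveSupport() \
        .getOrCreate()

    # Parse delimited HDFS files into an RDD of Rows, then let Spark infer the
    # schema by reflection (Row field names become DataFrame columns).
    lines = spark.sparkContext.textFile("/data/landing/trades/*.csv")  # hypothetical path
    rows = lines.map(lambda l: l.split(",")) \
                .map(lambda p: Row(trade_id=p[0], book=p[1], notional=float(p[2])))
    trades_df = spark.createDataFrame(rows)

    # Register the DataFrame and query it with Spark SQL.
    trades_df.createOrReplaceTempView("trades")
    summary = spark.sql(
        "SELECT book, SUM(notional) AS total_notional FROM trades GROUP BY book")

    # Write partitioned output so downstream Hive queries can prune partitions.
    summary.write.mode("overwrite").partitionBy("book") \
           .parquet("/data/curated/trade_summary")  # hypothetical target path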
Environment: Hadoop ecosystem components: RHEL, Hortonworks HDP 2, Spark, HBase, Hive, Pig, Sqoop, Oozie, Python, MapReduce, NumPy, JSON, MongoDB, Scala
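In the same spirit, a minimal sketch of the Kafka-to-HDFS streaming integration mentioned above, using the DStream API available on this generation of Spark; the broker address, topic name and output prefix are assumptions, not project specifics.

    # Illustrative PySpark Streaming + Kafka (DStream) sketch; the broker,
    # topic and HDFS output prefix are hypothetical, and the
    # spark-streaming-kafka package matching the cluster's Spark and Kafka
    # versions is assumed to be on the classpath.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="kafka_ingest_sketch")
    ssc = StreamingContext(sc, 30)  # 30-second micro-batches

    # Direct (receiver-less) Kafka stream; each record is a (key, value) pair.
    stream = KafkaUtils.createDirectStream(
        ssc,
        ["source_system_events"],                  # hypothetical topic
        {"metadata.broker.list": "broker1:6667"})  # hypothetical broker

    # Keep only the message payloads and persist each micro-batch to HDFS.
    stream.map(lambda kv: kv[1]) \
          .saveAsTextFiles("/data/landing/stream/events")  # hypothetical prefix

    ssc.start()
    ssc.awaitTermination()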
Confidential
Data Engineer / Hadoop Developer
Responsibilities
- Deployed a 10-node Hortonworks Hadoop cluster on EC2 nodes on the AWS platform.
- Developed Linux, Sqoop and Pig scripts and Hive queries as part of a POC implementation to demonstrate that Hadoop is a good fit as a replacement for, or value-add to, existing platforms and tools.
- Developed Python scripts using embedded Pig (see the embedded Pig sketch at the end of this section).
- Developed MapReduce jobs for data preprocessing.
- Developed Pig and Hive UDFs to preprocess the data for analysis.
- Designed and implemented MapReduce jobs to support distributed processing using Python Hadoop Streaming, Hive and Apache Pig (see the mapper/reducer sketch after this list).
- Ingested data into Hadoop using Sqoop and applied data transformations using Pig scripts.
- Wrote and tested Pig Latin scripts, validating the different query modes using PigPen.
- Wrote custom InputFormat and RecordReader classes for reading and processing binary formats in MapReduce.
- Developed UDFs to provide custom Hive and Pig capabilities.
- Wrote custom Writable classes for Hadoop serialization and deserialization of time series tuples.
- Coordinated with data modelers to redesign the existing model for the Hadoop framework and its ecosystem, using Sqoop, Pig, MapReduce and Hive tables, loading data and writing Hive queries that run internally as MapReduce jobs.
- Trained analysts and business users on the Hadoop framework: executing HDFS commands, exploring HDFS files, monitoring Oozie jobs and running ad-hoc Hive queries through the Hue GUI.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Managed and scheduled Oozie Jobs on a Hadoop cluster.
- Involved in log file management: logs older than 7 days were removed from the log folder, loaded into HDFS and retained for 3 months.
- Worked on importing and exporting data between Oracle and HDFS/Hive using Sqoop with the OraOop connector.
- Set up standards and processes for Hadoop based application design and implementation.
- Wrote shell scripts for rolling day-to-day processes and automated them.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Implemented Oozie schedulers on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
- Documented specifications for mapping and transformation rules.
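A minimal Hadoop Streaming sketch in Python illustrating the mapper/reducer style referenced in the list above; the tab-delimited input layout and the counted field are hypothetical assumptions, not the project's actual data.

    #!/usr/bin/env python
    # Minimal Hadoop Streaming sketch: mapper and reducer in one file, selected
    # by a command-line argument. The input layout and counted field are
    # hypothetical.
    import sys


    def mapper():
        # Emit (event_type, 1) for every tab-delimited input record.
        for line in sys.stdin:
            fields = line.rstrip("\n").split("\t")
            if len(fields) > 2:
                print("%s\t1" % fields[2])  # assumes event type in 3rd column


    def reducer():
        # Input arrives sorted by key, so counts can be accumulated per key.
        current_key, count = None, 0
        for line in sys.stdin:
            key, value = line.rstrip("\n").split("\t", 1)
            if key == current_key:
                count += int(value)
            else:
                if current_key is not None:
                    print("%s\t%d" % (current_key, count))
                current_key, count = key, int(value)
        if current_key is not None:
            print("%s\t%d" % (current_key, count))


    if __name__ == "__main__":
        (mapper if sys.argv[1] == "map" else reducer)()

Such a script would typically be submitted through the hadoop-streaming jar, shipping the file with -file and invoking it as -mapper "job.py map" and -reducer "job.py reduce".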
Environment: Hadoop ecosystem components: RHEL, HDP, HBase, Hive, Pig, Sqoop, Flume, ZooKeeper, Oozie, Python, MapReduce
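As a companion to the embedded Pig bullet above, a sketch of what a Python (Jython) script using Pig's scripting/embedding API might look like, of the kind launched with "pig script.py"; the paths and relation schema are hypothetical placeholders.

    # Sketch of a Python (Jython) script with Pig embedded through the
    # org.apache.pig.scripting API; paths and the relation schema are
    # hypothetical.
    from org.apache.pig.scripting import Pig

    # Compile a parameterized Pig Latin pipeline once, then bind concrete paths.
    P = Pig.compile("""
        events  = LOAD '$input' USING PigStorage('\\t')
                  AS (event_time:chararray, event_type:chararray, payload:chararray);
        by_type = GROUP events BY event_type;
        counts  = FOREACH by_type GENERATE group AS event_type, COUNT(events) AS cnt;
        STORE counts INTO '$output';
    """)

    result = P.bind({'input': '/data/raw/events',             # hypothetical input path
                     'output': '/data/stats/event_counts'}).runSingle()

    if not result.isSuccessful():
        raise RuntimeError('embedded Pig job failed')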
Confidential
ETL Developer
Responsibilities
- Worked in the Agile Scrum methodology; attended daily stand-ups, status reporting, requirements grooming and task estimation.
- Executed a proof of concept on AWS EMR to compare performance metrics across the legacy Oracle data warehouse, the Hadoop framework and the Teradata EDW appliance.
- Designed and developed KYC (Know Your Customer) interface loaders from Ebeaurau, Equifax and RDC.
- Created dimensional data models and ETL for Customer operations.
- Developed BTEQ scripts, used the MultiLoad and FastLoad Teradata utilities, and developed UNIX scripts.
- Performed performance tuning, bug fixing and issue resolution; created jobs using the Tidal scheduler.
- Developed Business Objects universes, reports and alerts.
- Performed data cleansing using Informatica Data Quality (IDQ).
Environment: AIX, Informatica Power Center 9.1, Business Objects XI R2, Data Quality, Oracle 11g, Teradata, Tidal Scheduler, XML, HP Quality Stage
Confidential
Lead Developer
Responsibilities
- Designed and developed ETL to extract the market data, building hierarchies for security (instruments) master data.
- Designed ETL for Extraction of cash positions, Security positions and Reconciliation.
- Coordinated with the Transaction Services, Portfolio Management, Asset Data and Fund Reporting teams to understand ongoing changes in source data relationships and perform data analysis.
- Converted requirements into source-to-target mapping specifications.
- Developed real-time and batch extraction ETLs using Informatica Power Exchange interfaces to WebSphere MQ and web services, and real-time CDC from live databases.
- Developed PL/SQL packages, procedures and functions for regulatory reporting and data cleansing.
- Designed and developed an ETL framework for investment performance reporting and regulatory reporting interfaces.
- Customized data models to accommodate ongoing changes.
- Resolved issues and fixed bugs in tuning enhancements.
- Identification of performance bottlenecks at individual session level.
- Fine-tuned queries and discussed partitioning and indexing strategies with DBAs.
Environment: Informatica Power Center 8.6.1, Oracle 10g, flat files, XML, WebSphere MQ, web services, SQL Server 2005, shell programming, Control-M, Solaris, Business Objects XI R2, First Rate, Power Exchange 8.6.1
Confidential, Camden, NJ
Sr. ETL Consultant
Responsibilities
- Designed Framework for Envision data warehouse ETL Architecture.
- Installed and configured Informatica services; managed the repository and the administration of Informatica services.
- Designed data mapping specifications for the OTC, MTS and ATR streams from the SAP and ANZ applications into Envision.
- Created the Envision dimensional data model from scratch and converted functional requirements into source-to-target mapping specifications.
- Designed load strategies and scheduling plans for loading global feeds into the global application.
- Built hierarchical master data, conditional merging of data from multiple sources and built metadata system codes.
- Developed Informatica mappings, sessions and workflows.
- Designed and developed PL/SQL ETL routines using procedures, functions and packaged procedures.
- Developed shell scripts and control scripts.
- Designed the Business Objects universe for the ANZ portfolio; defined hierarchies and aggregates and resolved join path problems.
- Defined coding standards and change management standards.
- Designed source-to-target mapping specifications and enhancements to the existing data model.
- Application performance tuning and defined ETL service level agreements (SLA).
- Environment creation, Issue resolution and Bug fixing.
Environment: Informatica Power Center 8.6, Power Connect, Power Exchange, Oracle 10g, SAP, flat files, XML, Business Objects XI, shell programming, Mercury Quality Center, Tivoli Workload Scheduler (TWS), AIX (UNIX)
Confidential, Plainsboro, NJ
Sr. ETL Consultant
Responsibilities
- Involved in gathering business requirements, new data acquisitions and reporting requirements.
- Coordination, leading, task estimation, task assignment and peer reviewing.
- Designed and developed ETL processes using Informatica Power Center, including ETL mappings and workflows.
- Wrote UNIX shell scripts for automation of Power Center workflows, pre and post session file processing tasks etc.
- Involved in fine tuning and performance tuning of ETL feeds.
- Implemented error handling, restart and recovery procedures.
- Used ClearCase for version control of UNIX scripts and SQL procedures.
- Involved in preparing test data, test cases and executing test cases.
- Operations and production support of the data warehouse; implemented Informatica Velocity standards and prepared documentation.
Environment: Informatica 8.1, Oracle 9i, PL/SQL, XML, Oracle Clinical, flat files, shell programming, AutoSys, ClearCase, Documentum, Sun Solaris, ERwin.
Confidential, San Jose, CA
ETL Developer
Responsibilities
- Responsible for creating technical specifications and complete flow diagrams for ETL process from functional specifications and Business users.
- Designed Business Objects universes and resolved join path problems.
- Created control scripts and audit scripts using UNIX and SQL*Plus.
- Interfaced and configured Power Exchange CDC to implement real-time extraction from 24/7 highly available live Oracle databases (target: Teradata).
- Designed and developed ETL processes using Informatica and Oracle.
- Created UNIX scripts for administration tasks such as starting and stopping the Informatica server and cleaning log files; deployed code from development to QA and production environments.
- Production and Beta sandbox maintenance.
- Informatica administration, including configuration, upgrades and folder creation.
- Fine-tuned ETL mappings, sessions and ETL routines; redesigned existing processes to improve performance and tuned SQL and PL/SQL.
- Developed ETL routines with PL/SQL packages, procedures, functions and SQL*Loader scripts.
- Created UNIX scripts to automate the ETL notification process (eliminating manual Level 1 support).
- Created unit test plans for ETL process. Supported test groups for SIT and UAT.
- Supporting cross-functional teams by resolving design issues.
- Preparation of job scheduling plan based on the dependencies and CPU contention on hosts.
- Involved in data modeling meetings for creating new data models or changing the existing data model table structure without impacting existing applications.
- Updated and maintained production hand-off documents.
- Worked in third-level (developer) support, resolving daily production load issues.
Environment: Informatica Power Center 7, Teradata V2R6, Oracle 10g, DB2 UDB, SAP, ClearCase, web services, XML, ClearQuest, MS Visio, Power Exchange, AppWorx scheduler, Solaris 7 (UNIX)
Confidential
Sr. Developer
Responsibilities
- Responsible for analyzing and designing specifications, complete flow diagrams and blueprints for the ETL process from the data dictionary and functional documents provided by business analysts.
- Enhanced the existing database design for new enhancements using reverse engineering.
- Designed and developed Informatica mappings for loading clients' MR details into the Maintenance and Repair (MR) reporting data mart using various transformations: Update Strategy, Dynamic Lookup, Static Lookup, Source Qualifier, Filter, Router, Sequence Generator, Aggregator and Joiner.
- Involved in finding performance bottlenecks and resolved performance problems in ETL process.
- Synchronized MVS VSAM sources to Informatica workflows through Power Exchange.
- Analyzed the decoding differences among the different source systems (RETAS, TAXI, CAD) feeding the data mart.
- Created UNIX scripts for automating the ETL load Process and the cleanup process.
- Created unit test plans for ETL process. Supported test groups for SIT and UAT.
- Worked with PL/SQL procedures and packages.
- Supporting cross-functional teams by resolving design issues.
Environment: Informatica Power Center, Oracle 9i, Quest Central, Toad, DB2, Solaris, JCL, MS Visio, Control-M, VSAM, UNIX
Confidential
ETL Developer
Responsibilities
- Responsible for creating technical specifications and complete flow diagrams for ETL process from functional documents.
- Designed and developed Informatica mappings for loading clients' employee compensation details into their associated reporting data marts using various transformations: Update Strategy, Dynamic Lookup, Static Lookup, Source Qualifier, Filter, Router, Sequence Generator, Aggregator and Joiner.
- Worked with PeopleSoft HRMS history source tables; analyzed the existing PeopleSoft HRMS data and designed the ETL workflows and mappings.
- Involved in finding performance bottlenecks and resolved performance problems in the ETL process.
- Analyzed the decoding differences in the CDS (OLTP system) source data across the 3 clients, each of which implements different business logic.
- Created and modified (while implementing for different clients) UNIX scripts for automating the ETL load Process.
- Created unit test plans for ETL process. Supported test groups for SIT and UAT.
- Created user and group IDs for developers and granted access permissions to work with Informatica application tools and repositories in development and test environments.
Environment: Power Center 7, Oracle 9i, DB2, Quest Central, PeopleSoft HRMS, Solaris 7 (UNIX)
Confidential
Developer
Responsibilities
- Created SQL*Loader scripts to automate file loads in batches.
- Developed ETL routines using Oracle PL/SQL.
- Created and modified PL/SQL packages, procedures and functions.
- Handled business exceptions using metadata tables.
- Created ad-hoc reports based on requests.
- Created database tables, indexes in ORACLE RDBMS implementing various business rules using appropriate constraints and database triggers.
Environment: Oracle 8i, Solaris 7.0, Crystal Reports, UNIX shell scripting.