Technical Lead Resume
EXPERIENCE SUMMARY:
- 9 years of experience leading and implementing various data warehousing projects for multiple clients and domains
- Experience working with Hadoop ecosystem tools such as Hive, Pig, and Sqoop; strong knowledge of Spark and of testing Pig and Hive analytical functions
- Used the Spark DataFrame and Dataset APIs to perform analytics on data stored in Hive (a brief sketch follows this summary)
- Experience using the Pandas package alongside PySpark
- Excellent knowledge of Hadoop ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm
- Working experience in creating complex data ingestion pipelines, data transformations, data management and data governance
- Experience importing and exporting terabytes of data with Sqoop between HDFS and relational database systems
- 5+ years of dimensional data modelling experience: OLAP, fact and dimension tables, physical and logical data modelling, and development and migration of stored procedures
- Extensive experience in requirements gathering, data profiling, analysis, source-to-target mapping, and design and implementation of ETL solutions in development and migration projects
- Experience with Agile Methodology
- Saved thousands of Teradata CPU seconds through table redesign and performance tuning
- Teradata and Informatica Certified professional; Certified in Project Management from IMT Ghaziabad
- Extensive experience working with different databases such as Teradata, Sybase, and SQL
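The sketch below is a minimal, illustrative example of the Spark DataFrame analytics on Hive data and the Pandas hand-off referenced in this summary; the table and column names (customer_txn, account_id, txn_amount) are hypothetical.

    # Minimal PySpark sketch: analytics on a Hive table via the DataFrame API,
    # then pulling a small aggregate into Pandas. Table/column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("hive-analytics-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Read a Hive table as a DataFrame and compute per-account aggregates
    txns = spark.table("default.customer_txn")
    summary = (txns.groupBy("account_id")
                   .agg(F.count("*").alias("txn_count"),
                        F.sum("txn_amount").alias("total_amount"),
                        F.stddev("txn_amount").alias("amount_stddev")))

    # Small result sets can be converted to Pandas for further analysis
    summary_pdf = summary.limit(1000).toPandas()
    print(summary_pdf.head())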
TECHNICAL SKILLS:
Databases & Data Processing: Spark 2.0, Teradata v12/v13/v14, Teradata utilities, Sybase, Hive, Impala
Operating Systems: AIX (UNIX), Windows, MS-DOS, macOS, Ubuntu
Programming Language: Python
ETL Tools: Pig, Informatica v9.1
Other Tools: Sqoop, PuTTY, Interactive SQL, SQL Advantage, MS Office, PVCS, HP Quality Center 10, Assyst, CVS
PROFESSIONAL EXPERIENCE:
Confidential
Technical Lead
Responsibilities:
- Working as the technical SME for BI ETL and leading a team of nine members
- Designed data pipeline to consume data from different sources
- Designed flat tables to be used for feature calculation
- Designed a pipeline to productionize the model that predicts high-risk customers
- Created features in PySpark using aggregation functions such as standard deviation and z-score (see the sketch after this role's environment line)
- Used calculated feature values to score customers and send ATL+BTL cases for review by the FIU (Financial Investigation Unit)
- Analyzed large data sets by running Hive queries and Spark SQL from Python
- Loaded data into Spark DataFrames and performed in-memory computation to generate the output response
- Experienced in handling large datasets using partitioning, Spark in-memory capabilities, broadcast variables, and effective, efficient joins and transformations during the ingestion process itself
- Used the Spark DataFrame API to write custom transformations and data aggregations
- Implemented partitioning, dynamic partitions, and bucketing in Hive
- Responsible for project planning, scope management, resource estimation, the change control process, definition of data quality processes and standards, and verification and validation
- Coordinated with business owners, DBAs, and the reporting team to address system improvements
- Designed and customized data models for a data mart supporting data from multiple sources in real time
- Mentored individuals from both business and technical standpoints
Environment: Hadoop, Spark SQL, Hive, Unix
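A hedged sketch of the kind of PySpark feature calculation described for this role: per-customer standard deviation and z-score computed with window aggregations, a broadcast join against a small reference table, and a month-partitioned write to Hive. All table and column names (transactions, risk_segments, customer_id, txn_date, amount) are hypothetical.

    # Hedged PySpark sketch; every table/column name below is hypothetical.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Hypothetical source table with customer_id, txn_date and amount columns
    txns = spark.table("default.transactions")

    # Per-customer mean and standard deviation, then a z-score per transaction
    w = Window.partitionBy("customer_id")
    features = (txns
                .withColumn("amt_mean", F.avg("amount").over(w))
                .withColumn("amt_stddev", F.stddev("amount").over(w))
                .withColumn("amt_zscore",
                            (F.col("amount") - F.col("amt_mean")) / F.col("amt_stddev"))
                .withColumn("txn_month", F.date_format("txn_date", "yyyy-MM")))

    # Broadcast join against a small (hypothetical) reference table so the
    # large transaction side is not shuffled
    risk_ref = spark.table("default.risk_segments")
    scored = features.join(F.broadcast(risk_ref), on="customer_id", how="left")

    # Persist as a Hive table partitioned by month
    (scored.write
           .mode("overwrite")
           .partitionBy("txn_month")
           .saveAsTable("default.features_monthly"))

A broadcast join is appropriate only when the reference table comfortably fits in executor memory; otherwise a regular shuffle join with sensible partitioning is the safer choice.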
Confidential
Teradata Technical Lead & Hadoop Developer
Responsibilities:
- Coordinating with clients to understand the business requirements
- Performing Data Mapping, Data Modeling, Design, Development, Implementation
- Developed Teradata procedures and Perl scripts for data cleansing and for performing complex business logic
- Data profiling to identify distinct values, value counts, and nulls in columns in order to define join strategies and compression techniques; identifying data anomalies and proposing standardized values for the ETL solution
- Loading flat files from mainframe to Teradata using Informatica and Teradata utilities
- Involved in a POC to design and develop Hadoop solutions for big data problems
- Loading CSV and fixed-width files into HDFS
- Teradata performance tuning; saved thousands of Teradata CPU seconds by taking responsibility for restructuring the tables behind the Hispanic Dashboard
- Involved in data migration from Teradata to Hive tables using Sqoop
- Developing Hive scripts to query the data files and store the results in partitioned Hive tables (see the sketch after this role's environment line)
- Created Pig UDFs to perform ETL; created and loaded Hive/Impala tables
- Exporting data from MySQL Server to Hive tables using Sqoop jobs
Environment: Teradata, Teradata Loader Utilities, SQL, Hadoop, Pig, Hive, Sqoop, Unix, Informatica and Perl
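The Hive-script work above followed a common staging pattern: raw delimited files exposed through an external table, then loaded into a partitioned table with dynamic partitions. The project itself used Hive scripts; the sketch below renders the same pattern in Python through PySpark's HiveQL support purely for illustration, and every table, column, and path name is hypothetical.

    # Illustrative only: the original work used Hive scripts; HiveQL is run here
    # via spark.sql so the example stays in Python. Names/paths are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # External table over raw delimited files landed on HDFS
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS sales_raw (
            sale_id BIGINT, store_id INT, amount DECIMAL(12,2), sale_dt STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        LOCATION '/data/landing/sales'
    """)

    # Target table partitioned by date
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales_by_day (
            sale_id BIGINT, store_id INT, amount DECIMAL(12,2))
        PARTITIONED BY (sale_dt STRING)
        STORED AS PARQUET
    """)

    # Dynamic-partition load from the staging table into the partitioned table
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT OVERWRITE TABLE sales_by_day PARTITION (sale_dt)
        SELECT sale_id, store_id, amount, sale_dt
        FROM sales_raw
    """)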
Confidential
Teradata Designer/Developer
Responsibilities:
- Coordinating with the business team in the UK to understand the business requirements
- Source data mapping and development of high-level and low-level design documents
- Development of Informatica mappings for Teradata code migration
- Coordination with client/offshore team on the changing requirements during agile project development
- Coordination with the Business/testing/offshore team on defects raised during various testing phases
- Functional/Technical peer reviews of the deliverables
- Development of history data fixes for production issues
- Coordination with the Release/Configuration team during various implementation activities
Environment: Teradata, Teradata Loader Utilities, SQL, Informatica, Unix