Technical Lead Resume
EXPERIENCE SUMMARY:
- 9 years of experience leading and implementing various data warehousing projects for multiple clients and domains
- Experience working with Hadoop ecosystem tools such as Hive, Pig, and Sqoop; strong knowledge of Spark and of testing Pig and Hive analytical functions
- Used the Spark DataFrame and Dataset APIs to perform analytics on data stored in Hive (a brief sketch follows this summary)
- Experience using the Pandas package alongside PySpark
- Excellent knowledge of Hadoop ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm
- Working experience in creating complex data ingestion pipelines, data transformations, data management and data governance
- Experience importing and exporting terabytes of data with Sqoop between HDFS and relational database systems
- 5+ years of dimensional data modelling experience: OLAP, fact and dimension tables, physical and logical data modelling, and development and migration of stored procedures
- Extensive experience in requirements gathering, data profiling, analysis, source-to-target mapping, and design and implementation of ETL solutions in development and migration projects
- Experience with Agile Methodology
- Saved thousands of Teradata CPU seconds through table redesign and performance tuning
- Teradata and Informatica Certified professional; Certified in Project Management from IMT Ghaziabad
- Extensive experience working with different databases such as Teradata, Sybase, and SQL
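The sketch below is a minimal, illustrative example of the Spark DataFrame analytics on Hive data and the Pandas hand-off referenced in this summary; the table and column names (customer_txn, account_id, txn_amount) are hypothetical.

    # Minimal PySpark sketch: analytics on a Hive table via the DataFrame API,
    # then pulling a small aggregate into Pandas. Table/column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("hive-analytics-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Read a Hive table as a DataFrame and compute per-account aggregates
    txns = spark.table("default.customer_txn")
    summary = (txns.groupBy("account_id")
                   .agg(F.count("*").alias("txn_count"),
                        F.sum("txn_amount").alias("total_amount"),
                        F.stddev("txn_amount").alias("amount_stddev")))

    # Small result sets can be converted to Pandas for further analysis
    summary_pdf = summary.limit(1000).toPandas()
    print(summary_pdf.head())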
TECHNICAL SKILLS:
Databases & Data Processing: Spark 2.0, Teradata v12/v13/v14, Teradata utilities, Sybase, Hive, Impala
Operating Systems: AIX (UNIX), Windows, MS-DOS, macOS, Ubuntu
Programming Language: Python
ETL Tools: Pig, Informatica v9.1
Other Tools: Sqoop, PuTTY, Interactive SQL, SQL Advantage, MS Office, PVCS, HP Quality Center 10, Assyst, CVS
PROFESSIONAL EXPERIENCE:
Confidential
Technical Lead
Responsibilities:
- Working as the technical SME for BI ETL and leading a team of nine members
- Designed data pipeline to consume data from different sources
- Designed flat tables to be used for feature calculation
- Designed a pipeline to productionize the model that predicts high-risk customers
- Created features in PySpark using aggregation functions such as standard deviation and z-score (see the sketch after this role's environment line)
- Used calculated feature values to score customers and send ATL+BTL cases for review by the FIU (Financial Investigation Unit)
- Analyzed large data sets by running Hive queries and Spark SQL from Python
- Loaded data into Spark DataFrames and performed in-memory computation to generate the output response
- Experienced in handling large datasets using partitioning, Spark in-memory capabilities, broadcast variables, and effective, efficient joins and transformations during the ingestion process itself
- Used the Spark DataFrame API to write custom transformations and data aggregations
- Implemented partitioning, dynamic partitions, and bucketing in Hive
- Responsible for project planning, scope management, resource estimation, the change control process, definition of data quality processes and standards, and verification and validation
- Coordinated with business owners, DBAs, and the reporting team to address system improvements
- Designed and customized data models for a data mart supporting data from multiple sources in real time
- Mentored individuals from both business and technical standpoints
Environment: Hadoop, Spark SQL, Hive, Unix
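A hedged sketch of the kind of PySpark feature calculation described for this role: per-customer standard deviation and z-score computed with window aggregations, a broadcast join against a small reference table, and a month-partitioned write to Hive. All table and column names (transactions, risk_segments, customer_id, txn_date, amount) are hypothetical.

    # Hedged PySpark sketch; every table/column name below is hypothetical.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Hypothetical source table with customer_id, txn_date and amount columns
    txns = spark.table("default.transactions")

    # Per-customer mean and standard deviation, then a z-score per transaction
    w = Window.partitionBy("customer_id")
    features = (txns
                .withColumn("amt_mean", F.avg("amount").over(w))
                .withColumn("amt_stddev", F.stddev("amount").over(w))
                .withColumn("amt_zscore",
                            (F.col("amount") - F.col("amt_mean")) / F.col("amt_stddev"))
                .withColumn("txn_month", F.date_format("txn_date", "yyyy-MM")))

    # Broadcast join against a small (hypothetical) reference table so the
    # large transaction side is not shuffled
    risk_ref = spark.table("default.risk_segments")
    scored = features.join(F.broadcast(risk_ref), on="customer_id", how="left")

    # Persist as a Hive table partitioned by month
    (scored.write
           .mode("overwrite")
           .partitionBy("txn_month")
           .saveAsTable("default.features_monthly"))

A broadcast join is appropriate only when the reference table comfortably fits in executor memory; otherwise a regular shuffle join with sensible partitioning is the safer choice.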
Confidential
Teradata Technical Lead & Hadoop Developer
Responsibilities:
- Coordinating with clients to understand the business requirements
- Performing Data Mapping, Data Modeling, Design, Development, Implementation
- Developed Teradata procedures and Perl scripts for data cleansing and for performing complex business logic
- Data profiling to identify distinct values, value counts, and nulls in columns in order to define join strategies and compression techniques; identifying data anomalies and proposing standardized values for the ETL solution
- Loading flat files from mainframe to Teradata using Informatica and Teradata utilities
- Involved in a POC to design and develop Hadoop solutions for big data problems
- Loading CSV and fixed-width files into HDFS
- Teradata performance tuning; saved thousands of Teradata CPU seconds by taking responsibility for restructuring the tables behind the Hispanic Dashboard
- Involved in data migration from Teradata to Hive tables using Sqoop
- Developing Hive scripts to query the data files and store the results in partitioned Hive tables (see the sketch after this role's environment line)
- Created Pig UDFs to perform ETL; created and loaded Hive/Impala tables
- Exporting data from MySQL Server to Hive tables using Sqoop jobs
Environment: Teradata, Teradata Loader Utilities, SQL, Hadoop, Pig, Hive, Sqoop, Unix, Informatica and Perl
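The Hive-script work above followed a common staging pattern: raw delimited files exposed through an external table, then loaded into a partitioned table with dynamic partitions. The project itself used Hive scripts; the sketch below renders the same pattern in Python through PySpark's HiveQL support purely for illustration, and every table, column, and path name is hypothetical.

    # Illustrative only: the original work used Hive scripts; HiveQL is run here
    # via spark.sql so the example stays in Python. Names/paths are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # External table over raw delimited files landed on HDFS
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS sales_raw (
            sale_id BIGINT, store_id INT, amount DECIMAL(12,2), sale_dt STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        LOCATION '/data/landing/sales'
    """)

    # Target table partitioned by date
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales_by_day (
            sale_id BIGINT, store_id INT, amount DECIMAL(12,2))
        PARTITIONED BY (sale_dt STRING)
        STORED AS PARQUET
    """)

    # Dynamic-partition load from the staging table into the partitioned table
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT OVERWRITE TABLE sales_by_day PARTITION (sale_dt)
        SELECT sale_id, store_id, amount, sale_dt
        FROM sales_raw
    """)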
Confidential
Teradata Designer/Developer
Responsibilities:
- Coordinating with the business team in the UK to understand the business requirements
- Source data mapping and development of high-level and low-level design documents
- Development of Informatica mappings for Teradata code migration
- Coordination with client/offshore team on the changing requirements during agile project development
- Coordination with the Business/testing/offshore team on defects raised during various testing phases
- Functional/Technical peer reviews of the deliverables
- Development of history data fixes for production issues
- Coordination with the Release/Configuration team during various implementation activities
Environment: Teradata, Teradata Loader Utilities, SQL, Informatica, Unix