
Senior Consultant Big Data Engineering and Architecture Resume


SUMMARY:

  • 4+ years’ hands-on experience implementing big data applications and performance-tuning Hadoop/Spark implementations
  • 5+ years’ hands-on experience with Hadoop ecosystem components: HDFS, Hive, Spark, NiFi, Kafka, MongoDB, BigQuery, and Sqoop
  • 8+ years’ programming experience with SQL, Python, SAS, Scala, Spark, Hive, and shell scripting
  • Extensive knowledge of Data Lake architecture and cloud-based enterprise data warehousing implementation
  • Extensive experience implementing Hadoop big data ingestion and transformation frameworks (batch and streaming)
  • Working knowledge of Java, AWS, GCP, TensorFlow, Docker, and modern distributed microservices architecture

PROFESSIONAL EXPERIENCE:

Senior Consultant Big Data Engineering and Architecture

Confidential

Technologies and Frameworks: Hive, Spark, Kafka, NiFi, MongoDB, BigQuery, Hortonworks, GCP

Responsibilities:

  • Develop technical architectures, designs, and processes to extract, cleanse, integrate, organize, and present data from a variety of sources and formats for analysis and use across multiple use cases.
  • Productionize machine learning pipelines and tune their performance; hands-on experience tuning Spark applications and Hive queries and re-designing pipelines.
  • Perform data profiling, discovery, and analysis to determine the location, suitability, and coverage of data, and to identify the data types, formats, and data quality within a given data source.
  • Work with source-system and business SMEs to develop an understanding of the data requirements and the options available within customer sources for meeting the business requirements.
  • Create logical extraction/ingestion templates and maps to demonstrate the logical flow and manipulation of data required to move data from customer source systems into the target data lake.
  • Perform hands-on data development to accomplish data extraction, movement, and integration, leveraging state-of-the-art tools and practices, including both streaming and batch data ingestion techniques (see the ingestion sketch below).
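A minimal PySpark sketch of the batch-plus-streaming ingestion pattern described above, assuming the Kafka connector for Spark is available; the paths, topic, and broker address are illustrative placeholders, not the actual project configuration:

```python
# Minimal batch + streaming ingestion sketch (illustrative names only).
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp

spark = SparkSession.builder.appName("data-lake-ingestion").getOrCreate()

# Batch ingestion: land delimited source files into the raw zone as Parquet.
batch_df = spark.read.option("header", "true").csv("/landing/customer_extracts/")
(batch_df
 .withColumn("ingest_ts", current_timestamp())
 .write.mode("append")
 .parquet("/datalake/raw/customer/"))

# Streaming ingestion: continuously append Kafka events into the raw zone.
stream_df = (spark.readStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "broker:9092")
             .option("subscribe", "source_events")
             .load()
             .selectExpr("CAST(value AS STRING) AS payload")
             .withColumn("ingest_ts", current_timestamp()))

query = (stream_df.writeStream
         .format("parquet")
         .option("path", "/datalake/raw/events/")
         .option("checkpointLocation", "/checkpoints/events/")
         .start())
query.awaitTermination()
```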

Big Data Engineer/Architect

Confidential

Technologies and Frameworks: Spark, Kafka, Hive, Sqoop, NiFi, Scala, Shell, Python, Hortonworks, Azure

Responsibilities:

  • Design and implement a big data ingestion platform for flat files and RDBMS sources
  • Design, develop, and implement performant real-time ETL pipelines using NiFi, Kafka, Kafka Streams, Scala, and Spark Streaming
  • Design and build a big data transformation pipeline that uses Spark to parse raw XML files and generate critical monthly business reports (see the sketch below)
  • Lead the data engineering team in delivering prototypes during the proof-of-concept phase; conduct 20+ successful proofs of concept to fail fast and learn fast with Podium, DataStage, HP Voltage, Druid, Waterline, Presto, Alteryx, and Syncsort
  • Design and implement the data science layer of the advanced analytics platform that provides access to our powerful machine learning algorithms.
  • Build real-time analysis pipelines that feed machine learning models, using Kafka for streaming messages, and continuously improve and optimize the performance and scalability of the underlying implementation
  • Develop reusable software frameworks/components and tools to enable data scientists or analysts to prototype, develop and automate data science pipelines efficiently.
  • Perform unit testing to ensure the code is robust. Develop new big data technologies to improve the performance, scalability, extensibility, stability, and fault tolerance of our big data platform.
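A minimal sketch of the XML-parsing transformation pipeline referenced above, assuming the spark-xml package is on the classpath; the row tag, column names, and paths are hypothetical illustrations rather than the actual report logic:

```python
# Sketch of an XML-to-report transformation in PySpark.
# Assumes the spark-xml package (com.databricks:spark-xml) is available;
# row tag, columns, and paths below are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sum as sum_

spark = SparkSession.builder.appName("monthly-xml-report").getOrCreate()

# Parse raw XML files into a DataFrame, one row per <transaction> element.
raw = (spark.read
       .format("xml")
       .option("rowTag", "transaction")
       .load("/datalake/raw/monthly_xml/"))

# Aggregate the parsed records into monthly report figures.
report = (raw
          .groupBy("account_id", "product_code")
          .agg(sum_(col("amount")).alias("total_amount")))

report.write.mode("overwrite").parquet("/datalake/reports/monthly/")
```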

Hadoop Data Engineer (Big Data Ingestion Technical Lead)

Confidential

Technologies and Frameworks: Hortonworks, Hadoop, Yarn, Hive, Shell, Spark, Scala, Python, Sqoop, Netezza, DB2

Responsibilities:

  • Designed an automated preprocessing and ingestion workflow covering both historical data extraction and ongoing file feeds from multiple sources in various formats, such as COBOL FB/VB, flat fixed-length, flat delimited, JSON, and XML.
  • Preprocessed and split TSYS COBOL VB/FB raw files with Python, using record ID and record length variables, into clean fixed-length, single-schema COBOL files; recognized data patterns, quantified potential issues, troubleshot defects in the vendor’s copybooks, and identified solutions. Successfully ingested all 27 EBCDIC files into the Hadoop cluster in one and a half months.
  • Created metadata XML files for incoming source data files; developed Python code to extract technical metadata from COBOL copybooks, DES layouts, and DDL and transform it into the metadata XML format used by the big data ingestion framework.
  • Involved in designing the ETL workflow from the landing zone to the consumer zone for loading data into the Data Lake.
  • Proficient with code versioning tools such as Git and SVN
  • Designed, developed, and implemented performant ETL pipelines using the PySpark API
  • Developed over 100 ETL processes with complex logic, including incremental loading, snapshot tables, and Slowly Changing Dimension Type 2, for over 20 business subjects using Hive, Spark, Java, Python, and shell scripting.
  • Developed various Hive UDFs to perform mapping transformations, and a Spark application to simplify the SCD Type 2 transformation logic (see the sketch below).
  • Set up ELT workflow automation using Tidal Scheduler; designed and built the batch control table and audit framework for the full ELT pipelines
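A minimal PySpark sketch of the SCD Type 2 pattern referenced above; the table paths, business key, and tracked columns are illustrative assumptions rather than the project's actual schema, and the logic is simplified to the changed-row case (brand-new keys would be appended similarly):

```python
# Sketch of Slowly Changing Dimension Type 2 logic in PySpark.
# Paths and column names are illustrative placeholders; assumes the staging
# snapshot carries the same business columns as the dimension table.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, current_date, lit

spark = SparkSession.builder.appName("scd2-sketch").getOrCreate()

dim = spark.read.parquet("/datalake/dim/customer/")        # existing dimension
stage = spark.read.parquet("/datalake/staging/customer/")  # latest source snapshot

key = "customer_id"
tracked = ["name", "address", "segment"]                   # attributes tracked for change

open_rows = dim.filter(col("is_current") == lit(True)).alias("c")
incoming = stage.alias("i")

# Rows whose tracked attributes changed since the current dimension version.
changed = (open_rows
           .join(incoming, col(f"c.{key}") == col(f"i.{key}"))
           .filter(" OR ".join(f"c.{a} <> i.{a}" for a in tracked)))

# Close the superseded versions and open new current versions.
closed = (changed.select("c.*")
          .withColumn("end_date", current_date())
          .withColumn("is_current", lit(False)))

opened = (changed.select("i.*")
          .withColumn("start_date", current_date())
          .withColumn("end_date", lit(None).cast("date"))
          .withColumn("is_current", lit(True)))

# Carry over history rows and current rows that did not change.
changed_keys = changed.select(col(f"c.{key}").alias(key)).distinct()
history = dim.filter(col("is_current") == lit(False))
unchanged = dim.filter(col("is_current") == lit(True)).join(changed_keys, key, "left_anti")

result = history.unionByName(unchanged).unionByName(closed).unionByName(opened)
result.write.mode("overwrite").parquet("/datalake/dim/customer_scd2/")
```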

Risk Analyst

Confidential

Responsibilities:

  • Interpreted the financial statements of small businesses, input the results into Confidential’s Risk Analyst, and generated the financial reports and BRR (Borrower Risk Rating) score
  • Combined GAAP, notes to the financial statements, and overarching guidelines to correctly identify and apply requested adjustments to financial statements; generated and prepared a set of credit rating reports and BRR rating sensitivity test reports

Data Scientist

Confidential

Responsibilities:

  • Involved in designing, developing, and implementing Hadoop-based big data solutions.
  • Collected and aggregated large amounts of traffic data from the landing server into HDFS for analysis.
  • Monitored the data collection workflow and performed marginal checking on a daily basis.
  • Extracted real-time raw traffic data from the data warehouse with Sqoop, then transformed and loaded the traffic time-series data into the EDL.
  • Using the Pandas and scikit-learn libraries in Python, performed feature extraction, exploration, visualization, and transformation (see the sketch below).
  • Conducted research on data pre-cleaning and data mining techniques; applied association rule mining and clustering analysis to discover rules and patterns in temporal, spatial, and spatio-temporal relations
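A minimal sketch of the pandas/scikit-learn feature preparation and clustering workflow described above; the file name and feature columns are hypothetical placeholders, and KMeans stands in for whichever clustering method was actually applied:

```python
# Sketch of feature preparation and clustering on traffic data.
# File path and column names are illustrative, not the project's actual data.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Load extracted traffic features (e.g., per-sensor hourly aggregates).
df = pd.read_csv("traffic_features.csv")

features = ["avg_speed", "volume", "occupancy"]   # hypothetical feature columns
X = df[features].dropna()

# Explore and transform: standardize features before distance-based clustering.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Cluster observations into traffic-pattern groups.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
df.loc[X.index, "cluster"] = kmeans.fit_predict(X_scaled)

# Inspect the average feature profile of each cluster.
print(df.groupby("cluster")[features].mean())
```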
