Senior Big Data Engineer Resume
SUMMARY
- Accomplished and innovative IT professional with 15+ years of experience and extensive expertise in building production-ready Big Data ingestion pipelines using Scala, Java, Python, Apache Spark, Apache Kafka, Akka, and HBase.
TECHNICAL SKILLS
Languages: Scala, Java 8, Python, T-SQL, Oracle SQL, PL/SQL
Databases: Elasticsearch, HBase, Redis, DynamoDB, MongoDB, Redshift, PostgreSQL, SQL Server 2000/2005/2008 R2/2012, MySQL, Oracle 9i/10g/11g
Tools and Frameworks: Akka, Spring, Spring Boot, Apache Spark 2.0, Spark Streaming, Kafka, Hive, Pig, Impala, AWS, Amazon EMR, Hadoop, Flask, Docker, Talend, Talend Enterprise Big Data, SSRS, SSIS, SSAS
Operating Systems: Linux (Red Hat, CentOS), Solaris
PROFESSIONAL EXPERIENCE
Confidential
Senior Big Data Engineer
Responsibilities:
- Working on the design and implementation of highly concurrent, scalable, distributed, and resilient applications that process 200+ TB of data, including computation of BitSet and HyperLogLog statistics, using Akka, Kafka, Spark, and HBase.
- Designed and implemented an analytics framework to process 10+ TB of client data using Kafka, the Spark Streaming Scala API, Redis, and HBase.
- Led the redesign of the core framework to scale out Io-Tahoe data processes using Spark and HBase: the framework computes BitSet, HyperLogLog, and other statistics on table columns in parallel and loads those metrics into HBase for fast retrieval in downstream computations (see the sketch below).
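A minimal sketch of that parallel column-profiling idea in Scala against the Spark and HBase client APIs; the Parquet path, table names, and column family are illustrative, not the actual Io-Tahoe code:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.approx_count_distinct
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object ColumnProfiler {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("column-profiler").getOrCreate()

    // Hypothetical source table; the real framework profiled client tables.
    val df = spark.read.parquet("/data/client_table")

    // approx_count_distinct is HyperLogLog-based, so every column's
    // cardinality is estimated in a single distributed pass.
    val metrics = df.select(df.columns.map(c => approx_count_distinct(c).alias(c)): _*).first()

    // Store one metric cell per column in HBase for fast point lookups later.
    val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = conn.getTable(TableName.valueOf("column_metrics")) // assumed table name
    df.columns.foreach { c =>
      val put = new Put(Bytes.toBytes(s"client_table:$c"))
      put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("approx_distinct"),
        Bytes.toBytes(metrics.getAs[Long](c)))
      table.put(put)
    }
    table.close(); conn.close()
  }
}
```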
Confidential
Big Data Engineer
Responsibilities:
- Designed and implemented a Spark Streaming data ingestion framework for sources such as REST APIs and Kafka using the Spark Streaming Scala API.
- Developed the Cyber Analytics Workbench framework, which protects JPMChase clients from phishing attempts. Designed and implemented real-time Spark Streaming applications integrated with Kafka to handle high-volume, high-velocity data streams in a scalable, reliable, and fault-tolerant manner; implemented Kafka offset management (see the sketch after this section) and an application monitoring framework to track data inflow using Scala, Python, Spark Streaming, and Kafka.
- Developed a backup-and-restore application to transfer data from various data sources to a private cloud using the Java concurrency framework, the AWS S3 API, and the Hadoop HDFS API.
- Developed a Data Quality (DQ) framework to ensure data validity and consistency for consumption by downstream applications using the Spark Scala API.
- Developed a string-similarity Spark application to match company names from multiple sources using edit-distance algorithms and the Spark Scala API (see the sketch after this section); tuned the Spark jobs for optimal efficiency by increasing parallelism and reducing shuffles; implemented a spell-check engine with Python NLP to automatically correct misspelled company names.
Technologies Used: Scala, Java 8, Python, Hive, Impala, HBase, NLP, Pandas, NumPy, Flask, Unix shell, Apache Spark 2, Spark Streaming, Kafka, Cloudera CDH, Docker.
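A minimal sketch of the offset-management pattern, assuming the spark-streaming-kafka-0-10 integration; the topic, group id, and broker address are placeholders:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object PhishingEventStream {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("cyber-analytics"), Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker:9092",               // placeholder broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "cyber-analytics",
      "enable.auto.commit" -> (false: java.lang.Boolean)  // commit manually below
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    stream.foreachRDD { rdd =>
      // Capture the batch's offset ranges first, process the records, and
      // commit only after the batch succeeds: at-least-once delivery.
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      rdd.map(_.value).foreach(event => () /* detection logic would go here */)
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Committing offsets only after a batch completes gives at-least-once semantics without relying on checkpointed state.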
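And a sketch of the edit-distance matching step, using Spark's built-in levenshtein function; the paths, the company column name, and the distance threshold are assumptions:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{broadcast, col, levenshtein, lower, trim}

object CompanyNameMatcher {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("name-matcher").getOrCreate()

    // Normalize before comparing so edit distance measures real differences.
    val left  = spark.read.parquet("/data/source_a").select(trim(lower(col("company"))).as("a"))
    val right = spark.read.parquet("/data/source_b").select(trim(lower(col("company"))).as("b"))

    // Broadcasting the smaller side avoids shuffling the large one.
    val matches = left.crossJoin(broadcast(right))
      .withColumn("dist", levenshtein(col("a"), col("b")))
      .filter(col("dist") <= 2) // tolerance is a tunable assumption

    matches.write.parquet("/data/matches")
  }
}
```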
Confidential
Lead Big Data Engineer
Responsibilities:
- Designed and implemented a Spark Streaming data ingestion framework to pull data from multiple sources (Facebook, YouTube, Instagram, and Twitter) using REST APIs, Spark Streaming, and Scala.
- Designed and implemented big data ingestion pipelines to ingest multiple petabytes of data from various sources using Kafka and Spark Streaming, including data quality checks and transformations, storing the results in efficient formats such as Parquet (see the sketch after this section).
- Built a POC sentiment analyzer engine using NLP (Natural Language Processing) and Python.
- Developed an internal search engine for looking up news transcripts of NBC news shows by creating a Python parser that converts transcripts to JSON documents and loads them into Elasticsearch; implemented an automated ingestion process that indexes new transcripts as they become available (see the indexing sketch after this section).
Technologies Used: Scala, Python, Java, Hive, Pig, NLP, pandas, Unix shell, Apache Spark, Kafka, Amazon EMR, Elasticsearch, Hortonworks.
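A minimal sketch of the ingest-validate-store pattern described above; the field names, paths, and JSON staging step are illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date}

object SocialFeedIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("social-ingest").getOrCreate()

    // Staged payloads from the REST/Kafka collectors, one JSON object per post.
    val raw = spark.read.json("/landing/social/*.json")

    // Quality gate: drop records missing a post id or timestamp.
    val clean = raw.filter(col("post_id").isNotNull && col("created_at").isNotNull)
      .withColumn("dt", to_date(col("created_at")))

    // Columnar Parquet plus date partitioning keeps petabyte-scale scans cheap.
    clean.write.partitionBy("dt").mode("append").parquet("/warehouse/social")
  }
}
```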
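The production transcript parser was written in Python; this Scala sketch shows the equivalent indexing call, assuming the Elasticsearch 7.x high-level REST client, with the host, index name, and document fields as placeholders:

```scala
import org.apache.http.HttpHost
import org.elasticsearch.action.index.IndexRequest
import org.elasticsearch.client.{RequestOptions, RestClient, RestHighLevelClient}
import org.elasticsearch.common.xcontent.XContentType

object TranscriptIndexer {
  def main(args: Array[String]): Unit = {
    val client = new RestHighLevelClient(
      RestClient.builder(new HttpHost("localhost", 9200, "http"))) // placeholder host

    // One JSON document per transcript; id and fields are illustrative.
    val doc = """{"show": "Example Show", "air_date": "2016-03-01", "text": "..."}"""
    val request = new IndexRequest("transcripts")
      .id("example-show-2016-03-01")
      .source(doc, XContentType.JSON)

    client.index(request, RequestOptions.DEFAULT)
    client.close()
  }
}
```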
Confidential
Lead Big Data Engineer
Responsibilities:
- Designed and implemented a Big Data analytics platform handling ingestion, compression, transformation, and analysis of 30+ TB of payroll and HR data. Implemented automation, traceability, and transparency for every step of the process to build trust in the data and streamline data science efforts, using Python, Java, Hadoop streaming, Spark, Spark SQL, Scala, Hive, and Pig.
- Designed a highly efficient data model to optimize large-scale queries, utilizing Hive complex data types and the Parquet file format.
- Performed data validation and transformation using Python and Hadoop streaming.
- Developed highly efficient Pig Java UDFs utilizing advanced concepts such as the Algebraic and Accumulator interfaces to populate Confidential Benchmarks cube metrics.
- To improve the performance, scalability, and memory usage of processing large volumes of payroll data, adopted Spark and Spark SQL to build Confidential Benchmarks cubes and populate annual and quarterly cube metrics using Scala (see the sketch after this section).
- Performed Big Data analysis using Scala, Spark, Spark SQL, Hive, MLlib, and machine learning algorithms.
Technologies Used: Scala, Python, Java, Pig, Hive, Unix shell, Apache Spark 1.3, Spark SQL, PySpark, CDH 5.3, Red Hat Enterprise Linux.
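A minimal sketch of the cube-building step in Spark SQL; the payroll_facts table and its columns are assumed, and cube() itself landed in a slightly later Spark release than the 1.3 listed above:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, sum}

object BenchmarkCubes {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("benchmark-cubes")
      .enableHiveSupport().getOrCreate()

    val payroll = spark.table("payroll_facts") // assumed Hive table and schema

    // cube() emits every grouping combination (per year, per quarter, and
    // grand totals) in one distributed job instead of several sequential passes.
    val metrics = payroll.cube("year", "quarter")
      .agg(sum("gross_pay").as("total_pay"), avg("gross_pay").as("avg_pay"))

    metrics.write.mode("overwrite").saveAsTable("benchmark_cube_metrics")
  }
}
```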
Confidential
Big Data Engineer/Data Architect
Responsibilities:
- Designed and developed an automated data processing pipeline covering ingestion, compression, transformation, and loading into a Hive database to process CDRs (call detail records) using Unix shell, Hadoop streaming, Python, Java, Hive, Pig, and UDFs (see the parse sketch after this section).
- Performed data profiling and transformation on the raw data using Pig, Python, and Java.
- Designed efficient data model for loading transformed data into Hive database.
- Created analytics reports using Hive.
- Led architecture and design of data processing, warehousing, and analytics initiatives.
- Analyzed business requirements and data; designed, developed, and implemented highly effective, highly scalable ETL processes for fast, scalable data warehouses.
- Developed various Oracle SQL scripts, PL/SQL packages, procedures, functions, and Java code for data extraction, transformation, and load.
- Performed SQL query and database tuning for high BI reporting performance.
Technologies Used: Python, Java, Hive, Unix shell, Hadoop, ETL (Talend), Oracle SQL, PL/SQL, Oracle 11gR2, Red Hat Enterprise Linux
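The production parse step ran as Python under Hadoop streaming; this is a Scala rendering of the same record-validation idea, where the pipe-delimited layout and field set are assumptions:

```scala
// Every field and the pipe-delimited layout here are assumptions.
case class Cdr(caller: String, callee: String, startEpoch: Long, durationSec: Int)

object CdrParser {
  // Malformed records are dropped (None) rather than failing the whole job.
  def parse(line: String): Option[Cdr] =
    line.split('|') match {
      case Array(caller, callee, start, dur) =>
        try Some(Cdr(caller, callee, start.toLong, dur.toInt))
        catch { case _: NumberFormatException => None }
      case _ => None
    }
}
```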
Confidential
Big Data Engineer/ETL Architect
Responsibilities:
- Built a Big Data analytical framework for processing healthcare data for medical research using Python, Java, Hadoop, Hive, and Pig. Integrated R scripts with MapReduce jobs.
- Performed data transformation using Pig, Python, and Java.
- Designed efficient data model for loading transformed data into Hive database.
- Created analytics reports using Hive.
- Designed, developed, and implemented conceptual, logical and physical data models for highly scalable and high performance relational database systems.
- Designed, developed, and implemented highly scalable and efficient ETL flows using SSIS.
- Tuned SQL queries and multi-terabyte databases for high performance; utilized SQL Server 2012 features such as columnstore indexes and table partitioning to increase data warehouse performance.
- Developed various T-SQL scripts, stored procedures, UDFs, and C# code for data extraction, transformation, and load.
- Performed SSIS tuning for efficient ETL and applied best-practice SSIS design patterns to improve ETL performance and scalability.
Technologies Used: Python, Java, Hadoop, Pig, Hive, Unix shell, T-SQL, Oracle SQL, PL/SQL, Oracle 11g, SSIS/SSRS/SSAS 2012/2008 R2, Microsoft SQL Server 2012/2008 R2, Visual C#.NET, Sybase 15.5, CentOS
Confidential
Senior BI/Database Developer/DBA
Responsibilities:
- Designed, developed, and implemented SSIS framework and templates for ETL efficiency.
- Designed, developed, and implemented robust, efficient ETL for processing of healthcare data from various sources into Data Warehouse, ODS, Data Marts using SSIS.
- Led the design, development, and implementation of SSRS reports and the development and management of the Microsoft BI reporting environment.
- Performed SQL programming, stored procedure creation and optimization, and tuning and maintenance of highly available, highly transactional databases.
- Designed, developed, and implemented conceptual, logical and physical data models for analytics reporting with scalability in mind.
- Developed various SQL scripts, Stored Procedures, UDFs, views, triggers, and tables for data extraction, transformations and data loads.
- Tuned SQL queries and multi-terabyte databases for high performance.
Technologies Used: T-SQL, SSIS/SSAS/SSRS 2012/2008 R2, Windows PowerShell, Visual C#.NET, ASP.NET, SQL CLR, XML, Microsoft SQL Server 2000/2005/2008 R2/2012, Crystal Reports
Confidential
Software Engineer
Responsibilities:
- Developed Radioisotope Inventory Web application using Java, JSP, and Oracle; designed a user-friendly Web interface using HTML and JavaScript.
- Performed the logical and physical data model design to support the development of Web Based Radioisotope Inventory Control System.
- Performed SQL, database tuning and optimization to increase the performance of application and data retrieval.
- Developed ETL processes and implemented Perl and SQL*Loader scripts to load data from a legacy Informix database into an Oracle database.
- Implemented reporting solutions using Crystal Reports.
- Designed and developed software using C, Java, and Perl to facilitate manipulation of brain images created with PET (Positron Emission Tomography) and MRI (Magnetic Resonance Imaging) modalities for the laboratory's research in brain aging and development.
- Performed administration duties for the SunOS/Solaris UNIX systems supporting the PET and MRI modalities; developed automation shell scripts for system and data management.
Technologies Used: Java, JavaScript, C, JSP, Perl, HTML, CSS, SQL, PL/SQL, Oracle 9i/10g, Crystal Reports, Unix shell, Red Hat Linux, SunOS/Solaris