Sr. Big Data Engineer Resume
Charlotte, NC
PROFESSIONAL SUMMARY:
- Certified Big Data Engineer with 15 years of experience in the design and development of Big Data platforms with cutting-edge technologies (Spark, Kafka, NiFi, AWS, CDH), building end-to-end data pipelines with attention to data quality, scalability, performance, and maintainability
- Design and development of ELT/data streaming platforms using the Big Data ecosystem and cutting-edge technologies
- Experience in the design and development of ingestion frameworks from multiple sources into Hadoop using the Spark framework with PySpark and PyCharm
- Prepare data sets as per requirements for data scientists to analyze data usage patterns and make recommendations
- Experienced in building data ingestion platforms from various channels into Hadoop using PySpark and the Spark Streaming framework
- Design and perform data transformations using data mapping and data processing capabilities such as Spark SQL, PySpark, Python, and Scala
- Specialized in software development of data pipelines and implementation of streaming analytics platforms on Cloudera and AWS
- Perform software analysis, risk analysis, and reliability analysis, and take action accordingly
- Big Data architecture and development of highly scalable, large-scale distributed data processing systems
- Experienced in designing, deploying, and operating highly available, scalable, and fault-tolerant systems using Amazon Web Services (AWS).
- Design, build, and manage an analytics infrastructure that can be utilized by data analysts and data scientists to enable big data analytics
- Development of Machine Learning and AI solutions for IoT, advanced power flow distributions, and Smart Grid using TensorFlow, Spark, TigerGraph, and Python.
- Experienced with event-driven and scheduled AWS Lambda functions to trigger various AWS resources (a Python sketch follows this summary).
- Experienced in, and passionate about, working in an Agile/Scrum environment.
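As an illustration of the event-driven AWS Lambda experience noted above, here is a minimal Python sketch that reacts to S3 object-created events and copies new objects into a staging bucket; the bucket name and handler wiring are hypothetical placeholders rather than a specific production function.

```python
import json
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    """Event-driven Lambda sketch: for each S3 object-created record in the
    triggering event, copy the new object into a staging bucket."""
    for record in event.get("Records", []):
        source_bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        s3.copy_object(
            Bucket="staging-bucket",  # placeholder target bucket
            Key=key,
            CopySource={"Bucket": source_bucket, "Key": key},
        )
    return {"statusCode": 200, "body": json.dumps("copied")}
```

The same handler could equally be attached to a scheduled EventBridge rule for the scheduled-invocation case.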
TECHNICAL SUMMARY:
Big Data Technologies: Spark, Kafka, NiFi, PySpark, Python, Anaconda, Pandas, GPU, Scala, GCP & AWS Analytics
Cloud Technologies: GCP, BigQuery, AWS RDS, AWS EMR/EC2, Glue, Redshift
Development & Monitoring Tools: PySpark, GitHub, UCD, Jenkins, Spark, Kafka, Python, AWS EMR/EC2, Glue, Athena, Ambari, Cloudbreak, AWS CloudWatch, Cloudera Manager, Knox, Grafana
PROFESSIONAL EXPERIENCE:
Confidential, Charlotte, NC
Sr. Big Data Engineer
Responsibilities:
- Implementation of PySpark's built-in machine learning routines, along with utilities to create full machine learning pipelines, using PySpark on the Hadoop 3 platform
- Design and develop Spark DataFrames that behave like SQL tables and perform data manipulations such as grouping, aggregating, and filtering (see the PySpark sketch after this list)
- Model tuning and selection with the PySpark machine learning framework to create and fit models
- Development of Machine Learning and AI solutions for IoT, advanced power flow distributions, and Smart Grid using TensorFlow, PySpark, Graph DB, CUDA, Numba, and Python.
- Implement the pyspark.sql module, which provides optimized data queries through the Spark session
- Create Pipelines using the pyspark.ml module that combine all the Estimators and Transformers
- Development of real-time streaming analytics using HDP, HDF, NiFi, and PySpark
- Implementation of the AWS cloud computing platform using AWS EMR/EC2, Glue, Redshift, Athena, S3, RDS, DynamoDB, and Python
- Understand the existing data pipelines and implement faster streaming analytics using HDP & HDF data flow with Apache NiFi, Kafka, and Spark
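The following minimal PySpark sketch illustrates the DataFrame manipulations and the pyspark.ml Pipeline pattern described in the bullets above; the sample rows, column names, and the LogisticRegression estimator are hypothetical stand-ins for the actual project data and models.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("usage-pattern-sketch").getOrCreate()

# Hypothetical usage data; a real pipeline would read from HDFS or Hive.
df = spark.createDataFrame(
    [("meterA", "residential", 12.5, 0),
     ("meterB", "commercial", 48.0, 1),
     ("meterC", "residential", 9.75, 0)],
    ["meter_id", "segment", "daily_kwh", "label"],
)

# DataFrame manipulations: filter, group, and aggregate like a SQL table.
summary = (df.filter(F.col("daily_kwh") > 5)
             .groupBy("segment")
             .agg(F.avg("daily_kwh").alias("avg_kwh"), F.count("*").alias("n")))
summary.show()

# pyspark.sql also exposes the same data to SQL queries on the Spark session.
df.createOrReplaceTempView("usage")
spark.sql("SELECT segment, AVG(daily_kwh) AS avg_kwh FROM usage GROUP BY segment").show()

# pyspark.ml Pipeline combining Transformers and an Estimator, then fitting a model.
pipeline = Pipeline(stages=[
    StringIndexer(inputCol="segment", outputCol="segment_idx"),
    VectorAssembler(inputCols=["segment_idx", "daily_kwh"], outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="label"),
])
model = pipeline.fit(df)
model.transform(df).select("meter_id", "prediction").show()
```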
Environment: AWS, Hortonworks, NiFi, PySpark, Scala, Kafka, Java, HDFS, Hive, Red Hat Linux, Ambari, Redshift, and DynamoDB
Confidential, Atlanta, GA
Sr. Technical Architect
Responsibilities:
- Design and development of a Big Data analytical platform as per client requirements, and engagement in technical discussions
- Understand the existing data pipelines and implement streaming analytics using HDP & HDF data flow with Apache NiFi, Kafka, and Spark
- Performed real-time stream processing through the data lake using HDP, HDF, NiFi, and PySpark
- Implement the AWS cloud computing platform using AWS EMR/EC2, Glue, Redshift, Athena, S3, RDS, DynamoDB, and Python
- Managing and developing the framework at the Confidential enterprise level, and developing custom NiFi processors for specific requirements
- Implemented HDF data flow and DataPlane to manage, secure, and govern data across data centers
- Performed processing of data sets in various formats (ORC, Parquet, Avro, JSON) using PySpark (see the sketch after this list)
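A minimal PySpark sketch of the multi-format processing mentioned above; the paths and column names are placeholder assumptions, and the Avro reader assumes the external spark-avro package is available on the cluster.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("multi-format-sketch").getOrCreate()

# Placeholder landing-zone paths; real locations would be HDFS or S3 URIs.
orc_df     = spark.read.orc("/data/landing/events_orc")
parquet_df = spark.read.parquet("/data/landing/events_parquet")
avro_df    = spark.read.format("avro").load("/data/landing/events_avro")  # needs spark-avro
json_df    = spark.read.json("/data/landing/events_json")

# Normalize to a common set of columns and union for downstream processing.
combined = (parquet_df.select("event_id", "event_ts", "payload")
            .unionByName(orc_df.select("event_id", "event_ts", "payload")))
combined.write.mode("overwrite").parquet("/data/curated/events")
```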
Environment: AWS, Hortonworks, NiFi, PySpark, Scala, Kafka, Java, HDFS, Hive, Red Hat Linux, Ambari, Redshift, and DynamoDB
Confidential, Atlanta, GA
Sr. Big Data Architect
Responsibilities:
- Understand the existing data pipelines and implement faster streaming analytics using HDP & HDF data flow with Apache NiFi, Kafka, and Spark
- Design and development of a Big Data analytical platform as per client requirements, and engagement in technical discussions
- Design and deployment of code in on-premises, AWS cloud, or hybrid environments
- Implementation of HDF data flow and DataPlane to manage, secure, and govern data across data centers and in the cloud
- Performed real-time stream processing through the data lake using HDP, HDF, NiFi, Spark/Scala, and Kafka (see the sketch below)
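A minimal Structured Streaming sketch of the Kafka-to-data-lake flow described above. The project work was done with Spark/Scala; Python is used here to keep this resume's code samples in one language, the broker address, topic, and paths are placeholders, and the Kafka source requires the spark-sql-kafka connector package.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("kafka-streaming-sketch").getOrCreate()

# Placeholder broker and topic; a real job reads these from configuration.
raw = (spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "broker1:9092")
            .option("subscribe", "sensor-events")
            .load())

# Kafka delivers key/value as binary; cast the value before analytics.
events = raw.select(F.col("value").cast("string").alias("json_str"),
                    F.col("timestamp"))

# Land the stream in the data lake with checkpointing for fault tolerance.
query = (events.writeStream
               .format("parquet")
               .option("path", "/data/lake/sensor_events")
               .option("checkpointLocation", "/data/checkpoints/sensor_events")
               .outputMode("append")
               .start())
query.awaitTermination()
```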
Environment: Cloudera, AWS, EMR, Redshift, Linux, Cloudera Manager, and CloudWatch
Confidential, Atlanta, GA/Dublin, OH
Sr. Big Data Architect
Responsibilities:
- Implementation and management of the Hortonworks Data Platform with high-availability solutions (HIPAA)
- Developed a high-speed BI layer on the Hadoop platform with Apache Spark, Java, and Python
- Developed core Java client APIs for HBase that are used to perform CRUD operations on HBase tables
- Worked with the HBase Java API, including the Put, Get, and Result classes and their methods (see the sketch after this list)
- Experience in using version control systems such as ClearCase, CVS, SVN, and Git.
- Work with the application team on the implementation of Cassandra and fine-tune it according to requirements
- Thorough understanding of client requirements and implementation of Confidential data ingestion and ingest workflows
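To keep this resume's code samples in a single language, the CRUD pattern behind the HBase Java client work above is sketched below in Python using the happybase Thrift client; the table name, row keys, and column family are hypothetical, and the actual project used the core Java API (Put, Get, Result).

```python
import happybase

# Placeholder Thrift gateway host; happybase talks to HBase via the Thrift server.
connection = happybase.Connection("hbase-thrift-host")
table = connection.table("customer_profiles")  # hypothetical table

# Create / Update: put a row as {column_family:qualifier: value}.
table.put(b"row-001", {b"cf:name": b"Acme", b"cf:segment": b"commercial"})

# Read: fetch a single row (the Java client's Get/Result equivalent).
row = table.row(b"row-001")
print(row.get(b"cf:name"))

# Scan a key range, then Delete.
for key, data in table.scan(row_prefix=b"row-"):
    print(key, data)
table.delete(b"row-001")

connection.close()
```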
Environment: Hadoop, Hortonworks, NiFi, Spark, Scala, Kafka, Java, HDFS, Hive, AWS, Red Hat Linux, Ambari, Grafana, and Zeppelin
Confidential, Atlanta, GA
Sr. Oracle Consultant
Responsibilities:
- Development of Oracle applications as per client requirements
- Installation & configuration of Apache Hadoop cluster using Pseudo-Distributed Operation, Fully-Distributed Operation
- Setup & Manage HDFS Federation CDH 4.2, YARN/MapReduce 2 (MR2)
- Setup & Manage of Hadoop, big Data, Hadoop Administration, Hive, HBase, Pig, Mahout, Spark, Linux, Scripting, Python, Perl, Shell, Open Source
Environment: Oracle, PL/SQL, Oracle BDA, PostgreSQL, StreamSets, GoldenGate, and Cloudera
Confidential
Technical Lead
Responsibilities:
- Coordinate with team members and plan project tasks in an Agile environment
- Development of Oracle applications as per client requirements
- Performed RAC upgrades from 10g RAC to 11g RAC
- Provided technical expertise across multiple environments (Development, Integration)
- Installed and configured a 4-node 11g RAC with ASM
- Perform day-to-day RAC maintenance operations.
- Support application code builds and adhere to the software development lifecycle (SDLC)
Environment: Oracle, Oracle RAC, ASM, PostgreSQL