AWS Cloud Big Data Analytics Lead Resume

Atlanta, GA

SUMMARY

  • An IT professional with 15+ years of Experience in all phases of the software development life cycle, including Solution Architecture, System Analysis, Design, Development, Implementation, Testing, and Maintenance of various applications on Windows, Mac, and Linux platforms.
  • 8+ Years of Experience in Hadoop, Spark, Kafka, Akka, and Scala development, including Hadoop ecosystem tools like Hive, HDFS, Kudu, HBase, Phoenix, Sqoop, MapReduce, Yarn, Avro, Parquet, Oozie, ZooKeeper, Nifi, MLlib, Solr, NLP, Elasticsearch, Text Mining, AWS, Docker, Ansible, Prometheus, Graphite, Grafana, and DevOps.
  • 4+ Years of Experience with StreamSets and as an AWS Cloud Solution Architect, designing and developing with Lambda, ApiGateway, StepFunctions, IAM, KMS, SQS, SNS, S3, Kinesis, DynamoDB, EMR, RDS, Redis, ElastiCache, Redshift, CloudWatch, EC2, Neptune, SageMaker, Terraform, Serverless, CloudFormation, and Fargate.
  • 2+ Years of Data Science, Data Analytics, Machine Learning, and Statistical analysis of Structured and Unstructured data.
  • Expertise in building Data Lakes, ETL, and Real-Time and Near-Real-Time Analytical Systems.
  • Hands on Experience in data ingestion, data cleansing, data mining, and data pipelining from various data sources, including Structured, Semi-Structured, Unstructured, and Live Stream data (XML, TSV, CSV, Exadata, RDBMS, MySQL, Social Networks, Websites, Kinesis Streams, Logs, IoT events, Informatica, Teradata), using on-premises tools like Hive, Sqoop, Oozie, HBase, Spark, and Nifi, and AWS Cloud tools like Glue, Lambda, StepFunctions, RDS, and ApiGateway.
  • Experience in working with StreamSets for building data lake, smart data ingestion pipelines, batch processing, streaming data, and CDC (change data capture) pipelines
  • Strong Experience with Kudu and Impala for designing and building of fast analytical workloads, Near-real-time streaming solutions, predictive modeling
  • Hands on Experience in developing solutions using MicroServices methodologies, AWS Cloud and Akka patterns, container based real-time transformation pipelines.
  • Hands on Experience in developing Cloud Hosted MicroServices solutions using AWS Cloud Stack which includes Lambda, ApiGateway, Cognito, S3, SQS, SNS and CloudWatch.
  • Expertise in developing Micro batching and Near-Real-time Solutions using the AWS Cloud stack of Kinesis, Lambda, ApiGateway, ELB, EC2, Redis, ElastiCache, DynamoDB, Neptune, RDS, Glue, and EMR, and the Hadoop Big Data stack of Spark, Scala, Akka, Kafka, HBase, and Phoenix, including SparkSQL, GraphX, MLlib, and Spark Streaming.
  • Hands on Experience in Coding Cloud Infrastructure and defining CI/CD deployment pipelines using Terraform, Serverless, SAML, AWS CloudFormation stacks, and Ansible.
  • Professional in developing various big data applications using Scala, Spark, Kafka, HBase, HDFS, Hive, Oozie, Nifi, and Apache Phoenix frameworks.
  • Professional in defining Data Models for various scenarios on HBase, HDFS, and Hive systems, including schema design, storage formats, metadata management, denormalization, and fine tuning table designs to fit access requirements.
  • Strong Experience in developing Pig Scripts, Hive Query Language, and Hive UDFs; wrote Hive Queries for data analysis to process the data for visualization.
  • Hands on Troubleshooting and debugging skills of distributed applications using Hadoop logs, JVM dumps, Hadoop system logs, various metrics of cluster in order to solve Cluster level production issues and Application issues.
  • Fine tuning Performance of Hadoop Cluster for Spark Jobs, and HBase Reads, Writes and analyzing various metrics to adjust configurations.
  • Strong Expertise in defining Secured Hadoop Clusters using Kerberos and writing Programs that meet Kerberos requirements to access Hadoop Services.
  • Working Experience on Various big data testing practices using various tools and frameworks for Stress Tests, Spark Unit Testing, Load tests, and Integration Tests.
  • Working Experience in Python scripting and Programming to extract the data from various sources and apply machine learning algorithms to define various analytical models using PySpark, MLlib, Pandas, DataFrame, scikit-learn, NLTK, and TensorFlow.
  • In depth Knowledge of Data algorithms like top-n, sentiment analysis, naive Bayes, k-means clustering, recommendation engine, common friends, market basket analysis (MBA), and pattern matching algorithms for data analytics.
  • In depth knowledge of building Machine Learning algorithms like Linear Regression, logistic Regression, classification, neural networks.
  • Hands on Experience in cluster installation, fine tuning cluster configurations based on business needs, monitoring clusters with Prometheus, Graphite, Grafana, and automating the regular operations and setups using Ansible.
  • Strong Experience in Python, NumPy, pandas, TensorFlow, Keras, PyTorch, AWS Lambda, Neptune, AWS SageMaker.
  • Good knowledge of and Hands on Experience with Scala, Java/J2EE, Spring Framework, Hibernate, JSP, EJBs, Struts, JMS, JMX, RMI, Java Web Services.
  • Strong Experience in Core Java, Spring, Hibernate, Servlet, Reactive Java, Lambda, Streams programming.
  • Hands on Experience working with DevOps tools like Jenkins, Git, Ansible Scripts, and Docker.
  • Good Experience in frontend web UI development using AngularJS, JQuery UI, BootStrap, HTML5.
  • Expertise in Business use-case analysis, System Architecture, System Design, and Blueprints.
  • Experience in applying several Architectural patterns and in depth knowledge on Object Oriented Programming (OOP), Design Patterns, Database Schema Design, and Data Algorithms.
  • Experience with Relational databases, including writing stored procedures, triggers, and queries in Oracle and SQL Server using PL/SQL and T-SQL.
  • Experienced in troubleshooting issues in Software systems and debugging with Debug tools.
  • Conversant with Agile methodologies like Scrum, Test-driven development (TDD), and the XP development model.

TECHNICAL SKILLS

AWS Cloud: Lambda, ApiGateway, Kinesis, SNS, SQS, StepFunctions, RDS, VPC, EC2, S3, DynamoDB, IAM, Aurora, Snowflake, SageMaker, EMR, CloudWatch, CloudFormation, Neptune, Athena, Glue

Big Data: Apache Hadoop, Hive, HBase, Spark, Kafka, Storm, Flume, Avro, Parquet, Mahout, Sqoop, Pig, Oozie, Zookeeper, HDFS, YARN, R, MapReduce, Nifi, Cassandra, MongoDB

Machine Learning: Supervised Learning, Unsupervised Learning, Classification, Recommendation, Feature Engineering, Linear Regression, Logistic Regression, Scikit-Learn, TensorFlow, MLLib, Python, Statistical Modeling, Deep Learning

Scripting Languages: Python, Perl, Groovy, Shell Script, PowerShell, JavaScript, Ruby on Rails, Ansible Script

LANGUAGES/SDK/SPECIFICATIONS: Java/J2EE, Spring, Struts, Scala, Python, JUnit, Mockito, Hibernate, EJB, XML, JSON, XSLT, WCF, WF, XAML, JavaScript, AngularJS, HTML, .Net, ASP.Net, C#, WPF, WinForms, VC++, WIN32, COM/DCOM

DATABASE: Microsoft SQL Server 2006 (T-SQL), 2008, Oracle 9i (SQL, PL/SQL), SQLite, MySQL, CouchDB, MongoDB, HBase, Cassandra, Hive, MS-ACCESS

Patterns: Architectural Patterns, Design Patterns (GOF), Concurrency Patterns, Communication Patterns, Data Mining Algorithms

Concepts: Internet of Things (IoT) Services, Object Oriented Design, Structured Programming, n-tier architecture, Database Design and Multi-threaded programming, Socket Programming, Business process management, Business process automation, workflow development, B2B communication, EAI, Cloud Computing (SaaS, PaaS, IaaS), SOA, Microservices

DEVOPS: Git, Jenkins, Ansible, Docker, Jira, Prometheus, OpenShift, Terraform, Serverless

PROFESSIONAL EXPERIENCE

Confidential, Atlanta, GA

AWS Cloud Big Data Analytics Lead

Responsibilities:

  • Overall Responsibility of solution design and system architecture for various modules.
  • Design and Development of various data Ingestion pipelines using Spark, Kafka, Kafka Streaming, Kudu, Impala, AWS cloud stack and Hadoop Spark jobs.
  • Development of Continuous and streaming Ingestion workloads using Kafka, Flink, KSQL, Spark, and Apache Iceberg.
  • Design and Development of various data flows from different source systems to various cloud service providers using Python, Airflow and Apache Nifi
  • Design and Development of various Enterprise data patterns, and architectures which includes data lakes, lake house, data fabric, data mesh, and operational dbs.
  • Development of Fast Analytics pipeline using Kudu-Impala, Kudu-Spark, Kudu-Nifi workloads
  • Performance and Query optimizations on Kudu, and setting up Operational monitors on Kudu Metrics
  • Development of multiple MicroServices using AWS Lambda, ApiGateway, ELB, SQS, SNS, Kinesis.
  • Development of Data Analysis pipelines using Glue Jobs, PySpark on EMR cluster.
  • Design and development of Data cleansing, Wrangling, and correlation processes using Redis, ElastiCache, Glue, Lambda, S3, Neptune, DynamoDB.
  • Infrastructure and operations automation using Terraform Modules, Workspaces, and State files.
  • Exploratory Data Analysis on various data points across systems to find various features, relations, and statistical modeling for Machine Learning models.
  • Development of Machine Learning Pipelines using Various Learning Algorithms, Statistical models, Neural Networks, TensorFlow, Scikit-Learn, Python, MLLib, PySpark.
  • Implemented various pipe-in and pipe-out patterns based on AWS best practices and industry standards.
  • Analysis and Design of Various Machine Learning Data Models to predict Major outages, Truck Rolls, Call volumes, and Customer Churn, aligned with Industry best standards and Security practices.
  • Dedupe/Record linkage of Customer data using various statistical and machine learning models to enable a customer 360 view of the data from various sources.
  • Troubleshooting and Debugging of Production issues for Various components in the AWS Stack, including Lambda, API Gateway, CloudWatch, Neptune, DynamoDB, KMS, IAM, Redis, Kinesis, Glue, PySpark, SageMaker.
  • Code and Maintenance of Application Infrastructure in AWS using Terraform, CloudFormation, SAML, and serverless frameworks.

Confidential

Big Data Lead/Architect

Responsibilities:

  • Overall Responsibility for System architecture and defining strategy, technical architecture, solution design, implementation and delivery of Confidential projects.
  • Understanding and evaluating business requirements to provide high quality, effective big data solutions for batch processing, micro batch, real time, near real time, data discovery, and research purposes.
  • Design and development of data lake and analytical platform using StreamSets, Spark, Impala, and Kudu.
  • Configuring and setting up Kudu Clusters, scaling the cluster, schema design, and troubleshooting Kudu-Spark, Kudu-Impala workloads.
  • Microservices based Architecture using Containers and a CI/CD (Continuous Integration and Continuous Delivery) approach for major module developments.
  • Design and Development of Data ingestion, data cleansing, data processing, data analysis, and data export processes following the industry best practices.
  • Development of Kafka-Spark, Kafka-HDFS, Kafka-Storm based solutions for processing real-time and near-real-time data.
  • Design and Development of AWS Cloud data ingestion and Correlation pipelines using Kinesis, AWS Lambda, DynamoDB, S3, SQS, EC2, and Fargate.
  • Development of Data Ingestion Layers for importing data from various sources including Kinesis Streams, JMS, Sequence Files, Container Files (Avro, Parquet), CSV, Binary data, text data, Informatica, MySQL, Oracle, and Teradata, and data enrichment using Kafka, Kafka-Connect, Sqoop, HDFS, WebHDFS, Spark Streaming, Spark, and Hive.
  • Development of Data Services using RESTful APIs in Scala to give external systems secured and controlled access to Hadoop Tables (Hive, HBase) and Other Objects (Kafka Topics, Spark Jobs).
  • Writing Spark Applications for in-memory processing of the Data, data lookup using Spark-SQL, and transformations using the DataStream, RDD, PairRDD, DataFrame APIs.
  • Designed and developed various ML pipelines using PySpark, TensorFlow, Keras, NumPy, PyTorch to continuously train the models and validate the accuracy.
  • Akka based Actor system to build the on-demand polling process and integrate it with Kafka for upstream processing of Spark layers.
  • Development of Container based Micro services using Docker, OpenShift, Java, Spring Boot, Spring Batch, Scala, Akka, Kafka.
  • Orchestration of Spark Batch, and Data Ingestion jobs using Oozie, Nifi Processors, Cox recommended Frameworks, and Integration with Informatica, HP BSM, NOC Systems for operational preparation.
  • Responsible for Unit Testing, Functional Testing, and Integration Testing using JUnit, MRUnit, ScalaTest, Scalaspec, Spark-testing-base, and Mockito.
  • Developed Automated Build and Deployment pipelines with a CI/CD process using Jenkins, Git, Docker, and Jira, and wrote Ansible Scripts.
  • Troubleshooting and debugging the Spark jobs, Kafka Topics, Storm Topologies, MapReduce Jobs, Java Modules, Scala Modules, Workflows, and failover logs to handle the production system issues.
  • Hortonworks cluster maintenance, including regular jobs monitoring, logs clearing, service health checks, automation scripts for regular operational monitoring, HBase Performance fine tuning, configuration management, decommissioning nodes, adding new nodes.
  • Building Various Operations Dashboards, Alert and Notification systems for enabling continuous monitoring of Systems and Resolving the issues in a timely manner to avoid any potential system call outs.

Confidential

Lead Hadoop Consultant

Responsibilities:

  • Responsible for architecting Hadoop clusters and Involved in System Architecture, Requirement Gathering and Analysis, Feasibility Study, Design, and Building the system for the UKPN data analytics platform.
  • Designed and Developed the Real-time Data analytics system for data ingestion and processing of Real-time events captured from Smart Meters and Smart Grids.
  • Micro Batching with Spark Streaming connected to Kafka topics, and running Spark program to aggregate, apply filters, and windowed analysis.
  • Design and Development of Custom Spark Modules including Custom RDDs, Serializers, Partitioning, and Executors to meet various customer requirements.
  • Real-time Data Analysis of Power Outages and Critical Events using Apache Flume, Kafka, Storm topologies, Spark, and Hadoop with a Cassandra DB backend.
  • Importing and Exporting data from RDBMS and Unstructured data sources like the internal blogs, customer feedback, and survey pages into HDFS using Sqoop, MapReduce, Spark, Python, Kafka for later processing.
  • Extensively used Spark and Scala to develop the statistical data models and pattern matching.
  • Design and Implementation of Partitioning, Dynamic Partitions, Buckets in Hive.
  • Orchestration of Data ingestion, Process, and Analysis Workflows using the Oozie workflow engine.
  • Wrote Python scripts to Extract Unstructured (Facebook, Twitter) and Semi Structured data (XML, JSON) and land the Data on the Staging Servers.
  • Text processing and Text Mining using Solr, NLP, Elasticsearch, such as name or entity matching, text categorization/routing, named-entity extraction, sentiment analysis, Competitive Promotions and Pricing.
  • Design and Development of the Data Access Layer and Data Services using RESTful APIs so that outside systems can connect with the different applications.
  • Created Statistical data modeling, machine learning, and sentiment analysis using R, MLLib, and Hadoop.
  • Followed the best practices in Setting up the Hadoop System and Integration with the Other Hadoop Applications to Fine tune performance and optimization.

Environment: Hadoop, MapReduce, HDFS, Hive, Storm, NLP, Solr, Spark, Kafka, Cassandra, AWS, R, Oozie, Pig, Sqoop, Oracle 11g/10g, MySql, Python, PL/SQL, Mahout, Java, Eclipse

Confidential, Mooresville, NC

Hadoop Consultant

Responsibilities:

  • Involved in Business Requirement gathering and Requirement analysis.
  • Moved all crawl data files generated from various retailers to HDFS for further processing.
  • Wrote Apache Pig scripts to process the HDFS data.
  • Created Hive Tables, loaded the data, and wrote Hive Queries that run internally as MapReduce jobs.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Developed Sqoop Scripts to import and export the data from/to Relational Databases.
  • Developing scripts and batch jobs to schedule various Hadoop Programs.
  • Created Statistical data models using MapReduce Streaming with Python and R.
  • Extracted feeds from social media sites such as Facebook, Twitter using Python scripts.

Environment: Hadoop, HDFS, HBase, Pig, Hive, MapReduce, Sqoop, Flume, ETL, REST, Java, Python, PL/SQL, Oracle 11g, Unix/Linux, CDH3, CDH4.

Confidential

Software Architect and Data Scientist

Responsibilities:

  • Hadoop cluster setup, batch processing of data ingested from Structured data sources using Sqoop, Oozie workflow orchestration, Writing Hive queries and Hive UDFs to analyze the data, and MapReduce jobs for building the data pipelines and custom data validations.
  • Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop and Flume. Played a key role in setting up a 40 node Hadoop cluster utilizing Apache Spark by working closely with the Hadoop Administration team.
  • Played a key role in Developing Spark jobs that ingested data from HDFS and Solr and processed it to compute various metrics.

Environment: Apache Hadoop, Hive, HBase, MapReduce, Sqoop, Oozie, Linux, Eclipse, Git, Maven, Mahout, Zookeeper, neo4j, NodeJS, Java

Confidential

Java Lead Engineer

Responsibilities:

  • Project Planning, Scheduling, Gathering Customer Requirements, Requirement Feasibility Study, Technical Leadership
  • Development of Web Application using Java, J2EE, HTML, XHTML, JSP, Servlet, JSF, JavaBeans, Build Migration from Ant to Maven 2, Debugging and Troubleshooting

Environment: Java 7, Spring framework, Spring Model View Controller (MVC), Struts 2.0, XML, Hibernate 3.0, UML, Java Server Pages (JSP) 2.0, Servlets 3.0, JDBC 4.0, JUnit, Log4j, Maven, Win 7, HTML, REST Client, Eclipse, Agile Methodology, Design Patterns, WebSphere 6.1.