Senior Big Data Engineer/Cloud Engineer Resume

Charlotte, NC

SUMMARY

  • Around 9 years of IT experience in software development, with strong work experience as a Big Data Engineer/Cloud Engineer (AWS/Azure)/Hadoop Developer and a solid understanding of the Hadoop framework.
  • Experienced in using Agile methodologies including Extreme Programming, Scrum, and Test-Driven Development (TDD).
  • Experienced working with various Hadoop distributions (Cloudera, Hortonworks, MapR, Amazon EMR) to fully implement and leverage new Hadoop features.
  • Experience in developing custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HiveQL, and in using UDFs from the Piggybank UDF repository.
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Deep knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance
  • Strong experience with ETL and/or orchestration tools (e.g., Talend, Oozie, Airflow)
  • Experience setting up the AWS data platform: AWS CloudFormation, development endpoints, AWS Glue, EMR, Jupyter/SageMaker notebooks, Redshift, DynamoDB, S3, and EC2 instances.
  • Strong Experience in Data Migration from RDBMS to Snowflake cloud data warehouse
  • Experience in developing Spark applications using Spark RDD, Spark SQL, and DataFrame APIs.
  • Worked with real-time data processing and streaming using Spark Streaming and Kafka (a minimal sketch follows this summary).
  • Design, create, revise and manage reports generated from operational and analytical systems using SSRS, Tableau, Power BI, and Crystal Reports
  • Hands on experience in setting up workflow using Apache Airflow and Oozie workflow engine for managing and scheduling Hadoop jobs.
  • Good understanding of Partitions, bucketing concepts in Hive and designed both Managed and External tables in Hive to optimize performance.
  • Good understanding of Spark architecture with Databricks and Structured Streaming; set up Databricks on AWS and Microsoft Azure, configured Databricks workspaces for business analytics, managed clusters in Databricks, and managed the machine learning lifecycle.
  • Expertise in working with the Hive data warehouse infrastructure: creating tables, distributing data through partitioning and bucketing, and developing and tuning HQL queries.
  • Replaced existing MR jobs and Hive scripts with Spark SQL & Spark data transformations for efficient data processing.
  • Database design, modelling, migration, and development experience using stored procedures, triggers, cursors, constraints, and functions; used MySQL, MS SQL Server, DB2, and Oracle.
  • Experience working with NoSQL database technologies, including MongoDB, Cassandra and HBase.
  • Experience with software development tools such as Jira, Play, and Git.
  • Experience developing Kafka producers and consumers for streaming millions of events per second.
  • Good understanding of data modelling (dimensional and relational) concepts such as star and snowflake schema modelling and fact and dimension tables.
  • Experience in writing complex SQL queries, creating reports and dashboards.
  • Proficient in using Unix based Command Line Interface.
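
As a concrete illustration of the Kafka and Spark Streaming work summarized above, the following is a minimal PySpark Structured Streaming sketch; the broker address, topic name, event schema, and output paths are illustrative assumptions rather than details from the engagements below.

```python
# Minimal sketch: consume JSON events from Kafka with Spark Structured Streaming,
# parse them against an assumed schema, and append the results to Parquet.
# Requires the spark-sql-kafka package on the Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-streaming-sketch").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # assumed broker
       .option("subscribe", "events")                      # assumed topic
       .option("startingOffsets", "latest")
       .load())

parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

query = (parsed.writeStream
         .format("parquet")
         .option("path", "/data/events/parquet")           # assumed sink path
         .option("checkpointLocation", "/data/events/_chk")
         .outputMode("append")
         .start())

query.awaitTermination()
```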

TECHNICAL SKILLS

Big Data Technologies: HDFS, Hive, MapReduce, Pig, Sqoop, Flume, Oozie, Hadoop distributions, HBase, Spark, Spark Streaming, YARN, Zookeeper, Kafka, Airflow, ETL (NiFi, Talend, etc.)

Languages: SQL, Python, Java, R, Scala, Terraform, XML, Shell/Unix, Perl

RDBMS: MySQL, MS-SQL Server, Oracle 10g/11g/12c, DB2

NoSQL: Cassandra, HBase, MongoDB

Data Warehouse: Redshift, Snowflake

CI-CD/ DevOps: Jenkins, Docker, Kubernetes, Ant, Maven, Gradle

Operating Systems: Linux, Windows XP/7/8/10/11, Mac.

Software Life Cycle: SDLC, Waterfall and Agile models.

Utilities/Tools: Eclipse, Tomcat, NetBeans, JUnit, SQL, SVN, Log4j, SoapUI, Alteryx, Visio, Jira, IntelliJ.

Data Visualization Tools: Tableau, SSRS, Power BI, MS Access

Cloud Services: AWS (EC2, S3, EMR, RDS, Lambda, CloudWatch, Auto Scaling, Redshift, CloudFormation, Glue, DynamoDB, etc.), Azure (Databricks, Data Factory, Data Storage, Data Lake, Azure SQL)

PROFESSIONAL EXPERIENCE

Confidential, Charlotte, NC

Senior Big Data Engineer/Cloud Engineer

Responsibilities:

  • Built a real-time pipeline for streaming data using Kafka and Spark Streaming.
  • Created and maintained optimal data pipeline architecture in the Microsoft Azure cloud using Data Factory and Azure Databricks.
  • Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
  • Used SQL Server Integration Services (SSIS) for extracting, transforming, and loading data into target systems from multiple sources.
  • Migrated data from Oracle and SAS to Hive and Azure Data Lake.
  • Developed several complex SQL scripts.
  • Automated the resulting scripts and workflows using Apache Airflow and shell scripting to ensure daily execution in production.
  • Used Talend to load data from various sources into Hadoop Ecosystem
  • Created DAX queries to generate computed columns in Power BI.
  • Worked on loading data into Snowflake in the cloud from various sources.
  • Wrote production-level machine learning classification models and ensemble classification models from scratch using Python and PySpark to predict binary values for certain attributes within a given time frame.
  • Performed all necessary day-to-day Git support for different projects; responsible for the design and maintenance of the Git repositories and the access-control strategies.
  • Performed end-to-end delivery of PySpark ETL pipelines on Azure Databricks to transform data orchestrated via Azure Data Factory (ADF), scheduled through Azure Automation accounts and triggered using Tidal Scheduler.
  • Experience managing Azure Data Lake Storage (ADLS) and Data Lake Analytics, with an understanding of how to integrate them with other Azure services; knowledge of U-SQL.
  • Used NiFi to build data flows that feed data from local files/logs into HDFS.
  • Used Jenkins pipelines to drive all microservice builds out to the Docker registry and then deploy them to Kubernetes.
  • Migrated on-premises data (Oracle/SQL Server/DB2/MongoDB) to Azure Data Lake Storage (ADLS) using Azure Data Factory (ADF V1/V2).
  • Integrated and automated data workloads to Snowflake Warehouse.
  • Responsible for wide-ranging data ingestion using Sqoop and HDFS commands; accumulated partitioned data in various storage formats such as text, JSON, and Parquet.
  • Architected and implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
  • Developed NiFi workflows to pick up multiple files from an FTP location and move them to HDFS on a daily basis.
  • Implemented Apache Airflow for authoring, scheduling and monitoring Data Pipelines
  • Ensure ETL/ELTs succeeded and loaded data successfully in Snowflake DB.
  • Wrote UNIX shell scripts to automate jobs and scheduled cron jobs for job automation using crontab.
  • Utilized Kubernetes and Docker for the runtime environment for the CI/CD system to build, test, and deploy.
  • Developed features, scenarios, and step definitions for BDD (Behaviour-Driven Development) and TDD (Test-Driven Development) using Cucumber, Gherkin, and Ruby.
  • Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, and to write data back.
  • Implemented Kafka producer and consumer applications on Kafka cluster setup with help of Zookeeper.
  • Used Spring Kafka API calls to process messages smoothly on the Kafka cluster.
  • Recreated existing SQL Server objects in Snowflake.
  • Wrote PySpark and Spark SQL transformations in Azure Databricks to implement complex business rules (see the illustrative sketch after this list).
  • Used Power BI and Power Pivot to develop data analysis prototypes, and used Power View and Power Map to visualize reports.
  • Used Apache Spark DataFrames, Spark SQL, and Spark MLlib extensively, designing and developing POCs using Scala, Spark SQL, and the MLlib libraries.
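
The sketch referenced in the Databricks bullet above: a minimal PySpark/Spark SQL business-rule transformation of the kind run in Azure Databricks and orchestrated from ADF. The mount points, table layout, and the rule itself are illustrative assumptions.

```python
# Minimal Databricks-style sketch: read a curated table, apply a business rule in
# PySpark, express an aggregation in Spark SQL, and write the result back.
# Paths, columns, and the rule are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("business-rule-sketch").getOrCreate()

orders = spark.read.parquet("/mnt/datalake/curated/orders")  # assumed ADLS mount

# Rule: normalise the currency code and flag high-value orders.
flagged = (orders
           .withColumn("currency", F.upper(F.col("currency")))
           .withColumn("is_high_value", F.col("order_amount") > F.lit(10000)))

flagged.createOrReplaceTempView("orders_flagged")

# The same logic can be continued in Spark SQL for reporting-style aggregates.
summary = spark.sql("""
    SELECT currency,
           COUNT(*)          AS order_count,
           SUM(order_amount) AS total_amount,
           SUM(CASE WHEN is_high_value THEN 1 ELSE 0 END) AS high_value_orders
    FROM orders_flagged
    GROUP BY currency
""")

summary.write.mode("overwrite").parquet("/mnt/datalake/presentation/order_summary")
```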

Environment: Spark Streaming, Hive, Scala, Hadoop, Azure, Databricks, Data Lake, Data Factory, Data Storage, Azure SQL, Kafka, Airflow, Oozie, Spark, Spark SQL, Sqoop, TDD, Pig, ETL/ELT, Power BI, Talend, Impala, HBase, NiFi, Zookeeper, Snowflake, Unix/Linux Shell Scripting, Python, PyCharm, CI/CD, Jenkins, Docker, Kubernetes, Microservices, Git

Confidential, Austin, TX

Senior Big Data Engineer

Responsibilities:

  • Brought data from various sources into Hadoop and Cassandra using Kafka.
  • Supported MapReduce programs running on the cluster; involved in loading data from the UNIX file system into HDFS.
  • Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analysing and transforming the data to uncover insights into customer usage patterns.
  • Creating, Debugging, Scheduling and Monitoring jobs using Airflow and Oozie.
  • Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access.
  • Worked on Spark Streaming with Kafka to submit jobs and run them in a live (streaming) manner.
  • Designed AWS CloudFormation templates to create VPCs, subnets, and NAT gateways to ensure successful deployment of web applications and database templates.
  • Loaded data from different sources (databases and files) into Hive using the Talend tool.
  • Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources like S3 (ORC/Parquet/text files) into AWS Redshift (see the Glue sketch after this list).
  • Used Talend big data components such as Hadoop and S3 buckets, and AWS services for Redshift.
  • Involved in data migration to Snowflake using AWS S3 buckets.
  • Created S3 buckets, managed policies for the S3 buckets, and utilized S3 and Glacier for storage and backup on AWS.
  • Exported the analysed data to relational databases using Sqoop for visualization and to generate reports for the BI team using Tableau.
  • Installed and configured Apache Airflow for the S3 bucket and Snowflake data warehouse and created DAGs to run in Airflow (see the DAG sketch after this list).
  • Performed data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.
  • Developed spark job to consume data from Kafka topic and perform validations on the data before pushing data into HBase and Oracle databases.
  • Built different visualizations and reports in tableau using Snowflake data.
  • Selected and generated data into CSV files, stored them in AWS S3 using AWS EC2, and then structured and stored the data in AWS Redshift.
  • Worked on configuring Zookeeper, Kafka, and Logstash clusters for data ingestion, Elasticsearch performance and optimization, and live streaming of data with Kafka.
  • Responsible for estimating the cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
  • Worked on NiFi data pipelines to process large data sets and configured lookups for data validation and integrity.
  • Unit tested the data between Redshift and Snowflake.
  • Handled importing data from different data sources using Sqoop, performed transformations using Hive and MapReduce, and then loaded the data into HDFS.
  • Generated report on predictive analytics using Python and Tableau including visualizing model performance and prediction results.
  • Used Jenkins pipelines to drive all microservice builds out to the Docker registry and then deploy them to Kubernetes; created and managed pods using Kubernetes.
  • Validated Looker reports against the Redshift database.
  • Developed code to handle exceptions and push them into an exception Kafka topic.
  • Worked on designing ETL pipelines to retrieve datasets from MySQL and MongoDB into an AWS S3 bucket, and managed bucket and object access permissions.
  • Architected and designed serverless application CI/CD using the AWS Serverless Application Model (Lambda).
  • Strong experience working with Elastic MapReduce (EMR) and setting up environments on Amazon AWS EC2 instances.
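
The Glue sketch referenced above: a minimal AWS Glue PySpark job that reads Parquet files from S3 as a DynamicFrame and writes them to Redshift through a catalogued JDBC connection. The bucket, table, and connection names are illustrative assumptions.

```python
# Minimal AWS Glue job sketch (PySpark): S3 Parquet -> Redshift via a Glue
# catalog connection. Bucket, table, and connection names are assumptions.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the raw campaign files from S3 as a DynamicFrame.
campaign = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-campaign-bucket/raw/"]},  # assumed bucket
    format="parquet",
)

# Write to Redshift through a pre-defined Glue connection, staging via S3.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=campaign,
    catalog_connection="redshift-connection",  # assumed Glue connection name
    connection_options={"dbtable": "campaign_events", "database": "analytics"},
    redshift_tmp_dir="s3://my-campaign-bucket/tmp/",
)

job.commit()
```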
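
The DAG sketch referenced above: a minimal Airflow DAG that copies staged S3 files into Snowflake once a day. The connection details, stage, and table names are assumptions, and a Snowflake provider operator could be used in place of the plain PythonOperator shown here.

```python
# Minimal Airflow DAG sketch: daily COPY from an external S3 stage into a
# Snowflake table. All names and credentials are illustrative assumptions.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
import snowflake.connector


def load_s3_stage_to_snowflake():
    """Run a COPY INTO from an assumed external S3 stage into Snowflake."""
    conn = snowflake.connector.connect(
        account="my_account",   # assumed
        user="etl_user",        # assumed
        password="***",         # normally pulled from a secrets backend
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="RAW",
    )
    try:
        conn.cursor().execute(
            "COPY INTO RAW.EVENTS FROM @RAW.EVENTS_S3_STAGE "
            "FILE_FORMAT = (TYPE = PARQUET) ON_ERROR = 'ABORT_STATEMENT'"
        )
    finally:
        conn.close()


with DAG(
    dag_id="s3_to_snowflake_daily",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 1, "retry_delay": timedelta(minutes=10)},
) as dag:
    PythonOperator(
        task_id="load_events",
        python_callable=load_s3_stage_to_snowflake,
    )
```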

Environment: Hadoop, Kafka, Spark, Spark Databricks, Sqoop, ETL/ELT, Talend, Airflow, Oozie, AWS (Glue, Lambda, Step Functions, SQS, CodeBuild, CodePipeline, EventBridge, Athena, Redshift, DynamoDB), Tableau, Spark SQL, Spark Streaming, Hive, Scala, Pig, Impala, NiFi, HBase, Zookeeper, CI/CD, Jenkins, Kubernetes, Docker, Microservices, Python, Snowflake, Unix

Confidential, Houston, TX

Big Data Engineer

Responsibilities:

  • Migrated databases to the SQL Azure cloud platform and performed performance tuning.
  • Installed and configured Sqoop to import and export data into Hive from relational databases.
  • Expertise in Python and Scala; wrote user-defined functions (UDFs) for Hive and Pig using Python (see the UDF sketch after this list).
  • Involved in creating HiveQL on HBase tables and importing efficient work order data into Hive tables
  • Extensive experience with Hadoop ecosystem components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, and Flume.
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Implemented Apache Airflow for authoring, scheduling, and monitoring data pipelines.
  • Experienced in ETL concepts, building ETL solutions and Data modelling
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and Scala for data cleaning and pre-processing.
  • Extensive usage of Azure Portal, Azure PowerShell, Storage Accounts, Certificates and Azure Data Management.
  • Used OOZIE Operational Services for batch processing and scheduling workflows dynamically.
  • Worked on architecting the ETL transformation layers and writing spark jobs to do the processing.
  • Configured Zookeeper and worked on Hadoop high availability with the Zookeeper failover controller to support a scalable, fault-tolerant data solution.
  • Experience developing ETL solutions using Spark SQL in Azure Databricks for data extraction, transformation, and aggregation from multiple file formats and data sources, analysing and transforming the data to uncover insights into customer usage patterns.
  • Designed several DAGs (Directed Acyclic Graphs) to automate ETL pipelines.
  • Used HBase to store the Kafka topic, partition number, and offset values; also used the Phoenix JAR to connect to HBase tables.
  • Worked as L1 support on Jira requests for Kafka.
  • Developed Scala scripts and UDFs using DataFrames/SQL/Datasets and RDDs in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop.
  • Used HBase to store the majority of the data, which needed to be partitioned by region.
  • Designed Oozie workflows for job scheduling and batch processing.
  • Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Created self-service reporting in Azure Data Lake Store Gen2 using an ELT approach.
  • Experience in writing SQOOP Scripts for importing and exporting data from RDBMS to HDFS.
  • Implemented a Python codebase for branch management of Kafka features.
  • Designed and implemented configurable data delivery pipeline for scheduled updates to customer facing data stores built with Python
  • Worked on migrating Map Reduce programs into Spark transformations using Spark and Scala.
  • Built performant, scalable ETL processes to load, cleanse and validate data
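
The UDF sketch referenced above: Python logic is usually plugged into Hive through the TRANSFORM/streaming interface rather than as a compiled UDF, so this minimal script reads tab-separated rows from stdin and emits cleaned rows. The (id, status) column layout is an illustrative assumption.

```python
#!/usr/bin/env python
# Minimal Python "UDF" for Hive via TRANSFORM, e.g.:
#   ADD FILE clean_status.py;
#   SELECT TRANSFORM(id, status) USING 'python clean_status.py' AS (id, status)
#   FROM work_orders;
# The (id, status) layout is an assumption for illustration.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        continue
    record_id, status = fields[0], fields[1]
    # Business rule: normalise free-text status values.
    status = status.strip().upper() or "UNKNOWN"
    print("\t".join([record_id, status]))
```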

Environment: Hadoop, HDFS, Hive, Python, Azure, Databricks, Data Lake, Data Storage, Data Factory, ETL/ELT, Airflow, Kafka, MapReduce, Scala, Spark, HBase, Pig, Zookeeper, Sqoop, Flume, Oozie

Confidential, Redwood City, CA

Data Engineer

Responsibilities:

  • Implemented and managed ETL solutions and automated operational processes.
  • Defined and deployed monitoring, metrics, and logging systems on AWS.
  • Managed security groups on AWS, focusing on high availability, fault tolerance, and auto scaling using Terraform templates, along with continuous integration and continuous deployment with AWS Lambda and AWS CodePipeline.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and NiFi for data cleaning and pre-processing.
  • Worked on big data on AWS cloud services such as EC2, S3, EMR, and DynamoDB.
  • Migrated on premise database structure to Redshift data warehouse
  • Developed parameter and dimension based reports, drill-down reports, matrix reports, charts, and Tabular reports using Tableau Desktop.
  • Was responsible for ETL and data validation using SQL Server Integration Services.
  • Measured Efficiency of Hadoop/Hive environment ensuring SLA is met
  • Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
  • Optimized TensorFlow models for efficiency and used Spark SQL for ETL jobs, choosing the right technology to get the job done.
  • Strong understanding of AWS components such as EC2 and S3
  • Created Tableau scorecards, dashboards, Heat maps using show me functionality.
  • Used a JSON schema to define table and column mappings from S3 data to Redshift (see the COPY sketch after this list).
  • Developed a Kafka consumer API in Scala for consuming data from Kafka topics.
  • Created ad hoc queries and reports to support business decisions using SQL Server Reporting Services (SSRS).
  • Analyzed existing application programs and tuned SQL queries using execution plans, Query Analyzer, SQL Profiler, and Database Engine Tuning Advisor to enhance performance.
  • Optimized and tuned the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics.
  • Developed SSRS reports, SSIS packages to Extract, Transform and Load data from various source systems
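
The COPY sketch referenced above: a minimal example of loading JSON data from S3 into Redshift with a COPY statement and a JSONPaths mapping file, issued through psycopg2. The cluster endpoint, IAM role, bucket, and table names are illustrative assumptions.

```python
# Minimal sketch: Redshift COPY from S3 JSON using a JSONPaths file.
# Endpoint, credentials, role, bucket, and table are assumptions.
import psycopg2

COPY_SQL = """
COPY analytics.page_events
FROM 's3://my-events-bucket/2023/10/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
JSON 's3://my-events-bucket/config/page_events_jsonpaths.json'
TIMEFORMAT 'auto'
REGION 'us-west-2';
"""

conn = psycopg2.connect(
    host="my-cluster.abc123.us-west-2.redshift.amazonaws.com",  # assumed endpoint
    port=5439,
    dbname="analytics",
    user="etl_user",
    password="***",  # normally read from a secrets store
)
try:
    with conn, conn.cursor() as cur:  # 'with conn' commits on success
        cur.execute(COPY_SQL)
finally:
    conn.close()
```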

Environment: AWS, EC2, S3, Lambda, Redshift, NiFi, SQL Server, Erwin, Kafka, Spark, Pig, Hive, HDFS, Oracle, Informatica, RDS, MySQL, Docker, PostgreSQL, Tableau, GitHub

Confidential

Hadoop Developer

Responsibilities:

  • Imported and exported data into HDFS and Hive using Sqoop.
  • Used HiveContext with transformations and actions (map, flatMap, filter, reduce, reduceByKey).
  • Developed Pig scripts for the analysis of semi-structured data.
  • Migrated ETL jobs to Pig scripts to perform transformations, joins, and pre-aggregations before storing the data in HDFS.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs (see the illustrative sketch after this list).
  • Worked on different file formats like Sequence files, XML files and Map files using MapReduce Programs.
  • Developed Pig UDFs for manipulating data according to business requirements and worked on developing custom Pig loaders.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Performed tuning of SQL queries and Stored Procedure for speedy extraction of data to resolve and troubleshoot issues in OLTP environment.
  • Experienced in writing live real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system.
  • Created groups and users in Tableau Server.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done using Python (PySpark).
  • Worked with Elastic MapReduce (EMR) and set up Hadoop environments on AWS EC2 instances.
  • Developed Spark jobs using Scala on top of YARN/MRv2 for interactive and batch analysis.
  • Experienced in working with various data sources such as Teradata and Oracle; successfully loaded files into HDFS from Teradata and loaded data from HDFS into Hive and Impala.
  • Worked on the Oozie workflow engine for job scheduling.
  • Experience in setting up the whole application stack, and in setting up and debugging Logstash to send Apache logs to AWS Elasticsearch.
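
The sketch referenced above: a minimal PySpark example that reads text files from S3 into an RDD, applies transformations, and runs actions. The bucket, path, and record layout are illustrative assumptions.

```python
# Minimal PySpark RDD sketch: S3 text files -> transformations -> actions.
# Bucket, path, and record layout are assumptions; assumes the cluster (e.g.
# EMR) already has S3 credentials configured.
from pyspark import SparkContext

sc = SparkContext(appName="s3-rdd-sketch")

lines = sc.textFile("s3a://my-log-bucket/access-logs/2023-10-01/*")

# Transformations: parse tab-separated records and keep only error entries.
records = lines.map(lambda line: line.split("\t"))
errors = records.filter(lambda rec: len(rec) > 2 and rec[2] == "ERROR")

# Actions: materialise results on the driver.
print("error count:", errors.count())
for rec in errors.take(5):
    print(rec)

sc.stop()
```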

Environment: Hadoop, Hive, HDFS, Spark, Oozie, MapReduce, Scala, Python, PySpark, AWS, Oracle 10g, SQL, OLTP, Tableau, Windows, MS Office

Confidential

Junior Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and Scala for data cleaning and preprocessing (see the mapper sketch after this list).
  • Participated in development/implementation of Cloudera Hadoop environment.
  • Experienced in running queries using Impala and used BI tools to run ad hoc queries directly on Hadoop.
  • Used existing Deal Model in Python to inherit and create object data structure for regulatory reporting.
  • Experienced in installing, configuring and using Hadoop Ecosystem components.
  • Installed and configured Hive, wrote Hive UDFs, and used MapReduce and JUnit for unit testing.
  • Used DataStax Cassandra along with Pentaho for reporting.
  • Queried and analyzed data from DataStax Cassandra for quick searching, sorting, and grouping.
  • Used the YARN architecture and MapReduce in the development cluster for POCs.
  • Supported MapReduce programs running on the cluster; involved in loading data from the UNIX file system into HDFS.
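
The mapper sketch referenced above: the data-cleaning jobs described here were written in Java and Scala, so as a compact Python illustration the following is an equivalent Hadoop Streaming mapper that drops malformed comma-separated records; the four-field layout is an assumption.

```python
#!/usr/bin/env python
# Illustrative Hadoop Streaming mapper (a Python analogue of the Java/Scala
# data-cleaning MapReduce jobs). Run with, for example:
#   hadoop jar hadoop-streaming.jar -mapper clean_mapper.py \
#       -reducer aggregate.py -input /raw -output /clean
# The four-field, comma-separated layout is an assumption.
import sys

for line in sys.stdin:
    fields = [f.strip() for f in line.rstrip("\n").split(",")]
    # Drop records with a missing key or the wrong number of fields.
    if len(fields) != 4 or not fields[0]:
        continue
    key, rest = fields[0], fields[1:]
    print("{0}\t{1}".format(key, ",".join(rest)))
```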

Environment: CDH, MapReduce, HDFS, Hive, Pig, Impala, SQL, Tableau, Teradata, CentOS.
