
Sr. Data/cloud Engineer Resume

Atlanta, GA

SUMMARY

  • Over 8 years of experience in Data Engineering, Data Pipeline Design, Development and Implementation as a Sr. Data Engineer/Data Developer and Data Modeler.
  • Strong experience in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design Specification and Testing per cycle, in both Waterfall and Agile methodologies.
  • Strong experience in writing scripts using the Python API, PySpark API and Spark API for analyzing data.
  • Experience in ingestion, storage, querying, processing, and analysis of Big Data, with hands-on experience in Big Data tools including Apache Spark, Spark SQL, and Hive.
  • Hands-on experience with Spark, AWS, Azure, Talend, Hadoop, HDFS, Hive, Impala, Oozie, Sqoop, HBase, Scala, Python, Kafka, Kudu, and NoSQL databases like Cassandra and MongoDB.
  • Extensively used Python libraries PySpark, Pytest, PyMongo, cx_Oracle, PyExcel, Boto3, Psycopg, embedPy, NumPy and Beautiful Soup.
  • Experience with Google Cloud components, Google container builders, GCP client libraries and the Cloud SDK.
  • Experience in AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
  • Expertise in Python and Scala, including writing user-defined functions (UDFs) for Hive and Pig in Python.
  • Extensive experience in performing ETL on structured and semi-structured data using AWS Glue, Talend DI and big data batch jobs.
  • Experience in working with SQL/NoSQL databases like MongoDB, Cassandra, MySQL, and PostgreSQL.
  • Expert in developing SSIS/DTS packages to extract, transform and load (ETL) data into data warehouses/data marts from heterogeneous sources.
  • Experienced in building highly reliable, scalable big data solutions on both on-premises and cloud Hadoop distributions like Cloudera and AWS EMR.
  • Experience in Data Analysis, Data Profiling, Data Integration, Migration, Data Governance and Metadata Management, Master Data Management and Configuration Management.
  • Experience in using Databricks for handling the entire analytical process, from ETL to data modeling, by leveraging familiar tools, languages, and skills via interactive notebooks or APIs.
  • Experience with POCs that involve scripting using PySpark in Azure Databricks.
  • Experienced in building automation regression scripts for validation of ETL processes between multiple databases like Oracle, SQL Server, Hive, and MongoDB using Python.
  • Experience in designing star schema and snowflake schema for Data Warehouse and ODS architecture.
  • Experience in using and writing SQL queries, database creation, and writing stored procedures, DDL, and DML SQL queries. Good experience in designing, configuring and managing backup and disaster recovery for Hadoop data.
  • Experience optimizing ETL workflows using AWS Redshift.
  • Experience developing Pig Latin and HiveQL scripts for Data Analysis and ETL purposes, and extended the default functionality by writing User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) for custom data-specific processing.
  • Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Amazon Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SQS and other AWS services.
  • Good knowledge of Amazon AWS services like EC2, RDS, Glue, AWS Lambda, Step Functions, Kinesis, SageMaker and DynamoDB.
  • Experience in developing REST services.
  • Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
  • Experience configuring and working with AWS EMR instances.
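The cross-database ETL validation scripts mentioned above can be sketched as follows; this is a minimal illustration using Python's built-in sqlite3 as a stand-in for Oracle/SQL Server/Hive connections, and the table and column names are illustrative assumptions.

```python
import sqlite3

def row_count(conn, table):
    """Return the row count for a table (one basic validation check)."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

def validate_load(source_conn, target_conn, table):
    """Compare source vs. target row counts after an ETL load.

    A fuller validation would also compare checksums and column
    aggregates; sqlite3 stands in here for the real database drivers.
    """
    src, tgt = row_count(source_conn, table), row_count(target_conn, table)
    return {"table": table, "source": src, "target": tgt, "match": src == tgt}

# Demo with two in-memory databases simulating source and target
src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")
for conn in (src, tgt):
    conn.execute("CREATE TABLE claims (id INTEGER, amount REAL)")
src.executemany("INSERT INTO claims VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
tgt.executemany("INSERT INTO claims VALUES (?, ?)", [(1, 10.0)])

result = validate_load(src, tgt, "claims")
print(result["match"])  # False: the target is missing a row
```

In practice each such check would be parameterized over a list of tables and wired into the regression suite so a failed load is caught before reporting runs.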

PROFESSIONAL EXPERIENCE

Sr. Data/Cloud Engineer

Confidential, Atlanta, GA

Responsibilities:

  • Developed Talend jobs to populate the claims data to the data warehouse - star schema, snowflake schema, and hybrid schema.
  • Partnered with ETL developers to ensure that data is well cleaned and the data warehouse is up to date for reporting purposes, using Pig.
  • Designed and developed Spark scripts for parsing JSON files and storing them in Parquet file format in EMR.
  • Migrated legacy Informatica batch/real-time ETL logic into Hadoop using Python, SparkContext, Spark SQL, DataFrames and pair RDDs in Databricks.
  • Developed real-time data feeds and microservices leveraging AWS Kinesis, Lambda, Kafka, Spark Streaming, etc. to enhance useful analytic opportunities and influence customer content and experience.
  • Designed PySpark scripts and Airflow DAGs to transform and load data from Hive tables on AWS S3.
  • Involved in designing and deploying multi-tier applications using AWS services (EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance, and auto-scaling with AWS CloudFormation.
  • Worked on ingesting data through cleansing and transformations, leveraging AWS Lambda, AWS Glue and Step Functions.
  • Worked on designing and developing ETL solutions for complex data ingestion requirements using Talend Cloud Real-Time Big Data Platform, Informatica PowerCenter, Informatica Intelligent Cloud Services, Python and PySpark, and implemented data streaming using Informatica PowerExchange.
  • Using PySpark, created ETL jobs to assist in the AWS ETL process.
  • Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  • Built large-scale data processing systems in data warehousing solutions, and worked with unstructured data mining on NoSQL.
  • Developed automation regression scripts for validation of ETL processes between multiple databases like AWS Redshift, Oracle, MongoDB, and SQL Server using Python.
  • Utilized the Spark SQL API in PySpark to extract and load data and perform SQL queries.
  • Currently working as an architect on a new cloud implementation, responsible for building solutions using Azure DevOps for migration of disparate RDBMS data sources into Snowflake using AWS DMS.
  • Selected and generated data into CSV files, stored them in AWS S3 using AWS EC2, and then structured and stored them in AWS Redshift.
  • Performed data quality issue analysis using SnowSQL by building an analytical warehouse on Snowflake.
  • Used Docker for building and testing containers locally and deploying to AWS ECS. Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL.
  • Installed and configured Talend ETL on single- and multi-server environments.
  • Used the managed Spark platform by Databricks on AWS to quickly create clusters on demand to process large amounts of data using PySpark.
  • Developed and designed ETL jobs using Talend Integration Suite in Talend 5.2.2.
  • Optimized PySpark jobs to run on a Kubernetes cluster for faster data processing.
  • Designed, developed and maintained non-production and production transformations in the AWS environment and created data pipelines using PySpark programming.
  • Supported continuous storage in AWS using Elastic Block Storage, S3 and Glacier; created volumes and configured snapshots for EC2 instances.
  • Designed, developed, and implemented ETL pipelines using the Python API (PySpark) of Apache Spark on AWS EMR.
  • Used Python for SQL/CRUD operations in the database and for file extraction/transformation/generation.
  • Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Utilized Agile and Scrum methodologies for team and project management.
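The Lambda-driven S3 ingestion described above can be sketched as a handler that pulls bucket/key pairs out of an S3 event notification. The event shape below follows the standard S3 notification format; the downstream Glue/Step Functions hand-off is environment-specific and omitted here as an assumption.

```python
import urllib.parse

def lambda_handler(event, context=None):
    """Collect (bucket, key) pairs from an S3 event notification.

    A real handler would pass these on to AWS Glue or start a Step
    Functions execution; that wiring is deployment-specific and omitted.
    """
    records = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # S3 URL-encodes object keys in event payloads
        key = urllib.parse.unquote_plus(s3["object"]["key"])
        records.append((bucket, key))
    return records

# Demo with a minimal S3 event payload (bucket/key names are illustrative)
event = {"Records": [{"s3": {"bucket": {"name": "claims-raw"},
                             "object": {"key": "2020/01/claims+file.json"}}}]}
print(lambda_handler(event))  # [('claims-raw', '2020/01/claims file.json')]
```

Decoding the key with `unquote_plus` matters because S3 replaces spaces with `+` in event payloads, which otherwise breaks downstream object lookups.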

Environment: Python, AWS, Jira, Git, CI/CD, Docker, PySpark, Kubernetes, Talend, EC2, ECS, Lambda, Glue, Redshift, EMR, Web Services, Spark Streaming API, Kafka, Cassandra, MongoDB, JSON, Bash scripting, Hive, Linux, REST API, SQL, Apache Airflow.

Data Engineer

Confidential, Houston, TX

Responsibilities:

  • Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
  • Developed data pipelines using Spark and PySpark.
  • Provided asynchronous replication, including Amazon EC2 and RDS, for regional MySQL database deployments and fault-tolerant servers (with solutions tailored for managing RDS).
  • Created a Lambda deployment function and configured it to receive events from S3 buckets.
  • Analyzed SQL scripts and designed solutions to implement them using PySpark. Using AWS Redshift, extracted, transformed and loaded data from various heterogeneous data sources and destinations, such as Access, Excel, CSV, Oracle and flat files, using connectors, tasks and transformations provided by AWS Data Pipeline.
  • Created tables and stored procedures, and extracted data using T-SQL for business users whenever required.
  • Created ActiveBatch jobs to automate the PySpark and SQL functions as daily run jobs.
  • Utilized the Spark SQL API in PySpark to extract and load data and perform SQL queries.
  • Ability to work effectively in cross-functional team environments; excellent communication and interpersonal skills.
  • Created ETL mappings with Talend Integration Suite to pull data from source, apply transformations, and load data into the target database.
  • Responsible for design, development, and testing of the database, and developed stored procedures, views, and triggers.
  • Developed Spark applications in Python (PySpark) on a distributed environment to load a huge number of CSV files with different schemas into Hive ORC tables.
  • Managed Amazon Web Services like EC2, S3 buckets, ELB, Auto Scaling, DynamoDB and Elasticsearch.
  • Created an API to manage servers and run code in AWS using Amazon Lambda.
  • Developed a Python-based API (RESTful web service) to track revenue and perform revenue analysis.
  • Performed ETL testing activities like running the jobs, extracting the data using the necessary queries from the database, transforming it, and uploading it into the data warehouse servers.
  • Extracted, transformed and loaded data from source systems to Azure Data Storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
  • Provided guidance to the development team working on PySpark as an ETL platform.
  • Primarily involved in data migration using SQL, SQL Azure, Azure Storage, Azure Data Factory, SSIS and PowerShell.
  • Involved in the requirement and design phases to implement a streaming Lambda architecture using real-time streaming with Spark and Kafka.
  • Worked on reading and writing multiple data formats like JSON, ORC and Parquet on HDFS using PySpark.
  • Designed, developed, and tested dimensional data models for migration using star and snowflake schema methodologies under the Kimball method.
  • Wrote programs in Spark using Python, PySpark and the Pandas package for performance tuning, optimization and data quality validations.
  • Extracted files through Talend ETL jobs and placed them in an HDFS location.
  • Developed data pipelines using Spark, Hive, Pig, Python, Impala, and HBase to ingest customer data.
  • Used AWS services like EC2 and S3 for small data sets.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Managed ETL jobs developed in Talend ETL.
  • Created an automated, event-driven notification service utilizing SNS, SQS, Lambda, and CloudWatch.
  • Strong verbal and written communication skills.
  • Performed data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.
  • Worked on development of data ingestion pipelines using ETL tools (Talend) and bash scripting with big data technologies including, but not limited to, Hive, Impala, Spark and Kafka.
  • Designed SSIS packages to extract, transfer and load (ETL) existing data into SQL Server from different environments for the SSAS cubes (OLAP).
  • Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources like S3 (ORC/Parquet/text files) into AWS Redshift.
  • Developed ETL programs and job workflows to transfer data to AWS Redshift.
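The Kimball-style dimensional modeling above can be sketched as a surrogate-key load: new business keys get a surrogate key in the dimension, and facts are inserted against that key. This is an illustrative sketch only; sqlite3 stands in for the warehouse, and the table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_sk INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id TEXT UNIQUE   -- natural/business key
    );
    CREATE TABLE fact_sales (
        customer_sk INTEGER REFERENCES dim_customer(customer_sk),
        amount REAL
    );
""")

def dimension_key(customer_id):
    """Return the surrogate key for a business key, inserting it if new."""
    conn.execute(
        "INSERT OR IGNORE INTO dim_customer (customer_id) VALUES (?)",
        (customer_id,))
    return conn.execute(
        "SELECT customer_sk FROM dim_customer WHERE customer_id = ?",
        (customer_id,)).fetchone()[0]

# Load facts, resolving natural keys to surrogate keys on the way in
for customer_id, amount in [("C001", 120.0), ("C002", 80.0), ("C001", 40.0)]:
    conn.execute("INSERT INTO fact_sales VALUES (?, ?)",
                 (dimension_key(customer_id), amount))

total = conn.execute(
    "SELECT SUM(f.amount) FROM fact_sales f "
    "JOIN dim_customer d ON f.customer_sk = d.customer_sk "
    "WHERE d.customer_id = 'C001'").fetchone()[0]
print(total)  # 160.0
```

In a snowflake variant, attributes such as customer geography would be normalized into further dimension tables keyed off `dim_customer` rather than stored inline.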

Environment: Scala, Python, PySpark, Spark, Spark MLlib, Spark SQL, Hive, EC2, ECS, Lambda, EMR, Glue, Redshift, ETL, ELT, SQL, HDFS, Talend, TensorFlow, NumPy, Keras, Power BI.

Data Engineer

Confidential, King of Prussia, PA

Responsibilities:

  • Implemented Spark transformations and ingestion into our data lake.
  • Hands-on experience implementing and designing EMR clusters in non-prod and prod regions based on data ingestion sizes.
  • Designed and implemented Control-M jobs to complete the ingestion process.
  • Designed and implemented Oozie jobs to run the ingestion.
  • Designed the ingestion cluster to perform data transformations.
  • Ability to design complex ETL jobs using Talend Studio for big data applications.
  • Configured Spark jobs for quick ingestion and added enough resources to handle 10 TB of data on a daily basis.
  • Responsible for account management, IAM management and cost management.
  • Designed AWS CloudFormation templates to create VPCs, subnets and NAT to ensure successful deployment of web applications and database templates.
  • Created S3 buckets, managed policies for S3 buckets, and utilized S3 and Glacier for storage and backup on AWS.
  • Created RDDs in Spark.
  • Extracted data from the data warehouse (Teradata) onto Spark RDDs.
  • Implemented build and deploy plans from scratch.
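The CloudFormation VPC templates mentioned above can be illustrated by emitting the template JSON from Python. This is a minimal sketch of the template structure only; the logical resource names and CIDR ranges are assumptions, not the original templates.

```python
import json

# Minimal CloudFormation template: a VPC with one subnet.
# Logical names and CIDR blocks are illustrative assumptions.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "IngestionVPC": {
            "Type": "AWS::EC2::VPC",
            "Properties": {"CidrBlock": "10.0.0.0/16"},
        },
        "PublicSubnet": {
            "Type": "AWS::EC2::Subnet",
            "Properties": {
                # Ref ties the subnet to the VPC declared above
                "VpcId": {"Ref": "IngestionVPC"},
                "CidrBlock": "10.0.1.0/24",
            },
        },
    },
}

print(sorted(template["Resources"]))  # ['IngestionVPC', 'PublicSubnet']
```

A real stack would add an internet gateway, route tables, and NAT resources, and the JSON would typically be checked into version control alongside the deploy scripts.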

Environment: Hadoop, Spark, Scala, Teradata, Hive, Aorta, Sqoop, GCP, Google Cloud Storage, BigQuery, Dataflow, SQL, DB2, UDP, GitHub, Azure (Azure Data Factory, Azure Databases), Tableau

SQL Developer

Confidential, NYC, NY

Responsibilities:

  • Designed, developed, tested, implemented and supported data warehousing ETL using Talend.
  • Involved in requirement gathering with different business teams to understand their analytical/reporting needs.
  • Redesigned and improved existing procedures, triggers, UDFs, and views with execution plans, SQL Profiler, and DTA.
  • Worked on application systems analysis and programming activities.
  • Involved in writing SQL scripts, stored procedures and functions, debugging them, and scheduling SQL workflows to automate the update processes.
  • Defined and developed logical/physical dimensional models of the data mart with ERwin.
  • Wrote T-SQL stored procedures and complex SQL queries to implement the business logic.
  • Created SSIS packages to extract data from OLTP databases; consolidated and populated data into the OLAP database.
  • Utilized various transformations such as lookup, derived column, conditional split, and fuzzy lookup for data standardization.
  • Debugged SSIS packages with error log details, checkpoints, breakpoints, and data viewers.
  • Scheduled and maintained packages daily, weekly, and monthly using SQL Server Agent in SSMS.
  • Created SSRS reports to produce in-depth business analyses.
  • Created analytical dashboards to assist business users with critical KPIs and facilitate strategic planning with Tableau.
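The fuzzy-lookup standardization used in the SSIS packages above can be approximated in Python with difflib: map a possibly misspelled input to its closest value in a reference list, or route it to error output when no close match exists. The reference values and similarity cutoff below are illustrative assumptions.

```python
from difflib import get_close_matches

# Reference list a fuzzy lookup would standardize against (illustrative)
STATES = ["New York", "New Jersey", "Pennsylvania"]

def fuzzy_lookup(value, choices=STATES, cutoff=0.8):
    """Map a possibly misspelled value to its closest reference value.

    Mirrors the intent of an SSIS fuzzy lookup: return the standardized
    value when a close enough match exists, else None (error output).
    """
    matches = get_close_matches(value, choices, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(fuzzy_lookup("New Yrok"))    # 'New York'
print(fuzzy_lookup("California"))  # None: not in the reference list
```

The cutoff plays the same role as the similarity threshold on the SSIS fuzzy lookup component: tighten it to reduce false matches, loosen it to catch heavier misspellings.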

Environment: Microsoft SQL Server 2012, SSDT, SSMS, T-SQL, SSIS, SSRS, SSAS,ER-Win, SharePoint, Power BI, TFS, DTA, Tableau, C#.Net, Visual Studio, CSS, HTML.
