
Sr Data Engineer Resume


Charlotte, NC

PROFESSIONAL SUMMARY:

  • Motivated Data Engineer with around 8 years of professional experience in Data Engineering, Analytics, Data Modeling, Data Science, Data Architecture, Programming Analysis, and Database Design of OLTP and OLAP systems, with sound knowledge of cloud technologies (AWS, Azure).
  • Highly dedicated Sr Data Engineer with over 8 years of IT industry experience across technologies, tools, and databases including Big Data, AWS, S3, Snowflake, Hadoop, Hive, Spark, Python, Sqoop, Cassandra (CDL), Teradata, E-R modeling (Confidential), Tableau, SQL, PL/SQL, Ab Initio (ACE), and Redshift, always staying rooted in the data world I cherish most.
  • Familiar with JSON-based REST web services and Amazon Web Services (AWS).
  • Strong experience in designing and working with MySQL and MongoDB.
  • Over 8 years of overall IT experience in a variety of industries, including hands-on experience in Big Data and data warehouse ETL technologies.
  • Have 4+ years of comprehensive experience in Big Data processing using Hadoop and its ecosystem (MapReduce, Pig, Hive, Sqoop, Flume, Spark).
  • Experienced in designing and developing a new Redshift data warehouse.
  • Good knowledge of key AWS services such as Amazon Redshift, Amazon S3, DMS, Athena, Glue, Kinesis, EMR, SNS, Amazon EC2, Data Pipeline, AWS Lambda, Amazon CloudWatch, and Amazon Glacier.
  • Experience with EC2: setting up instances and security groups and working closely with infrastructure teams to troubleshoot complex issues.
  • Responsible for testing APIs in Postman and providing documentation to the QA team.
  • Good working knowledge of Snowflake and Teradata databases.
  • Good working experience with Spark (Spark Streaming, Spark SQL), Scala, and Kafka.
  • Extensively worked on Spark with Scala on clusters for computational analytics; installed Spark on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle/Snowflake.
  • Excellent Programming skills at a higher level of abstraction using Scala and Python.
  • Hands on experience in developing SPARK applications using Spark tools like RDD transformations, Spark core, Spark MLlib, Spark Streaming and Spark SQL.
  • Utilized Kubernetes and Docker for the runtime environment for the CI/CD system to build, test, and deploy.
  • Experience in setting up CI/CD pipelines, integrating various tools with Jenkins to build and run Terraform jobs that create infrastructure in AWS.
  • Strong experience and knowledge of real time data analytics using Spark Streaming, Kafka and Flume.
  • Designed various Jenkins jobs to continuously integrate the processes and executed CI/CD pipeline using Jenkins.
  • Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs (two illustrative PySpark sketches follow this list).
  • Good experience in creating and designing data ingestion pipelines using technologies such as Apache Storm and Kafka.
  • Experienced in working with in-memory processing frameworks, including Spark transformations, Spark SQL, MLlib, and Spark Streaming.
  • Good working experience using Sqoop to import data from RDBMS into HDFS and vice versa.
  • Experienced in implementing POCs using the Spark SQL and MLlib libraries.
  • Improved the performance of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and YARN.
  • Hands on experience in handling Hive tables using Spark SQL.
  • Efficient in writing MapReduce Programs and using Apache Hadoop API for analyzing the structured and unstructured data.
  • Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
  • Debugged Pig and Hive scripts and optimized and debugged MapReduce jobs.
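
To make the Hive-to-Spark conversion above concrete, here is a minimal PySpark sketch assuming a Hive-enabled Spark session; the database, table, and column names (sales_db.transactions, txn_dt, store_id, amount, customer_id) are hypothetical placeholders, not drawn from any actual project.

```python
# Minimal PySpark sketch, assuming a Hive-enabled Spark session and a
# hypothetical partitioned Hive table sales_db.transactions (illustrative names).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("hive-to-spark-example")
    .enableHiveSupport()          # lets Spark SQL read/write Hive tables
    .getOrCreate()
)

# Equivalent of a HiveQL aggregation expressed as DataFrame transformations
txns = spark.table("sales_db.transactions").where(F.col("txn_dt") >= "2023-01-01")

daily_totals = (
    txns.groupBy("txn_dt", "store_id")
        .agg(F.sum("amount").alias("total_amount"),
             F.countDistinct("customer_id").alias("unique_customers"))
)

# Write back to a Hive table partitioned by date for efficient partition pruning
(daily_totals.write
    .mode("overwrite")
    .partitionBy("txn_dt")
    .saveAsTable("sales_db.daily_store_totals"))
```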
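
And for the RDD-level transformations mentioned above, a minimal pair-RDD sketch; the HDFS path and the three-field record layout are assumptions made only for illustration.

```python
# Minimal RDD sketch, assuming a running Spark session and a hypothetical
# tab-delimited log file on HDFS (path and layout are illustrative only).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-transformations-example").getOrCreate()
sc = spark.sparkContext

# Load raw lines from HDFS and split into fields: (user_id, action, bytes)
lines = sc.textFile("hdfs:///data/raw/access_log.tsv")
records = lines.map(lambda line: line.split("\t"))

# Filter out malformed rows, then aggregate bytes per user with a pair RDD
clean = records.filter(lambda r: len(r) == 3 and r[2].isdigit())
bytes_per_user = (
    clean.map(lambda r: (r[0], int(r[2])))   # (user_id, bytes)
         .reduceByKey(lambda a, b: a + b)    # sum bytes per user
)

# Convert to a DataFrame so results can be queried with Spark SQL / Hive
df = bytes_per_user.toDF(["user_id", "total_bytes"])
df.createOrReplaceTempView("user_bytes")
spark.sql(
    "SELECT user_id, total_bytes FROM user_bytes ORDER BY total_bytes DESC LIMIT 10"
).show()
```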

PROFESSIONAL EXPERIENCE:

Sr Data Engineer

Confidential, Charlotte, NC

Responsibilities:

  • Responsible for sessions with the business, project manager, business analyst, and other key stakeholders to understand business needs and propose a solution from a data warehouse standpoint.
  • Set up Azure Repos and Pipelines for CI/CD deployment of objects.
  • Developed Spark applications using PySpark and Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage, consumption patterns, and behavior.
  • Wrote Python DAGs in Airflow to orchestrate end-to-end data pipelines for multiple applications, including Vertica BI (a minimal DAG sketch follows this list).
  • Strong programming skills in the design and implementation of multi-tier applications using web-based technologies like Spring MVC and Spring Boot.
  • Solid experience in Extraction, Transformation and Loading (ETL) mechanism using Ab Initio. Knowledge of full life cycle development for building a data warehouse.
  • Involved in setting up the Apache Airflow service in GCP.
  • Used REST APIs with Python to ingest data from external sources into BigQuery.
  • Designed star schemas in BigQuery.
  • Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver across all environments.
  • Built a program with Python and Apache Beam, executed in Cloud Dataflow, to run data validation between raw source files and BigQuery tables.
  • Excellent Experience in Designing, Developing, Documenting, Testing of ETL jobs and mappings in Server and Parallel jobs using DataStage to populate tables in Data Warehouse and Data marts.
  • Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators.
  • Created Databricks notebooks using SQL and Python and automated them using Databricks jobs (see the PySpark sketch after this list).
  • Created Spark clusters and configured high-concurrency clusters using Azure Databricks to speed up the preparation of high-quality data.
  • Working experience with Azure Databricks, organizing data into notebooks and making it easy to visualize through dashboards.
  • Development, analysis, and implementation experience in data warehousing, ETL (Informatica), and business intelligence tooling (Tableau) across various industry sectors, supporting data-related tasks such as integration, quality, profiling, analysis, and reporting.
  • Integrated Kafka with Spark using the Spark Streaming API.
  • Worked in Scala to implement Spark machine learning libraries and Spark Streaming.
  • Skilled in dimensional modeling (star and snowflake schemas), forecasting with large-scale datasets, transactional modeling, and slowly changing dimensions (SCD).
  • In-depth knowledge of Snowflake Database, Schema and Table structures.
  • Defined virtual warehouse sizing in Snowflake for different types of workloads.
  • Developed SQL queries using SnowSQL.
  • Experience in using Snowflake Clone and Time Travel.
  • Developed a data warehouse model in Snowflake for over 100 datasets using WhereScape.
  • Created reports in Looker based on Snowflake connections.
  • Developed scripts to transfer data from FTP server to the ingestion layer using Azure CLI commands.
  • Created Azure HDInsight clusters using PowerShell scripts to automate the process.
  • Used Stored Procedure, Lookup, Execute Pipeline, Data Flow, Copy Data, and Azure Function activities in ADF.
  • Used Azure Data Lake Storage Gen2 to store Excel and Parquet files and retrieved user data using the Blob API.
  • Worked on Azure Databricks, PySpark, Spark SQL, Azure ADW, and Hive to load and transform data.
  • Used Azure Data Lake as a source and pulled data using PolyBase.
  • Used Azure Data Lake and Azure Blob for storage and performed analytics in Azure Synapse Analytics.
  • Experience in Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), Azure HDInsight Big Data technologies (Hadoop and Apache Spark), and Databricks.
  • Experience in designing Azure Cloud Architecture and implementation plans for hosting complex application workloads on MS Azure.
  • Ingested data from RDBMS sources, performed data transformations, and exported the transformed data to Cassandra per business requirements.
  • Managed data lake data movements involving Hadoop and NoSQL databases such as HBase and Cassandra.
  • Experience working with Amazon EMR, AWS Glue, Databricks, and Cloudera (CDH5) Hadoop distributions.
  • Performed data analysis using SQL, PL/SQL, Python, Spark, Databricks, Teradata SQL Assistant, SQL Server Management Studio, and SAS.
  • Written multiple MapReduce programs for data extraction, transformation, and aggregation from numerous file formats, including XML, JSON, CSV & other compressed file formats.
  • Developed automated processes for flattening upstream data from Cassandra, which arrives in JSON format, using Hive UDFs.
  • Worked on data loading into Hive for data ingestion history and data content summaries.
  • Developed Power BI reports.
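
As a rough illustration of the Airflow DAGs referenced above, here is a minimal sketch assuming Airflow 2.x; the DAG id, task callables, and schedule are hypothetical placeholders rather than the actual pipeline code.

```python
# Minimal Airflow 2.x DAG sketch with placeholder tasks and schedule.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.operators.bash import BashOperator


def extract_from_api(**context):
    # Placeholder: call an upstream REST API and stage the payload
    print("extracting data from the upstream REST API")


def load_to_warehouse(**context):
    # Placeholder: load the staged file into the warehouse (e.g. BigQuery)
    print("loading staged data into the warehouse")


default_args = {"owner": "data-eng", "retries": 1, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="example_end_to_end_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    extract = PythonOperator(task_id="extract_from_api", python_callable=extract_from_api)
    validate = BashOperator(task_id="validate_staging", bash_command="echo 'row-count checks here'")
    load = PythonOperator(task_id="load_to_warehouse", python_callable=load_to_warehouse)

    extract >> validate >> load
```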
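
The Databricks/ADLS work above can be sketched with a short PySpark job; the abfss:// container paths and column names are illustrative assumptions, and authentication to the storage account is assumed to be configured in the workspace.

```python
# Minimal Databricks-style PySpark sketch over ADLS Gen2 with placeholder paths.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("adls-transform-example").getOrCreate()

# Read raw Parquet files landed in ADLS Gen2 (abfss path is a placeholder)
raw = spark.read.parquet("abfss://raw@examplestorageacct.dfs.core.windows.net/usage/")

# Basic cleansing and aggregation of customer usage data
usage = (
    raw.dropDuplicates(["event_id"])
       .withColumn("event_date", F.to_date("event_ts"))
       .groupBy("event_date", "customer_id")
       .agg(F.count("*").alias("events"),
            F.sum("duration_sec").alias("total_duration_sec"))
)

# Write curated output back to the lake, partitioned by date
(usage.write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("abfss://curated@examplestorageacct.dfs.core.windows.net/usage_daily/"))
```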

Technologies: Python, Power BI, PL/SQL, Azure Data Factory, Azure Blob Storage, Azure Table Storage, Azure SQL Server, Apache Hive, Apache Spark, MDM, Netezza, Teradata, Oracle, Snowflake, Kafka, MongoDB, Hadoop, Linux command line, data structures, PySpark, Oozie, HDFS, MapReduce, Cloudera, HBase, Pig, Docker, Tableau.

Senior Data Engineer

Confidential, Charlotte, NC

Responsibilities:

  • Experienced with Cloud Service Providers such as AWS.
  • Developed security policies and processes. Developed views and templates with Python and Django's view controller and templating language to create a user-friendly website interface (a minimal Django view sketch follows this list).
  • Designed and developed the SQL database structure with the Django framework using Agile methodology. Developed the project using Django, Oracle SQL, Angular, JavaScript, HTML5, CSS3, and Bootstrap.
  • Involved in the complete Software Development Life Cycle including gathering Requirements, Analysis, Design, Implementation, Testing and Maintenance.
  • Set up and built AWS infrastructure resources (VPC, EC2, S3, IAM, EBS, Security Groups, Auto Scaling, and RDS) using CloudFormation JSON templates.
  • Implemented user interface guidelines and standards throughout the development and maintenance of the website using HTML, CSS, JavaScript, jQuery, and AngularJS.
  • Used Python and Django for backend development, Bootstrap and Angular for frontend connectivity, and MongoDB as the database.
  • Implemented a CI/CD pipeline with Docker, Jenkins, and GitHub, virtualizing the Dev and Test environment servers with Docker and automating configuration through containerization.
  • Performed job functions using Spark APIs in Scala for real-time analysis and fast querying; experienced with Agile methodology and the delivery tool VersionOne.
  • Involved in the design of ML web applications to implement business logic.
  • Experienced with the AWS cloud platform and its features, including EC2, S3, Route 53, VPC, EBS, AMI, SNS, RDS, and CloudWatch.
  • Used the AWS CLI to suspend AWS Lambda functions and to automate backups of ephemeral data stores to S3 buckets and EBS.
  • Strong experience working in Azure Cloud, Azure DevOps, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), and Azure HDInsight.
  • Experience in Text mining, Topic modeling, Natural Language Processing (NLP), Content Classification, Sentiment analysis, Market Basket Analysis, Recommendation systems, Entity recognition etc.
  • Strong development skills with Azure Data Factory, Azure SQL Data Warehouse, Azure SQL Database, Azure Storage Explorer, and Azure Synapse Analytics.
  • Experience in designing Azure Cloud Architecture and Implementation plans for hosting complex application workloads on MS Azure.
  • Gathered semi-structured data from S3 and relational structured data from RDS, cataloged the data sets in a centralized metadata catalog using AWS Glue, and extracted and loaded the datasets into Kinesis streams.
  • Worked as part of an Agile/Scrum based development team and exposed to TDD approach in developing applications.
  • Worked on designing and deploying a multitude of applications using most of the main services of the AWS stack (EC2, S3, RDS, VPC, IAM, ELB, EMR, CloudWatch, Route 53, Lambda, and CloudFormation), focusing on high-availability, fault-tolerant environments.
  • Introduced new features and solved existing bugs by developing code for a cloud-based integration platform (iPaaS) and Migrated customer data from legacy iPaaS to AWS.
  • Deployed and tested different modules in Docker containers and Git. Implemented automation using Jenkins and Ansible on Unix/Linux-based OS and Docker.
  • Worked with AWS Kinesis Streams, AWS Step Functions (serverless) pipelines, Google TensorFlow, and Kinesis Data Analytics streaming SQL pipelines on AWS EKS.
  • Worked as a developer and support engineer where CA API Gateway exposes Home Depot's API (REST/SOAP) services to outside vendors.
  • Worked with different components of Azure's iPaaS offering, Service Bus, Functions, and Logic Apps, to use connectors and create workflows.
  • Installed MongoDB, configured, setup backup, recovery, upgrade and tuning and data integrity. Responsible for managing MongoDB environment with high availability, performance and scalability perspectives. Extensive experience in deploying, managing and developing MongoDB cluster.
  • Extensive experience automating the build and deployment of scalable projects through GitLab CI/CD, Jenkins, etc.; worked on Docker and Ansible. Used JavaScript for data validations and designed validation modules.
  • Created methods (GET, POST, PUT, DELETE) to make requests to the API server and tested the RESTful API using Postman. Also loaded CloudWatch Logs to S3 and then into Kinesis Streams for data processing (a boto3 sketch of this step follows this list).
  • Created Terraform scripts for EC2 instances, Elastic Load Balancers, and S3 buckets. Implemented Terraform to manage the AWS infrastructure and managed servers using configuration management tools like Chef and Ansible.
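
A minimal boto3 sketch of the CloudWatch-Logs-to-Kinesis step above; the bucket, prefix, and stream names are placeholders, and a production version would batch records with put_records and respect the 1 MB per-record limit.

```python
# Minimal boto3 sketch: push exported CloudWatch log objects from S3 into Kinesis.
import boto3

s3 = boto3.client("s3")
kinesis = boto3.client("kinesis")

BUCKET = "example-cloudwatch-exports"      # placeholder bucket
PREFIX = "exported-logs/"                  # placeholder prefix
STREAM = "example-log-stream"              # placeholder Kinesis stream

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        # Send each exported object as one record; real pipelines would batch
        # with put_records and split payloads larger than the 1 MB record limit.
        kinesis.put_record(
            StreamName=STREAM,
            Data=body,
            PartitionKey=obj["Key"],
        )
```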
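
A minimal Django sketch of the kind of view/URL wiring described above; the endpoints and payloads are hypothetical and shown only to illustrate the pattern of a function-based view returning JSON plus its URL route.

```python
# Minimal Django sketch: function-based views returning JSON and their routes.
from django.http import JsonResponse
from django.urls import path


def health_check(request):
    """Lightweight endpoint the frontend or load balancer can poll."""
    return JsonResponse({"status": "ok"})


def user_summary(request, user_id):
    """Return a small JSON payload for the Angular/Bootstrap frontend."""
    # Placeholder data; a real view would query the ORM, e.g. User.objects.get(...)
    return JsonResponse({"user_id": user_id, "active": True})


# In a normal project layout these live in urls.py and import from views.py
urlpatterns = [
    path("health/", health_check, name="health"),
    path("users/<int:user_id>/summary/", user_summary, name="user-summary"),
]
```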

Technologies: PL/SQL, Python, Azure Data Factory, Azure Blob Storage, Azure Table Storage, Azure SQL Server, Apache Hive, Apache Spark, MDM, Netezza, Teradata, Oracle 12c, SQL Server, Teradata SQL Assistant, Microsoft Word/Excel, Flask, AWS S3, AWS Redshift, Snowflake, AWS RDS, DynamoDB, Athena, Lambda, MongoDB, Pig, Sqoop, Tableau, Power BI, UNIX, Docker, Kubernetes.

Data Engineer

Confidential, Charlotte, NC.

Responsibilities:

  • Experienced in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in AWS and Spark.
  • Leveraged cloud and GPU computing technologies, such as AWS, for automated machine learning and analytics pipelines.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; performed gap analysis and provided feedback to the business team to improve software delivery.
  • Performed data mining with large datasets of structured and unstructured data, data acquisition, data validation, predictive modeling, and data visualization on provider, member, claims, and service fund data.
  • Involved in developing RESTful APIs (microservices) using the Python Flask framework, packaged in Docker and deployed to Kubernetes using Jenkins pipelines (a minimal Flask sketch follows this list).
  • Experience in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in PySpark.
  • Created reusable REST APIs that exposed data blended from a variety of data sources, reliably gathering requirements directly from the business.
  • Worked on the development of data warehouse and business intelligence architecture involving data integration and the conversion of data from multiple sources and platforms.
  • Responsible for full data loads from production to AWS Redshift staging environment and worked on migrating of EDW to AWS using EMR and various other technologies.
  • Experience in Creating, Scheduling and Debugging Spark jobs using Python. Performed Data Analysis, Data Migration, Transformation, Integration, Data Import, and Data Export through Python.
  • Gathered and processed raw data at scale (including writing scripts, web scraping, calling APIs, writing SQL queries, and writing applications).
  • Creating reusable Python scripts to ensure data integrity between source (Teradata/Oracle) and target system (Snowflake/Redshift).
  • Migrated on premise database structure to Confidential Redshift data warehouse.
  • Created data pipelines for different events to load the data from DynamoDB to AWS S3 bucket and then into HDFS and delivered high success metrics.
  • Implemented authoring, scheduling, and monitoring of data pipelines using Scala and Spark.
  • Experience in building Snowpipe; in-depth knowledge of data sharing in Snowflake and of Snowflake database, schema, and table structures.
  • Explored DAGs, their dependencies, and logs using Airflow pipelines for automation with a creative approach.
  • Designed and implemented a fully operational, production-grade, large-scale data solution on Snowflake.
  • Developed and designed a system to collect data from multiple platforms using Kafka and process it using Spark.
  • Created modules for Spark streaming of data into the data lake (a Structured Streaming sketch follows this list), worked with different data feeds such as JSON, CSV, and XML, and implemented the data lake concept.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements and developed Map Reduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive.
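
A minimal Spark Structured Streaming sketch for the Kafka-to-data-lake modules above, assuming the spark-sql-kafka connector is available on the cluster; the broker address, topic, and lake paths are placeholders.

```python
# Minimal Kafka -> Spark Structured Streaming -> data lake sketch.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-streaming-example").getOrCreate()

raw = (
    spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder brokers
         .option("subscribe", "events")                       # placeholder topic
         .option("startingOffsets", "latest")
         .load()
)

# Kafka delivers key/value as binary; cast the value so it can be parsed downstream
events = raw.select(F.col("value").cast("string").alias("json_payload"))

query = (
    events.writeStream
          .format("parquet")                        # land micro-batches in the data lake
          .option("path", "/data/lake/events")      # placeholder lake path
          .option("checkpointLocation", "/data/checkpoints/events")
          .outputMode("append")
          .start()
)
query.awaitTermination()
```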
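
A minimal Flask sketch of a RESTful microservice like the ones described above; the resource names, payloads, and port are hypothetical, and Docker/Kubernetes packaging is outside the scope of the snippet.

```python
# Minimal Flask REST microservice sketch with placeholder resources.
from flask import Flask, jsonify, request

app = Flask(__name__)

# In-memory stand-in for data blended from upstream sources
_PIPELINE_RUNS = {"daily_load": {"status": "succeeded", "rows": 120000}}


@app.route("/api/v1/pipeline-runs/<name>", methods=["GET"])
def get_pipeline_run(name):
    run = _PIPELINE_RUNS.get(name)
    if run is None:
        return jsonify({"error": "not found"}), 404
    return jsonify({"name": name, **run})


@app.route("/api/v1/pipeline-runs/<name>", methods=["PUT"])
def update_pipeline_run(name):
    _PIPELINE_RUNS[name] = request.get_json(force=True)
    return jsonify({"name": name, **_PIPELINE_RUNS[name]}), 200


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```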

Technologies: Python, Teradata, Netezza, Oracle 12c, PySpark, MS Office (Word, Excel, and PowerPoint), SQL Server 2012, UML, MS Visio, Oracle Designer, Azure, Oracle SQL, Athena, SSRS, SSIS, AWS S3, AWS Redshift, AWS EMR, AWS RDS, DynamoDB, Lambda, Hive, HDFS, Sqoop, Scala, NoSQL (Cassandra), and Tableau.

Data Engineer

Confidential

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and Scala for data cleaning and preprocessing (an equivalent Hadoop Streaming sketch in Python follows this list).
  • Experienced in installing, configuring and using Hadoop Ecosystem components.
  • Experienced in importing and exporting data into HDFS and Hive using Sqoop.
  • Participated in development/implementation of Cloudera Hadoop environment.
  • Experienced in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
  • Integrated Cassandra as a distributed persistent metadata store to provide metadata resolution for network entities on the network.
  • Involved in implementing and integrating various NoSQL databases such as HBase and Cassandra.
  • Installed and configured Hive, wrote Hive UDFs, and used MapReduce and JUnit for unit testing.
  • Used DataStax Cassandra along with Pentaho for reporting.
  • Queried and analyzed data from DataStax Cassandra for quick searching, sorting, and grouping.
  • Experienced in working with various data sources such as Teradata and Oracle; successfully loaded files from Teradata to HDFS and from HDFS into Hive and Impala.
  • Designed and implemented a product search service using Apache Solr/Lucene.
  • Worked on installing the cluster, commissioning and decommissioning Data Nodes, NameNode recovery, capacity planning, and slots configuration.
  • Used the YARN architecture and MapReduce 2.0 in the development cluster for a POC.
  • Supported MapReduce programs running on the cluster, was involved in loading data from the UNIX file system to HDFS, and loaded and transformed large sets of structured, semi-structured, and unstructured data.
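
The MapReduce data-cleaning jobs above were written in Java and Scala; as an equivalent illustration in Python, here is a hedged Hadoop Streaming sketch with a mapper and reducer, using placeholder field positions.

```python
# mapper.py - Hadoop Streaming sketch of a data-cleaning map step.
# The original jobs were Java/Scala; this Python equivalent is a hypothetical
# illustration with placeholder, comma-delimited field positions.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split(",")
    # Drop malformed rows and normalize the key field before emitting
    if len(fields) >= 3 and fields[0].strip():
        key = fields[0].strip().lower()
        value = fields[2].strip()
        print(f"{key}\t{value}")
```

```python
# reducer.py - counts cleaned records per key (input arrives sorted by key).
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, _value = line.rstrip("\n").split("\t", 1)
    if key != current_key:
        if current_key is not None:
            print(f"{current_key}\t{count}")
        current_key, count = key, 0
    count += 1
if current_key is not None:
    print(f"{current_key}\t{count}")
```

Scripts like these would typically be submitted with the hadoop-streaming JAR, shipping both files via -files and wiring them in with the -mapper and -reducer options.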

Technologies: ETL, SSIS 2008, SSRS 2008, SAS, DDL, DML, SDLC, SQL Server, SQL Server Analysis Services, Teradata, Informatica, XML, Power BI

Data Modeler

Confidential

Responsibilities:

  • Interacted with business users to analyze the business process and requirements and transformed requirements into Conceptual, logical and Physical Data Models, designing database, documenting and rolling out the deliverables.
  • Analyzed various data sources such as SQL Server, Oracle, and flat files, and identified potential source systems for projects.
  • Created data models for AWS Redshift and Athena from dimensional data models.
  • Coordinated data profiling/data mapping with business subject matter experts, data stewards, data architects, ETL developers, and data modelers.
  • Developed logical/physical data models using Erwin tool across the subject areas based on the specifications and established referential integrity of the system.
  • Reverse-engineered models from databases, files, and external metadata using Erwin (a sketch of scripted metadata extraction follows this list).
  • Maintained and enhanced data models with changes, furnishing definitions, notes, values, and checklists.
  • Worked very closely with the Data Architects and DBA team to implement data model changes in the database across all environments.
  • Good experience in building cross browser compatibility applications using HTML5 and CSS3, SSAS.
  • Experience in Designing and Building the Dimensions and cubes with star schema using SQL Server Analysis Services (SSAS).
  • Experience in modifying MDX Cubes and Dimensions in SSAS.
  • Expertise in Power BI, Power BI Pro, and Power BI Mobile.
  • Experience working with Azure Analysis Services tabular models and Power BI.
  • Expert in creating and developing rich Power BI dashboards.
  • Experience working with DAX for both AAS and Power BI.
  • Knowledgeable in all aspects of designing and constructing efficient data models, with extensive data warehouse design and modeling experience using star and snowflake schemas.
  • Highly experienced in using DAX (Data Analysis Expressions) functions and M language functions in Power BI and Power Query.
  • Experience with Microsoft Power BI, DAX expressions, Power View, and Power Query integrated with SharePoint.
  • Reviewed data mapping from source to target.
  • Involved in query optimization and tuning stored procedures for better performance.
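
As a small complement to the Erwin reverse-engineering work above, here is a hedged SQLAlchemy sketch that pulls table and column metadata from a source database; the connection string and schema name are placeholders, and this supplements rather than replaces a modeling tool like Erwin.

```python
# Minimal sketch of pulling table/column metadata from a source database to
# seed reverse-engineering work; connection string and schema are placeholders.
from sqlalchemy import create_engine, inspect

# Placeholder DSN - point at the SQL Server / Oracle source being profiled
engine = create_engine("oracle+cx_oracle://user:password@host:1521/?service_name=ORCL")
inspector = inspect(engine)

for table_name in inspector.get_table_names(schema="SALES"):   # placeholder schema
    print(f"Table: {table_name}")
    for column in inspector.get_columns(table_name, schema="SALES"):
        print(f"  {column['name']:<30} {column['type']}  nullable={column['nullable']}")
    for fk in inspector.get_foreign_keys(table_name, schema="SALES"):
        print(f"  FK -> {fk['referred_table']} ({', '.join(fk['referred_columns'])})")
```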

Environment: Oracle 11g, Erwin 9.0, SQL Server, SSIS/SSAS, Visual Studio 2012, Hadoop, Visio, AWS.
