
Big Data Engineer Resume


Indianapolis, IN

SUMMARY

  • 8+ years of experience in Data Engineering, Data Pipeline Design, Development, and Implementation as a Sr. Data Engineer/Data Developer and Data Modeler.
  • Strong experience in the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies.
  • Strong experience in writing data-analysis scripts using the Python, PySpark, and Spark APIs.
  • Extensively used Python libraries including PySpark, pytest, PyMongo, cx_Oracle, pyexcel, Boto3, Psycopg, embedPy, NumPy, and Beautiful Soup.
  • Expertise in the Amazon Web Services (AWS) cloud platform, including EC2, S3, VPC, ELB, IAM, DynamoDB, CloudFront, CloudWatch, Route 53, Elastic Beanstalk, Auto Scaling, Security Groups, EC2 Container Service (ECS), CodeCommit, CodePipeline, CodeBuild, CodeDeploy, Redshift, CloudFormation, CloudTrail, OpsWorks, Kinesis, SQS, SNS, and SES.
  • Experience in implementing Azure data solutions: provisioning storage accounts, Azure Data Factory, SQL Server, SQL Databases, SQL Data Warehouse, Azure Databricks, and Azure Cosmos DB.
  • Experienced in building automated regression scripts in Python for validation of ETL processes across multiple databases such as Oracle, SQL Server, Hive, and MongoDB.
  • Proficient in SQL across several dialects, including MySQL, PostgreSQL, Redshift, SQL Server, and Oracle.
  • Experienced in big data analysis and developing data models using Hive, Pig, MapReduce, and SQL, with strong data architecture skills for designing data-centric solutions.
  • Experience working with data modeling tools like Erwin and ER/Studio.
  • Experience in designing star and snowflake schemas for Data Warehouses.
  • Hands-on use of the Spark and Scala APIs to compare the performance of Spark with Hive and SQL, and of Spark SQL to manipulate DataFrames in Scala.
  • Experience in Data Analysis, Data Profiling, Data Integration, Migration, Data governance and Metadata Management, Master Data Management and Configuration Management.
  • Hands-on experience with Spark MLlib utilities such as classification, regression, clustering, collaborative filtering, and dimensionality reduction.
  • Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design and implementing RDBMS specific features.
  • Knowledgeable in Proofs of Concept (PoCs) and gap analysis; gathered the necessary data for analysis from different sources and prepared it for exploration using data munging and Teradata.
  • Well experienced in Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.
  • Experience in developing custom UDFs in Python to extend Hive and Pig Latin functionality (see the sketch at the end of this summary).
  • Expertise in designing complex mappings, performance tuning, and building Slowly Changing Dimension and Fact tables.
  • Extensively worked with the Teradata utilities FastExport and MultiLoad to export and load data to/from different source systems, including flat files.
  • Experience in developing MapReduce programs using Apache Hadoop for analyzing big data as per requirements.
  • Experience in designing star and snowflake schemas for Data Warehouse and ODS architectures.
  • Good knowledge of Data Marts, OLAP, and Dimensional Data Modeling with the Ralph Kimball methodology (star schema and snowflake modeling for Fact and Dimension tables) using Analysis Services.
  • Excellent at performing data transfer activities between SAS and various databases and data file formats such as XLS, CSV, etc.
  • Expertise in Python and Scala; developed user-defined functions (UDFs) for Hive and Pig using Python.
  • Experienced in development and support knowledge on Oracle, SQL, PL/SQL, T-SQL queries.
  • Experience in designing and implementing data structures and in using common business intelligence tools for data analysis.
  • Expert in building Enterprise Data Warehouses and data warehouse appliances from scratch using both the Kimball and Inmon approaches.
  • Experience in working with Excel Pivot and VBA macros for various business scenarios.
  • Expertise in SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS).
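
As referenced above, a minimal sketch of one way to extend Hive with Python: a streaming script invoked through Hive's TRANSFORM clause. The table, column names, and normalization logic are illustrative assumptions, not code from a specific engagement.

  #!/usr/bin/env python3
  # normalize_email.py - hypothetical streaming script used as a Hive "UDF".
  #
  # Example HiveQL usage:
  #   ADD FILE normalize_email.py;
  #   SELECT TRANSFORM (user_id, email)
  #          USING 'python3 normalize_email.py'
  #          AS (user_id, email_normalized)
  #   FROM users;
  import sys

  for line in sys.stdin:
      # Hive streams rows as tab-separated text on stdin.
      user_id, email = line.rstrip("\n").split("\t")
      # Simple normalization: trim whitespace and lowercase the address.
      print(f"{user_id}\t{email.strip().lower()}")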

TECHNICAL SKILLS

Database: SQL Server 2017, 2016, 2014, 2012, 2008/R2

Big Data Technologies: Kafka, Cassandra, Apache Spark, Spark Streaming, HDFS, MapReduce, Hive, Pig, Sqoop

Hadoop Distribution: Cloudera CDH, Apache, AWS, Hortonworks HDP

SDLC: Agile, Scrum, Waterfall, and Spiral

Programming Languages: Python, R, PySpark, Pig, HiveQL

Cloud Infrastructure: AWS, MS Azure

Data Modelling: Erwin, MS Visio

Database Programming: T-SQL, Dynamic SQL, MDX, DAX

Development Tools: BIDS, SSDT, SSMS

Integration Tool: SSIS

Analysis Services: SSAS, OLAP Cubes, Tabular Model

Reporting Tools: SSRS, Power BI, Excel Power BI

Source Control & Collaboration Tool: Team Foundation Server (TFS)

Snowflake: Snowpipe, SnowSQL, Time Travel, Stages (External/Internal), ODBC, Resource Monitor

PROFESSIONAL EXPERIENCE

Confidential, Indianapolis, IN

Big Data Engineer

Responsibilities:

  • Responsible for architecting Hadoop clusters and translating functional and technical requirements into detailed architecture and design.
  • Compared the results of the traditional system with the Hadoop environment to identify any differences and fixed them by finding the root cause.
  • Created a complete processing engine based on the Hortonworks distribution, tuned for performance.
  • Designed, developed, and tested ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
  • Deployed the HBase cluster in the cloud (AWS) environment with scalable nodes as per incremental business requirements.
  • Implemented AWS IAM for managing the permissions of applications that run on EC2 instances.
  • Deployed applications onto AWS Lambda with HTTP triggers and integrated them with API Gateway.
  • Developed multiple ETL Hive scripts for data cleansing and transformation.
  • Developed Spark applications in Python (PySpark) on a distributed environment to load large numbers of CSV files with different schemas into Hive ORC tables (see the sketch after this list).
  • Used Apache Nifi for loading PDF Documents from Microsoft SharePoint to HDFS. Worked on the Publish component to read the source data, extract metadata and apply transformations to build Solr Documents, index them using SolrJ.
  • Exported data from Hive to an AWS S3 bucket for further near-real-time analytics.
  • Ingested data in real time from Apache Kafka to Hive and HDFS.
  • Developed the Apache Storm, Kafka, and HDFS integration project for real-time data analysis.
  • Used Sqoop to import and export data between RDBMS and HDFS.
  • Exported data to Teradata using Sqoop.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Used AWS services such as EC2 and S3 for small data set processing and storage.
  • Analyzed the SQL scripts and redesigned them using PySpark SQL for faster performance.
  • Implemented the Kerberos security authentication protocol for the existing cluster.
  • Good experience in troubleshooting production level issues in the cluster and its functionality.
  • Complete end-to-end design and development of Apache Nifi flow, which acts as the agent between middleware team and EBI team and executes all the actions mentioned above.
  • Utilized Ansible and Chef as configuration management tools to deploy consistent infrastructure across multiple environments.
  • Involved in advanced procedures like text analytics and processing using the in-memory computing capabilities like Apache Spark written in Scala.
  • Developed Ansible plays to configure, deploy, and maintain software components of the existing infrastructure.
  • Performed regular commissioning and decommissioning of nodes depending on the amount of data.
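
A minimal PySpark sketch of the CSV-to-Hive-ORC load referenced above, assuming per-source schema inference; the bucket paths, database, and table names are placeholders rather than the production pipeline.

  # Hypothetical sketch: load CSV files with differing schemas into Hive ORC tables.
  from pyspark.sql import SparkSession

  spark = (
      SparkSession.builder
      .appName("csv-to-hive-orc")
      .enableHiveSupport()   # needed to write managed Hive tables
      .getOrCreate()
  )

  # Each landing folder may carry a different schema, so infer it per source.
  sources = {
      "campaign_a": "s3a://example-bucket/landing/campaign_a/",
      "campaign_b": "s3a://example-bucket/landing/campaign_b/",
  }

  for table, path in sources.items():
      df = (
          spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv(path)
      )
      df.write.mode("overwrite").format("orc").saveAsTable(f"staging.{table}")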

Environment: AWS, AWS S3, Hive, Spark, Java, SQL Server, PySpark, Ansible, Hortonworks, Apache Solr, Apache Tika, Linux, Redshift, Maven, GIT, JIRA, ETL, Toad 9.6, UNIX Shell Scripting, Scala, Apache Nifi.

Confidential, St Louis, Missouri

Data Engineer

Responsibilities:

  • Worked on supply chain data such as location and tracking of packages, package types, service centers, PO vendors, transaction data, NMFC data, item class, product types, etc.
  • Created CRM order processing and social data landing (Azure Blob Storage, Azure Data Lake Storage) on Snowflake on Azure, with integrated dashboards (Power BI, Snowflake Web UI).
  • Extracted data from various sources, transformed it, and loaded it into the target SQL Server database; implemented Copy Activity and custom Azure Data Factory pipeline activities for on-cloud ETL processing.
  • Worked on migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Azure Databricks, and Azure SQL Data Warehouse; controlled and granted database access; and migrated on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Automated Power BI reports, dashboards, and Azure Data Factory (ADF) pipelines to refresh when source data is updated.
  • Created pipelines to load data into Azure SQL Data Warehouse through Data Lake and ADF activities; extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Involved in developing end-to-end ELT pipelines for migrating the on-premises Oracle database to Azure SQL Data Warehouse.
  • Implemented Spark using Python and Spark SQL for faster testing and processing of data (see the sketch after this list).
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala.
  • Used Hadoop scripts for HDFS (Hadoop Distributed File System) data loading and manipulation.
  • Involved in application design and data architecture using cloud and big data solutions on Microsoft Azure.
  • Worked on building data pipelines using Azure services such as Data Factory to load data from the legacy SQL Server to Azure databases using Data Factories, API Gateway services, SSIS packages, Talend jobs, and custom .NET and Python code.
  • Worked on implementing checkpoints such as Hive count checks, Sqoop record checks, done-file creation checks, done-file checks, and touch-file lookups.
  • Worked in both Agile and Kanban methodologies.
  • Primarily responsible for creating new Azure subscriptions, data factories, virtual machines, SQL Azure instances, SQL Azure DW instances, and HDInsight clusters, and for installing DMGs on VMs to connect to on-premises servers.
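
A minimal Databricks-style PySpark sketch of the Spark SQL processing referenced above; the ADLS paths, container, and column names are illustrative assumptions, and the Delta sink is one typical choice rather than the actual target.

  # Hypothetical sketch: read curated CRM data from ADLS Gen2 and aggregate with Spark SQL.
  from pyspark.sql import SparkSession

  spark = SparkSession.builder.appName("adls-spark-sql").getOrCreate()

  orders = (
      spark.read
      .format("parquet")
      .load("abfss://landing@examplestorage.dfs.core.windows.net/crm/orders/")
  )
  orders.createOrReplaceTempView("orders")

  # Spark SQL used for quick validation and aggregation before loading downstream.
  daily_totals = spark.sql("""
      SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS total_amount
      FROM orders
      GROUP BY order_date
  """)

  (
      daily_totals.write
      .mode("overwrite")
      .format("delta")   # common Databricks sink; swap for the warehouse loader in use
      .save("abfss://curated@examplestorage.dfs.core.windows.net/crm/daily_totals/")
  )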

Environment: Hadoop, Hive, Oozie, Spark, Spark SQL, Python, PySpark, Azure Data Factory, Azure SQL, Azure Databricks, Azure DW, BLOB storage, Scala, AWS, Linux, Maven, Oracle 11g/10g, Zookeeper, MySQL, Snowflake.

Confidential, Scottsdale, AZ

Data Developer

Responsibilities:

  • Migrated data from on-premises systems to AWS storage buckets.
  • Developed a Python script to transfer data from on-premises systems to AWS S3.
  • Developed a Python script to call REST APIs and extract data to AWS S3.
  • Ingested data through cleansing and transformation steps, leveraging AWS Lambda, AWS Glue, and Step Functions.
  • Created YAML files for each data source, including Glue table stack creation.
  • Worked on a Python script to extract data from Netezza databases and transfer it to AWS S3.
  • Developed Lambda functions and assigned IAM roles to run Python scripts, along with various triggers (SQS, EventBridge, SNS).
  • Created a Lambda deployment function and configured it to receive events from S3 buckets (see the sketch after this list).
  • Wrote UNIX shell scripts to automate jobs and scheduled cron jobs for job automation using crontab.
  • Developed various Mappings with the collection of all Sources, Targets, and Transformations using Informatica Designer
  • Developed mappings using transformations such as Expression, Filter, Joiner, and Lookup for better data massaging and to migrate clean and consistent data.
  • Designed and implemented Sqoop for the incremental job to read data from DB2 and load to Hive tables and connected to Tableau for generating interactive reports using Hive server2.
  • Used Sqoop to channel data between HDFS and RDBMS sources.
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats.
  • Used Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Python and NoSQL databases such as HBase and Cassandra.
  • Collected data using Spark Streaming from an AWS S3 bucket in near real time, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.
  • Used Apache NiFi to copy data from local file system to HDP.
  • Worked on Dimensional and Relational Data Modeling using Star and Snowflake Schemas, OLTP/OLAP system, Conceptual, Logical and Physical data modeling using Erwin.
  • Used Oozie to automate data processing and loading into the Hadoop Distributed File System.
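
A minimal sketch of the S3-triggered Lambda pattern referenced above; the staging bucket name, prefix, and the copy-to-staging step are illustrative assumptions rather than the actual deployment.

  # Hypothetical Lambda handler for S3 ObjectCreated events.
  import json
  import urllib.parse

  import boto3

  s3 = boto3.client("s3")

  def lambda_handler(event, context):
      """Copy each newly created object into a staging prefix for downstream processing."""
      for record in event.get("Records", []):
          bucket = record["s3"]["bucket"]["name"]
          key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

          # Placeholder processing step: copy the object into a staging area.
          s3.copy_object(
              Bucket="example-staging-bucket",
              Key=f"staging/{key}",
              CopySource={"Bucket": bucket, "Key": key},
          )

      return {"statusCode": 200, "body": json.dumps("processed")}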

Tools & Environment: Big Data 3.0, Hadoop 3.0, Oracle 12c, PL/SQL, Scala, Spark SQL, PySpark, Python, Kafka 1.1, SAS, Azure SQL, MDM, Oozie 4.3, SSIS, T-SQL, ETL, HDFS, Cosmos, Pig 0.17, Sqoop 1.4, MS Access.

Confidential

ETL/DWH Developer

Responsibilities:

  • Ingested data from various data sources into Hadoop HDFS/Hive tables using Sqoop, Flume, and Kafka.
  • Extended Hive core functionality by writing custom UDFs using Java.
  • Worked on multiple POCs implementing a Data Lake for multiple data sources ranging from Team Center, SAP, and Workday to machine logs.
  • Experience in Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), Azure HDInsight big data technologies (Hadoop and Apache Spark), and Databricks.
  • Designed and developed a new solution to process NRT data using Azure Stream Analytics, Azure Event Hub, and Service Bus queues.
  • Created a linked service to land data from the Caesars SFTP location to Azure Data Lake.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data (see the sketch after this list).
  • Worked on the MS SQL Server PDW migration for the MSBI warehouse.
  • Planned, scheduled, and implemented Oracle to MS SQL Server migrations for AMAT in-house applications and tools.
  • Integrated Tableau with Hadoop data source for building dashboard to provide various insights on sales of the organization.
  • Worked on Spark in building BI reports using Tableau. Tableau was integrated with Spark using Spark-SQL.
  • Developed Spark jobs using Scala and Python on top of Yarn/MRv2 for interactive and Batch Analysis.
  • Developed workflows in Live Compare to analyze SAP data and reporting.
  • Worked on Java development, support, and tool support for in-house applications.
  • Developed a multitude of dashboards with Power BI depicting various KPIs for business analysis as per business requirements.
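
A minimal sketch of Kafka-to-HDFS ingestion with Spark Structured Streaming, shown in PySpark for consistency with the rest of this resume (the original work was in Scala); the broker addresses, topic name, and paths are placeholder assumptions, and the job assumes the spark-sql-kafka connector is available.

  # Hypothetical sketch: stream machine-log events from Kafka into HDFS as Parquet.
  from pyspark.sql import SparkSession
  from pyspark.sql.functions import col

  spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

  events = (
      spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
      .option("subscribe", "machine-logs")
      .option("startingOffsets", "latest")
      .load()
      .select(
          col("key").cast("string"),
          col("value").cast("string"),
          col("timestamp"),
      )
  )

  query = (
      events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/landing/machine_logs/")
      .option("checkpointLocation", "hdfs:///checkpoints/machine_logs/")
      .outputMode("append")
      .start()
  )

  query.awaitTermination()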

Tools & Environment: Hadoop, MapReduce, Hive, Azure, SQL, PL/SQL, T-SQL, XML, Informatica, Python, Tableau, OLAP, SSIS, SSRS, Excel, OLTP, Git.

Confidential

Responsibilities:

  • People Tech is a growing leader in the Enterprise Applications and IT Services marketplace; it draws its expertise from key partnerships with technology leaders such as Microsoft, Oracle, and SAP, combines that with the deep domain knowledge of its employees, and maintains a strong focus on frameworks for data availability and analytics.
  • Implemented the application using Agile methodology. Involved in daily scrum and sprint planning meetings.
  • Actively involved in analysis, detail design, development, bug fixing and enhancement.
  • Driving the technical design of the application by collecting requirements from the Functional Unit in the design phase of SDLC.
  • Developed microservices using RESTful services to provide all CRUD capabilities.
  • Created requirement documents and designed requirements using UML, class, and use-case diagrams for new enhancements.
  • Used the JBoss application server for deployment of applications.
  • Developed communication among SOA services.
  • Involved in creation of both service and client code for JAX-WS and used SOAPUI to generate proxy code from the WSDL to consume the remote service.
  • Designed the user interface of the application using HTML5, CSS3, JavaScript, Angular JS and AJAX.
  • Worked with Session Factory, ORM mapping, Transactions and HQL in Hibernate framework.
  • Used RESTful web services for sending and receiving data between different applications.
  • Wrote client-side and server-side validations using JavaScript.
  • Writing stored procedures, complex SQL queries for backend operations with the database.
  • Devised logging mechanism using Log4j.
  • Used GitHub as the version control system.
  • Created tracking sheets for tasks and generated timely progress reports.

Environment: SQL Server 2012/2014, T-SQL, SQL Profiler, DTA, ETL, SSIS, SSRS, SSMS, SSDT.
