Data Engineer Resume
Bentonville, AR
SUMMARY
- Over 10 years of experience in application analysis, design, development, testing, maintenance, and support of web and client-server applications, including 4+ years of experience with Big Data, Hadoop, and data analytics components such as HDFS, MapReduce, Pig, Hive, YARN, Sqoop, HBase, Microsoft Azure, GCP, Databricks, Python, and Spark.
- Experience with multiple Hadoop distributions, including Cloudera and Hortonworks.
- Excellent understanding of NoSQL databases.
- Developed ingestion and processing systems for streaming/real-time data using big data technologies.
- Responsible for clearly communicating the technical aspects, risks, and challenges of projects.
- Good understanding of cloud services like Compute, Network, Storage, and Identity & Access Management.
- Experience working with structured and unstructured data in various file formats such as Avro, XML, and JSON.
- Experience working with both Batch and Real-time data processing.
- Experience in dealing with Windows Azure IaaS - Virtual Networks, Virtual Machines, Cloud Services, Resource Groups, Express Route, Traffic Manager, VPN, Load Balancing, Application Gateways, and Auto-Scaling.
- Used the GitHub version control tool to push changes and pull updated code from the repository.
- Excellent written and verbal communication skills.
- Resourceful, able to find solutions when faced with challenges.
TECHNICAL SKILLS
Big Data Technologies: HDFS, GCP, MR, Hive, Azure, AWS, EC2, Pig, YARN, Sqoop, Hue, Oozie, Spark.
Programming Languages: HiveQL, BigQuery SQL, Python, Scala, UNIX shell, XML, C, C++, SQL, HTML, T-SQL.
Databases/NoSQL: SQL, PL/SQL, HBase, Cassandra, Oracle, DB2, Teradata.
Tools: Microsoft Power BI, Tableau (BI), ThoughtSpot (BI), SAS VA, SAP BO (BI), Talend (ETL).
Platforms: Windows, Unix, Linux.
PROFESSIONAL EXPERIENCE
Data Engineer
Confidential
Responsibilities:
- Collaborating with key stakeholders to understand business background and define the scope of the objectives of projects.
- Working with Solution Architects and development teams on data modeling in a Big Data cloud environment.
- Determining the technical approaches to be used, defining the appropriate methodologies, and preparing the architectural design document.
- Develop architecture for the technical solutions to be built based on the business requirements and scope.
- Working on Azure cloud technologies including Azure Synapse Analytics, App Insights, SQL, Cosmos DB Change Feed and Cosmos DB Database (NoSQL database on SQL API & Cassandra API).
- Implemented reusable SCD1 & SCD2 frameworks in Azure Data Lake using Databricks Delta Merge.
- Created Batch & structured streaming ADF pipelines using Databricks Spark, Delta tables, ADLS Gen2, Azure Key-vault, Azure Blob & Azure Event Hub.
- Implemented a complex billing database in Synapse for data analytics and customer billing.
- Generated Power BI Reports as per business requirements.
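The SCD2 framework mentioned above can be illustrated with a minimal pure-Python sketch; the actual pipeline expressed this upsert logic declaratively with Databricks Delta MERGE, and the function and column names here (`scd2_merge`, `valid_from`, `is_current`) are illustrative assumptions, not the production schema.

```python
from datetime import date

def scd2_merge(current_rows, incoming_rows, key="id"):
    """Illustrative SCD2 merge over lists of dicts: expire changed rows
    and append new current versions (same shape as a Delta MERGE upsert)."""
    today = date(2023, 1, 1)  # fixed date so the example is deterministic
    by_key = {r[key]: r for r in current_rows if r["is_current"]}
    result = list(current_rows)
    for new in incoming_rows:
        old = by_key.get(new[key])
        if old is None:
            # brand-new key: insert as the current version
            result.append({**new, "valid_from": today, "valid_to": None, "is_current": True})
        elif any(old[c] != new[c] for c in new if c != key):
            # attribute changed: close out the old version, append the new one
            old["valid_to"] = today
            old["is_current"] = False
            result.append({**new, "valid_from": today, "valid_to": None, "is_current": True})
    return result
```

Unchanged incoming rows fall through both branches and leave history untouched, which is what makes the framework safely re-runnable.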
Technologies: ServiceNow, Microsoft Azure Services, Hive, Databricks, Spark, Python, Salesforce, Git.
Senior Big Data/Cloud Engineer
Confidential, Bentonville, AR
Responsibilities:
- Design and Develop Big Data Landing Zone for Finance, Supply Chain International and Health & Wellness Projects.
- Develop distributed applications and algorithms for parallel processing of petabyte sized data on Hortonworks Hadoop distributions.
- Designed and Implemented enterprise level data migration solution from legacy system to scalable distributed storage and computing systems.
- Designed pipelines for batch, near real-time, and real-time/streaming data ingestion into Hadoop systems using Spark, PySpark, and Kafka.
- Develop Python, PySpark, Spark scripts to filter/cleanse/map/aggregate data.
- Created Incremental data pipelines for GLS data feeds.
- Processed large volumes and varieties of data (structured and unstructured, including XMLs and JSONs), writing code for parallel processing.
- Develop Object-Oriented and Functional Programming software using Map/Reduce, Spark, and ETL applications.
- Develop data ingestion framework using Python, PySpark and Spark.
- Optimized existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, and Pair RDDs.
- Handled large datasets using partitioning, Spark in-memory capabilities, Spark broadcasts, and effective, efficient joins and transformations.
- Perform advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
- Copied Hive data to Azure Platform and BigQuery table for AI and Machine Learning.
- Expose Analytics as a Service RESTful APIs for application integration and data consumption.
- Create Global Data Analytics Platform (GDAP).
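The broadcast technique named in the optimization bullets above can be sketched in plain Python: the small table is copied into an in-memory lookup (what Spark's `broadcast()` hint ships to every executor) so the large table is joined without a shuffle. The function and field names are illustrative assumptions.

```python
def broadcast_join(large_rows, small_rows, key):
    """Map-side (broadcast) join sketch: build a dict from the small table,
    then stream the large table through it with inner-join semantics."""
    lookup = {r[key]: r for r in small_rows}  # the "broadcast" copy
    for row in large_rows:
        match = lookup.get(row[key])
        if match is not None:
            # merge matching rows, keeping the join key only once
            yield {**row, **{k: v for k, v in match.items() if k != key}}
```

The same trade-off applies as in Spark: this wins only while the small side fits comfortably in memory.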
Technologies: HDP 3.0, HDFS, Azure, GCP 234.0.0, Map Reduce, Hive, Pig, YARN, Ambari, Hue, Sqoop, Flume, Oozie, Teradata, Spark, Python, Automic.
Hadoop Developer
Confidential, New York City, NY
Responsibilities:
- Analyze and design a pipeline to collect, clean, and prepare data for analysis using MapReduce, Pig, SAS, HiveQL, Hive and HBase.
- Develop scripts and batch jobs to schedule various Hadoop programs in Data Lake.
- Import data from different sources such as HDFS and HBase into Spark RDDs.
- Effort estimation, integration planning, and schedule monitoring.
- Apply Spark and Spark Streaming; create RDDs and apply operations (transformations and actions).
- Manage and review log files to identify issues when jobs fail.
- Prepare demo dashboards in SAS visual analytics for business.
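The RDD bullets above (create RDDs, apply transformations and actions) follow Spark's lazy-evaluation model, which a word-count sketch can illustrate: Python generators stand in for lazy transformations, and nothing executes until an "action" consumes the pipeline. Function names here are illustrative, not from the original codebase.

```python
def transform(lines):
    """Lazy 'transformations': flatMap lines into words, map to (word, 1).
    Like Spark RDD transformations, nothing is computed yet."""
    words = (w for line in lines for w in line.split())  # flatMap
    return ((w.lower(), 1) for w in words)               # map

def count_action(pairs):
    """The 'action': consuming the generator forces evaluation,
    aggregating counts like reduceByKey."""
    counts = {}
    for word, n in pairs:
        counts[word] = counts.get(word, 0) + n
    return counts
```

As with Spark, chaining more transformations is free; only the action pays the cost of a full pass over the data.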
Technologies: HDP 2.3, SAS, HDFS, Map Reduce, Hive, Pig, YARN, Ambari, Sqoop, Flume, Oozie, Spark, Python.
Senior Software Engineer
Confidential, Chicago, IL
Responsibilities:
- Plan, Develop and Implement large-scale projects from conception to completion.
- Translate client’s business requirements and objectives into technical applications and solutions.
- Implement new integration activities.
Technologies: Analytics, SAS, R, BI, BOBJ, RDBMS, Crystal Reports, Ektron, PHP, Java, SQL, QA.
Software Engineer
Confidential
Responsibilities:
- Develop and architect lifecycle of projects working on different technologies and platforms.
- Provide innovative solutions to complex business problems.
- Interface with clients and gather business requirements and objectives.
- Understand and evaluate complex data models.
- Execute system development and maintenance activities.
- Develop solutions to improve performance and scalability of applications.
Technologies: Analytics, SAS, BI, BOBJ, RDBMS, Crystal Reports, Ektron, PHP, Java, SQL, QA.