Big Data Engineer Resume
Albany, NY
SUMMARY
- 8+ years of IT industry experience working in progressive and dynamic environments, with emphasis on Data Integrity and Data Quality, Business Intelligence concepts, Database Management Systems, development, and the complete project life cycle in Data Warehousing and Client/Server technologies.
- Thorough knowledge of the SDLC, with hands-on experience in requirements analysis, design, and customer acceptance of Business Intelligence solutions.
- Solid understanding of Business Requirement Gathering, Business Process Flow, Business Process Modeling, and Business Analysis.
- Extensive experience in AWS (Amazon Web Services) Cloud and GCP (Confidential Cloud Platform).
- Worked extensively with the Apache Hadoop ecosystem, including HDFS, Spark, Flume, Kafka, Airflow, and Hive.
- Built batch and streaming pipelines using AWS services such as S3, RDS, Redshift, Lambda, Glue, EMR, EC2, Athena, Step Functions, CloudWatch, SageMaker, IAM roles, QuickSight, and many other services.
- Similarly used GCP services such as Cloud Storage, BigQuery, Cloud Composer, Pub/Sub, Cloud Monitoring, Cloud Functions, Compute Engine, and Dataproc, along with Power BI.
- Used storage services such as HDFS, AWS S3, and GCP Cloud Storage.
- Extensive experience in data warehouse services such as Amazon Redshift, Confidential BigQuery, and Snowflake.
- Used orchestration tools such as Apache Airflow and AWS Step Functions (a minimal Airflow sketch follows this summary).
- Used container services like Kubernetes and Docker.
- Knowledgeable in both the technical and functional aspects of projects. Well versed in handling SQL databases and writing SQL queries to test the data.
- Experience in data warehouse development, working with data migration, data conversion, and ETL using Microsoft SQL Server Integration Services and SQL Server.
- Extensive experience developing complex data extracts, applications, and ad-hoc queries requested by internal and external customers using SQL.
- Experience with dashboard and report design in Tableau, Power BI, and QuickSight.
- Hands-on experience with Unix shell scripting.
- Maintained production Business Intelligence products and solutions, including resolving production issues and responding quickly to priority problems.
- Troubleshot and resolved data issues impacting extract delivery.
- Created different interactive views and reporting dashboards by combining multiple views using Tableau Desktop.
- Designed and developed report prototypes and gained client approval for further development. Created parameterized Crystal Reports with conditional formatting and used subreports.
- Excellent interdepartmental communication: teamwork and coordination with senior-level executives to meet and exceed project goals; efficient operation under time-sensitive and fluctuating timelines.
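To illustrate the orchestration experience above, the following is a minimal sketch of a daily batch DAG in Apache Airflow. The DAG name, task callables, and schedule are hypothetical placeholders for illustration, not details from an actual engagement.

```python
# Minimal sketch of a daily batch DAG; task bodies are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_to_s3(**context):
    """Pull the day's records from a source system and stage them in S3 (placeholder)."""
    ...


def load_to_redshift(**context):
    """Copy the staged files from S3 into a Redshift table (placeholder)."""
    ...


with DAG(
    dag_id="daily_batch_load",        # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_to_s3", python_callable=extract_to_s3)
    load = PythonOperator(task_id="load_to_redshift", python_callable=load_to_redshift)

    extract >> load
```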
TECHNICAL SKILLS
Scripting and Programming languages: SQL, Python, PySpark, Shell, and Linux
Cloud Providers: AWS and GCP
Container Services: Docker and Kubernetes
Orchestration tools: Airflow, Cloud Composer, Step Functions
Data warehouse Services: Redshift, BigQuery, Snowflake
PROFESSIONAL EXPERIENCE
Confidential - Albany, NY
Big Data Engineer
Responsibilities:
- Collaborated with technical, application, and security leads to deliver reliable and secure Big Data infrastructure tools using cutting-edge technologies such as Spark, container services, and AWS services.
- Developed data processing pipelines in Spark and other big data technologies.
- Designed and deployed high-performance systems with reliable monitoring, logging practices, and dashboards.
- Worked with Information Security teams to create data policies, develop interfaces and retention models, and deploy the solution to production.
- Designed, architected, and developed solutions leveraging big data technology (open source, AWS) to ingest, process, and analyze large, disparate data sets and exceed business requirements.
- Used AWS services including S3, RDS, Redshift, Athena, Lambda, EC2, EMR, IAM, Step Functions, CloudWatch, Glue, QuickSight, and EKS.
- Used orchestration services such as Apache Airflow and created several DAGs as part of batch and streaming pipelines.
- Used AWS Lambda, Kinesis, S3, and Redshift for streaming pipelines (see the Lambda sketch after this list).
- Used AWS Lambda, Glue, Athena, S3, Redshift, and EMR for batch and streaming pipelines.
- Created a POC and MVP using DBT as a transformation layer to load data into the data warehouse.
- Used Terraform to deploy AWS services and developed Terraform code for modules and resources.
- Implemented Continuous Integration and Continuous Delivery (CI/CD).
- Developed a POC and MVP using GCP (Confidential Cloud Platform).
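The streaming pattern noted above (Kinesis into Lambda, landing in S3) could look roughly like the handler below. This is a hedged sketch only; the bucket name and key prefix are invented for illustration.

```python
# Sketch of a Lambda handler that decodes Kinesis records and lands them in S3
# as newline-delimited JSON. Bucket and key prefix are hypothetical.
import base64
import json

import boto3

s3 = boto3.client("s3")

BUCKET = "example-streaming-landing-bucket"  # hypothetical bucket name


def handler(event, context):
    """Decode the batch of Kinesis records and write them to S3."""
    records = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        records.append(json.loads(payload))

    if records:
        key = f"landing/{context.aws_request_id}.json"
        body = "\n".join(json.dumps(r) for r in records)
        s3.put_object(Bucket=BUCKET, Key=key, Body=body.encode("utf-8"))

    return {"processed": len(records)}
```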
Confidential - Chicago, IL
Big Data Engineer
Responsibilities:
- Collaborated with business and analytical teams and data scientists to improve efficiency, increase the applicability of predictive models, and help translate ad-hoc analyses into scalable data delivery solutions.
- Collaborated with the DevOps team to integrate innovations and algorithms into a production system.
- Worked with the DevOps team to create and manage deployment workflows for all scripts and code.
- Developed and maintained scalable data pipelines that ingest, transform, and distribute data streams and batches within AWS S3 and Snowflake using AWS Step Functions, Lambda, Kinesis, Glue, and EMR.
- Created batch pipelines using AWS S3, Lambda, Glue, EMR, Athena, Redshift, and RDS (a PySpark sketch follows this list).
- Orchestrated pipelines and data flow using Apache Airflow and AWS Step Functions.
- Created reports and dashboards using AWS services like Lambda, Glue, Step Function and QuickSight.
- Created monitoring service using AWS CloudWatch, AWS Lambda, AWS Glue, AWS Step Function, Grafana and ElasticSearch.
- Created Airflow DAGs to extract, transform, and load data into the data warehouse.
- Developed and deployed Kubernetes pods to extract, transform and load data.
- Used Docker and Kubernetes for Data Pipelines and ETL Pipelines.
- Used Hadoop ecosystem components such as Apache HDFS, Spark, Hive, Airflow, and Kafka.
- Facilitated the development and deployment of proof-of-concept machine learning systems.
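As a sketch of the batch pipelines noted above, the PySpark job below reads raw CSV files from S3, applies light cleanup, and writes partitioned Parquet for downstream warehouse loads. The paths, column names, and app name are assumptions for illustration; a similar script can run on EMR or as a Glue Spark job.

```python
# Sketch of a PySpark batch transform: S3 CSV -> cleaned, partitioned Parquet.
# Input/output paths and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("batch_transform_example").getOrCreate()

raw = spark.read.option("header", True).csv("s3://example-raw-bucket/orders/")

cleaned = (
    raw.dropDuplicates(["order_id"])                     # assumed key column
       .withColumn("order_date", F.to_date("order_date"))
       .filter(F.col("order_id").isNotNull())
)

(
    cleaned.write.mode("overwrite")
           .partitionBy("order_date")
           .parquet("s3://example-curated-bucket/orders/")
)
```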
Confidential
Data Analyst
Responsibilities:
- Performed a comparative analysis between control and treatment groups to understand the impact of a Confidential Eats product that is becoming an integral part of the Eats ecosystem.
- Created intuitive and impactful dashboards that depict product flow in a comprehensible manner to drive key business decisions.
- Conducted data deep dives on various aspects of the product with the goal of improving the number of completed orders; the process made a significant impact, with a visible improvement of 7% within the span of 5 months.
- Created a query bank with 120+ queries to serve the data needs of the Account Managers in the US&C Sales team, thereby cutting down the need to raise bespoke data requests for the data required for swift decision making.
- Used AWS services such as S3, RDS, and Redshift for storage.
- Created Apache Airflow DAGs to ingest data from sources such as APIs, servers, and databases, transform it using PySpark in Glue and EMR, and load it into data warehouses such as AWS Redshift.
- Created data pipelines using AWS services such as S3, Glue, EMR, Lambda, Athena, and IAM.
- Created reports and dashboards that provide information on metrics, usage, trends, and behaviors using AWS services such as S3, Lambda, Athena, and QuickSight (an Athena query sketch follows this list).
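The reporting work above combines S3, Athena, and QuickSight; a simplified boto3 sketch of the Athena piece is shown below. The database name, query, and result location are placeholders, not values from the actual project.

```python
# Sketch of running an Athena query via boto3 and waiting for the result.
# Database, table, and output location are hypothetical.
import time

import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString="SELECT order_status, COUNT(*) AS orders FROM orders GROUP BY order_status",
    QueryExecutionContext={"Database": "example_analytics"},           # assumed database
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
query_id = response["QueryExecutionId"]

# Poll until the query reaches a terminal state.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    print(f"Fetched {len(rows)} rows (including header)")
```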
Confidential
Data Analyst
Responsibilities:
- Worked with support teams to define requirements, document workflows and processes, identify gaps, make improvement recommendations, drive consensus, and facilitate change.
- Responsible for process management and working with project managers to identify improvement opportunities to increase operational efficiency.
- Worked in parallel with product managers and technical solution engineers to prioritize requirements analysis in a fast-paced, rapidly changing environment.
- Gathered business requirements, handled the definition and design of data sourcing, and worked with the data warehouse architect on the development of logical data models.
- Created sophisticated visualizations, calculated columns, and custom expressions, and developed map charts, cross tables, bar charts, tree maps, and complex reports involving property controls and custom expressions.
- Researched reinforcement learning and control (TensorFlow, Torch) and machine learning models (scikit-learn).
- Performed K-means clustering, regression, and decision trees in R. Worked on data cleaning and reshaping and generated segmented subsets using NumPy and pandas in Python (a clustering sketch follows this list).
- Used Data Quality validation techniques to validate Critical Data Elements (CDE) and identified many anomalies. Extensively worked on statistical analysis tools and adept at writing code in Advanced Excel and Python.
- Performed reverse engineering using Erwin to redefine entities, attributes, and relationships in an existing database.
- Performed regression testing for Golden Test Cases from the State (end-to-end test cases) and automated the process using Python scripts.
- Developed Spark jobs using PySpark to run in AWS Glue and EMR for faster real-time analytics and used Spark SQL for querying.
- Generated graphs and reports using the ggplot2 package in RStudio for analytical models.
- Developed and implemented an R Shiny application that showcases machine learning for business forecasting.
- Developed predictive models using Decision Trees, Random Forest, and Naïve Bayes.
- Created several types of data visualizations using Python and Tableau. Extracted large data sets from AWS using SQL queries to create reports.
- Used pandas, NumPy, seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms. Expertise in R, MATLAB, Python, and their respective libraries.
- Implemented various statistical techniques to manipulate the data, such as missing data imputation, principal component analysis, and sampling.
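As a small illustration of the segmentation work mentioned above: the project used K-means in R, but for consistency with the other sketches here this shows an equivalent flow in Python with pandas and scikit-learn. The input file and feature columns are hypothetical assumptions.

```python
# Sketch of a K-means segmentation on scaled numeric features.
# Input file and feature columns are hypothetical.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("customer_metrics.csv")             # assumed input file
features = df[["order_count", "avg_order_value"]]    # assumed feature columns

scaled = StandardScaler().fit_transform(features.fillna(0))

kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
df["segment"] = kmeans.fit_predict(scaled)

# Summarize each segment to sanity-check the clustering.
print(df.groupby("segment")[["order_count", "avg_order_value"]].mean())
```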
Confidential
ETL Developer
Responsibilities:
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, ZooKeeper, Kafka, and Sqoop.
- Integrated HDP clusters with Active Directory and enabled Kerberos for authentication.
- Installed and configured Sqoop to import and export data between relational databases and Hive.
- Administered large Hadoop environments; built and supported cluster setup, performance tuning, and monitoring in an enterprise environment.
- Monitored Hadoop cluster functioning through MCS and worked on NoSQL databases including HBase.
- Created Hive tables, loaded data, and wrote Hive UDFs; worked with the Linux server admin team in administering the server hardware and operating system.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS (see the streaming sketch after this list).
- Designed and developed data mapping procedures (ETL: data extraction, data analysis, and loading) for integrating data using R programming.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Developed storytelling dashboards in Tableau Desktop and published them to Tableau Server, allowing end users to understand the data on the fly using quick filters for on-demand information.
- Closely monitored and analyzed MapReduce job executions on the cluster at the task level and optimized Hadoop cluster components to achieve high performance.
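A hedged sketch of the Kafka-to-HDFS flow described above is shown below; it uses Spark Structured Streaming rather than the older DStream API, and the broker address, topic name, and HDFS paths are placeholder assumptions.

```python
# Sketch of a structured streaming job: Kafka topic -> Parquet files on HDFS.
# Requires the spark-sql-kafka connector on the classpath; broker, topic, and
# paths are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka_to_hdfs_example").getOrCreate()

events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092")  # assumed broker
         .option("subscribe", "events")                      # assumed topic
         .load()
         .selectExpr("CAST(value AS STRING) AS value", "timestamp")
)

query = (
    events.writeStream.format("parquet")
          .option("path", "hdfs:///data/events/")                  # assumed landing path
          .option("checkpointLocation", "hdfs:///checkpoints/events/")
          .trigger(processingTime="1 minute")
          .start()
)

query.awaitTermination()
```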