
Sr. Data Engineer Resume


Patskala, Ohio

SUMMARY

  • 8+ years of IT experience in analysis, design, and development of ETL solutions in highly dynamic and challenging environments.
  • Experienced in Agile methodologies, Scrum stories, and sprints in a Python-based environment.
  • Hands-on experience in GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, the gsutil and bq command-line utilities, Dataproc, and Stackdriver.
  • Experienced in implementing Cloud solutions in AWS.
  • Experience in using version control systems like Git and GitHub, and in deploying applications to Amazon EC2 and Heroku.
  • Relevant Experience in working with various SDLC methodologies like Agile Scrum for developing and delivering applications.
  • Demonstrated experience in delivering data and analytics solutions leveraging AWS, Azure, or similar cloud data lake platforms.
  • Experience in designing and deploying AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancer, and Auto Scaling groups.
  • Experience in developing web services (WSDL, SOAP, and REST) and consuming web services with the Python programming language.
  • Good exposure to Hadoop distributions such as Cloudera and Databricks.
  • Experience working with Python ORM libraries including Django ORM and SQLAlchemy (a minimal sketch follows this summary).
  • Experience in working with Python integrated development environments such as PyCharm, Eclipse, Sublime Text, and Notepad++.
  • Worked with Cloudera and Hortonworks distributions.
  • Experienced in performing code reviews and close involvement in smoke testing sessions, and retrospective sessions.
  • Experienced in Microsoft Business Intelligence tools, developing SSIS (Integration Services), SSAS (Analysis Services), and SSRS (Reporting Services), and building Key Performance Indicators and OLAP cubes.
  • Good exposure to star and snowflake schemas and data modeling, with experience on multiple data warehouse projects.
  • Experienced in working with version control systems like Git and used source code management client tools like Git Bash, GitHub, and GitLab.
  • Experience in using Chef, Puppet, and Ansible configuration and automation tools; configured and administered CI tools like Jenkins, Hudson, and Bamboo for automated builds.
  • Experience in GCP Dataproc, GCS, Cloud Functions, and BigQuery.
  • Expertise in writing DDL and DML scripts in SQL and HQL for analytics applications on RDBMS platforms.
  • Proficient in writing SQL queries, stored procedures, functions, packages, tables, views, and triggers using relational databases like Oracle and MySQL, and non-relational databases like MongoDB.
  • Experience in working with NoSQL databases like HBase and Cassandra.
  • Proficient in using defect, issue, and bug tracking tools like Atlassian Jira and Bugzilla.
  • Excellent interpersonal and communication skills, efficient time management, and organization skills, ability to handle multiple tasks, and work well in a team environment.
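
A minimal SQLAlchemy ORM sketch, for illustration only: the table and column names are hypothetical, and SQLite is used purely to keep the example self-contained (any Oracle/MySQL connection URL works the same way).

    # Hypothetical SQLAlchemy ORM example (illustrative; not from a specific project).
    from sqlalchemy import create_engine, Column, Integer, String
    from sqlalchemy.orm import declarative_base, Session

    Base = declarative_base()

    class Customer(Base):                      # hypothetical entity
        __tablename__ = "customers"
        id = Column(Integer, primary_key=True)
        name = Column(String(100))

    # SQLite in-memory keeps the sketch self-contained; swap in an Oracle/MySQL URL as needed.
    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)

    with Session(engine) as session:
        session.add(Customer(name="Acme"))
        session.commit()
        print(session.query(Customer).count())   # -> 1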

PROFESSIONAL EXPERIENCE

Sr. Data Engineer

Confidential, Patskala, Ohio

Responsibilities:

  • Participated in various stages of the software development life cycle (SDLC) and worked in an Agile (Scrum) development environment.
  • Created pipelines to enable the flow of data from different sources to the data warehouse using Python and SQL.
  • Extracted, transformed, and loaded data with AWS Glue from S3 buckets, processing the data and loading it into Redshift.
  • Used AWS Athena, an interactive query service, to analyze data in S3 buckets using standard SQL (see the sketch after this list).
  • Installed, configured, and managed AWS servers and AWS Data Pipeline for data extraction; created tables on top of data in AWS S3 obtained from different data sources and scheduled them using Airflow.
  • Performed Big Data processing using Hadoop and the Hadoop ecosystem (MapReduce, Spark, Scala, Hive, HBase, MongoDB), covering implementation, maintenance, ETL, and Big Data analysis operations.
  • Developed operational analytics, financial analytics, model building and enrichment, and a prediction engine for both batch and real time using Java, Storm, Kafka, Akka, Spark MLlib, and scikit-learn.
  • Worked with Python OpenStack APIs and used Python scripts to update content in the database and manipulate files.
  • Developed automated Python scripts using the Boto3 library for AWS security audits and reporting across multiple AWS accounts, run with AWS Lambda (a minimal audit sketch follows this role's environment line).
  • Developed a script in Scala to read all the Parquet Tables in a Database and parse them as JSON files, and another script to parse them as structured tables in Hive.
  • Developed an automated process for code builds and deployments using Jenkins, Ant, Maven, Sonatype, and shell scripts.
  • Used Beautiful Soup 4 (Python library) for web scraping to extract data for building graphs.
  • Used Kubernetes to orchestrate the deployment, scaling, and management of Docker containers.
  • Migrated data into the RV Data Pipeline using Databricks, Spark SQL, and Scala.
  • Worked with PyQuery for selecting DOM elements when parsing HTML.
  • Used GitHub for pull requests and code reviews to improve code quality, and conducted meetings among peers.
  • Created local virtual repositories for project and release builds, managed repositories in Maven to share snapshots, and worked with the NoSQL databases MongoDB and Cassandra.
  • Involved in the CI/CD (Jenkins) process for application deployments by enforcing strong source code repository management techniques and securing configuration files away from application source code for improved security.
  • Used Git as the version control tool.
  • Worked on Jira for managing the tasks and improving individual performance.
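
Illustrative sketch of the Athena usage described above: querying data in S3 with standard SQL via boto3. Database, table, and result-bucket names are placeholders, not the original project's.

    # Hedged sketch: run an Athena query over S3 data with boto3 and poll for the result.
    import time
    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    response = athena.start_query_execution(
        QueryString="SELECT event_date, COUNT(*) AS cnt FROM events GROUP BY event_date",
        QueryExecutionContext={"Database": "analytics_db"},          # placeholder database
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )
    query_id = response["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    if state == "SUCCEEDED":
        rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
        print(rows[:5])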

Environment: Python, SQL, AWS Redshift, AWS Glue, AWS Athena, AWS S3, EC2, Oracle, MySQL, MongoDB, AWS Lambda, Kafka, Beautiful Soup, Kubernetes, Docker, Jenkins, Maven, Git, Jira, Agile, Visual Studio, Windows.
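
Illustrative sketch of the Boto3 security-audit pattern mentioned above, written as an AWS Lambda handler that flags security groups open to 0.0.0.0/0. The check and the reporting target are simplified placeholders, not the original audit.

    # Hedged sketch: a Lambda-style Boto3 audit that flags world-open security groups.
    import boto3

    def lambda_handler(event, context):
        ec2 = boto3.client("ec2")
        findings = []
        for sg in ec2.describe_security_groups()["SecurityGroups"]:
            for permission in sg.get("IpPermissions", []):
                if any(r.get("CidrIp") == "0.0.0.0/0" for r in permission.get("IpRanges", [])):
                    findings.append(sg["GroupId"])
                    break
        # In a real audit the findings would be written to a report (S3, SNS, etc.).
        return {"open_security_groups": findings}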

Data Engineer

Confidential, Bentonville, AR

Responsibilities:

  • Worked in an Agile environment, actively participating in sprint planning to apply coding standards and guidelines for efficient and effective Python programming.
  • Built best-practice ETLs with Apache Spark to load and transform raw data into easy-to-use dimensional data for self-service reporting.
  • Worked on Google Cloud Platform (GCP) services like Compute Engine, Cloud Load Balancing, Cloud SQL, and Cloud Storage, and monitored cloud deployment services.
  • Built GCP data and analytics services such as BigQuery, including BigQuery model training/serving, and managed Spark and Hadoop services using Dataproc in GCP (a query sketch follows this role's environment line).
  • Created PySpark DataFrames in Azure Databricks to read data from Data Lake or Blob storage and used the Spark SQL context for transformations (see the sketch after this list).
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics in Azure Databricks.
  • Developed Azure Data Factory and Databricks pipelines to move data from Azure Blob storage/file shares to Azure SQL Data Warehouse and Blob storage.
  • Used Spark SQL to load JSON data, create schema RDDs, and load them into Hive tables, and handled structured data with Spark SQL.
  • Created MapReduce programs for the transformation, extraction, and aggregation of data in multiple formats such as Avro, Parquet, XML, JSON, CSV, and other compressed file formats.
  • Worked in a CI/CD environment, deploying applications on Docker containers.
  • Created indexes on MySQL tables to improve performance by eliminating full table scans, and created views to hide the underlying tables and reduce the complexity of large queries.
  • Used Jenkins for continuous integration and continuous delivery (CI/CD) on Amazon EC2.
  • Maintained the Version and Backup of the source using GitHub.
  • Updated the storyboard, organized the Sprint dashboard, and participated in story grooming for future sprint planning and preparation.
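
Illustrative PySpark/Spark SQL sketch of the pattern described above: read raw JSON from storage into a DataFrame, register it as a temporary view, and transform it with Spark SQL. The storage path, view, and column names are placeholders.

    # Hedged sketch: DataFrame read plus Spark SQL transformation in a Databricks-style job.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("raw-to-curated").getOrCreate()

    # Placeholder Blob storage path; a Data Lake (abfss://) path works the same way.
    raw = spark.read.json("wasbs://raw@examplestorage.blob.core.windows.net/events/")
    raw.createOrReplaceTempView("events_raw")

    daily = spark.sql("""
        SELECT event_date, COUNT(*) AS event_count
        FROM events_raw
        GROUP BY event_date
    """)

    # Placeholder target table name.
    daily.write.mode("overwrite").saveAsTable("analytics.events_daily")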

Environment: Python, Pandas, XML, Azure Data Factory, Azure Data Lake, Hive, Azure Databricks, AWS, EC2, Boto3, Docker, Jenkins, PyCharm, MySQL, GitHub, Agile, Windows.
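
Illustrative sketch of querying BigQuery from Python with the google-cloud-bigquery client, as referenced in the role above. Project, dataset, and table names are placeholders.

    # Hedged sketch: run a BigQuery query and iterate over the result rows.
    from google.cloud import bigquery

    client = bigquery.Client(project="example-project")   # placeholder project

    query = """
        SELECT order_date, SUM(amount) AS revenue
        FROM `example-project.sales.orders`
        GROUP BY order_date
        ORDER BY order_date
    """

    for row in client.query(query).result():
        print(row.order_date, row.revenue)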

Data Engineer

Confidential, Columbus, Indiana

Responsibilities:

  • Coded model-level validation, provided guidance in making long-term architectural design decisions, and followed the Agile methodology and Scrum process.
  • Utilized existing Python and Django modules and rewrote them to deliver data in the required formats.
  • Built database models, APIs, and views utilizing Python to create an interactive web-based solution.
  • Worked on object-oriented programming (OOP) concepts using Python and Linux.
  • Embedded AJAX in UI to update small portions of the web page avoiding the need to reload the entire page.
  • Used the Python urllib2 library and the Beautiful Soup module for web scraping (a scraping sketch follows this role's environment line).
  • Designed Django REST web services using Python and Django to get and post data (see the sketch after this list).
  • Used Python and Django for creating graphics, XML processing of documents, data exchange, and business logic implementation between servers.
  • Handled exceptions and wrote test cases in Python scripts to keep the website from rendering error codes.
  • Using GitLab for continuous integration and deployment and Git version control system for collaborating with teammates and maintaining code versions.
  • Logged user stories and acceptance criteria in JIRA for features by evaluating output requirements and formats.
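
Illustrative sketch of a Django view exposing GET/POST endpoints with JSON, in the spirit of the REST services described above. The model and field names are hypothetical, not from the original project.

    # Hedged sketch: a simple JSON GET/POST view in Django.
    import json

    from django.http import JsonResponse
    from django.views.decorators.csrf import csrf_exempt

    from .models import Order          # hypothetical model with item/qty fields

    @csrf_exempt
    def orders(request):
        if request.method == "POST":
            payload = json.loads(request.body)
            order = Order.objects.create(item=payload["item"], qty=payload["qty"])
            return JsonResponse({"id": order.id}, status=201)
        # GET: return all orders as a JSON list.
        data = list(Order.objects.values("id", "item", "qty"))
        return JsonResponse(data, safe=False)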

Environment: Python, Django, HTML5, CSS3, Oracle, Git, Jira, Agile, Windows.
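
Illustrative sketch of the urllib/Beautiful Soup scraping pattern mentioned above, shown with Python 3's urllib.request (the successor to urllib2). The URL and the tags being extracted are placeholders.

    # Hedged sketch: fetch a page and pull out element text with Beautiful Soup.
    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    html = urlopen("https://example.com/listings").read()   # placeholder URL
    soup = BeautifulSoup(html, "html.parser")

    # Collect the text of every <h2> heading on the page.
    titles = [h2.get_text(strip=True) for h2 in soup.find_all("h2")]
    print(titles[:10])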

Hadoop Developer

Confidential

Responsibilities:

  • Participated in requirement discussions and designed the solution.
  • Estimated the Hadoop cluster requirements.
  • Responsible for choosing the Hadoop components (Hive, Pig, MapReduce, Sqoop, Flume, etc.).
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Built the Hadoop cluster and ingested data using Sqoop.
  • Imported streaming logs to HDFS through Flume.
  • Used Flume to collect, aggregate, and store web log data from different sources like web servers, mobile devices, and network devices, and pushed it to HDFS.
  • Developed use cases and technical prototypes for implementing Hive and Pig.
  • Analyzed data using Hive, Pig, and custom MapReduce programs in Java.
  • Implemented partitioning, dynamic partitions, and buckets in Hive (see the sketch after this list).
  • Installed and configured Hive, Sqoop, Flume, Oozie on the Hadoop cluster.
  • Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.
  • Tuned Hadoop clusters and monitored memory management and MapReduce jobs.
  • Responsible for Cluster maintenance, Adding and removing cluster nodes, Cluster Monitoring and Troubleshooting.
  • Developed a custom Framework capable of solving small files problem in Hadoop.
  • Deployed and administered a 70-node Hadoop cluster as well as two smaller clusters.
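
Illustrative sketch of the Hive partitioning and bucketing described above, submitted from Python via PyHive (the original work ran HiveQL directly on the cluster). Host, table, and column names are placeholders.

    # Hedged sketch: create a partitioned, bucketed Hive table and load it with a
    # dynamic-partition insert.
    from pyhive import hive

    conn = hive.connect(host="hive-server.example.com", port=10000)   # placeholder host
    cur = conn.cursor()

    cur.execute("""
        CREATE TABLE IF NOT EXISTS web_logs (
            user_id STRING,
            url STRING
        )
        PARTITIONED BY (log_date STRING)
        CLUSTERED BY (user_id) INTO 16 BUCKETS
        STORED AS ORC
    """)

    # Allow dynamic partitions, then load from a raw staging table.
    cur.execute("SET hive.exec.dynamic.partition=true")
    cur.execute("SET hive.exec.dynamic.partition.mode=nonstrict")
    cur.execute("""
        INSERT OVERWRITE TABLE web_logs PARTITION (log_date)
        SELECT user_id, url, log_date FROM raw_web_logs
    """)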

Environment: MapReduce, HBase, HDFS, Hive, Pig, Java (JDK 1.6), SQL, Cloudera Manager, Sqoop, Flume, Oozie, Eclipse.

Data Analyst

Confidential

Responsibilities:

  • Documented the system requirements to meet end-state requirements and compiled the Software Requirement Specification document and the Use Case document.
  • Prepared ETL (Extract, Transform, and Load) standards and naming conventions and wrote ETL flow documentation.
  • Used Microsoft SharePoint to upload and manage all project-related documents with version control.
  • Segregated business requirements by analyzing them into low-level and high-level categories, and converted business requirements into a Functional Requirements Document.
  • Prepared dashboards in Tableau using calculations and parameters and generated KPI reports to be analyzed by management.
  • Worked on SQL queries for data manipulation.
  • Arranged weekly team meetings to assign testing tasks and collect status reports from individual team members.
  • Effectively managed change by deploying change management techniques such as Change Assessment, Impact Analysis and Root cause Analysis.
  • Used advanced Excel functions to generate spreadsheets and pivot tables.
  • Presented solutions in written reports while analyzing, designing, testing and monitoring systems in a waterfall methodology.
