Data Engineer Resume

TX

SUMMARY

  • Skilled in AWS services to design and deploy an application based on given requirements.
  • Implemented the Extraction, Transformation, and Loading (ETL) strategy by creating packages and scripts published to Artifactory and extracting data from sources in formats such as Pickle.
  • Analyzed DataFrames and filtered data based on requirements, using lambda functions for row- and column-based transformations with Pandas as a primary library (a minimal sketch follows this summary).
  • Bring expertise in development on Big Data Hadoop clusters (HDFS, MapReduce), Hive, Pig, and Python.
  • Experienced in programming data warehouses using Star and Snowflake schemas depending on business needs.
  • Expertise with cloud infrastructure such as Amazon AWS (S3, EC2). Experience working with the pandas and NumPy libraries.
  • Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, and advanced data processing.
  • Work with common CM tools (JIRA, Confluence, Git) and their usage/processes, ensuring traceability, repeatability, quality, and support.
  • Worked on Hive, Python, Scala, and the Struts web framework; experienced with stream processing (e.g., Kafka, Spark Streaming).
  • Experience developing solutions on top of AWS technologies: EMR, EC2, S3, Redshift, DynamoDB, and Kinesis.
  • Strengths in implementing testing tools such as cURL and Postman.
  • Over 4 years of experience with SOA and microservice architectures.
  • Designed a real-time analytics and ingestion platform using Storm and Kafka; wrote a Storm topology to accept events from a Kafka producer and emit them into Cassandra.
  • Analytical database programming in Oracle and Microsoft SQL Server.
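
Illustrative sketch of the Pandas transformation pattern described above, a minimal example only; the file name, column names, and conversion logic are hypothetical placeholders rather than project code:

    # Load a pickled extract, filter rows, and apply lambda-based
    # row- and column-level transformations with pandas.
    import pandas as pd

    df = pd.read_pickle("extract.pkl")  # hypothetical source file

    # Filter rows based on a requirement (keep only active records).
    active = df[df["status"] == "active"].copy()

    # Column-based transformation with a lambda.
    active["amount_rounded"] = active["amount"].apply(lambda x: round(x, 2))

    # Row-based transformation with a lambda applied across axis=1.
    active["label"] = active.apply(
        lambda row: f"{row['region']}-{row['customer_id']}", axis=1
    )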

TECHNICAL SKILLS

Development methodology: Agile, Waterfall, Scrum

Programming Languages: Python, C, C++

Cloud Technologies: AWS - EC2, Elastic Load Balancer, IAM, S3, RDS, Elastic Beanstalk, DynamoDB, API Gateway, Lambda, Kafka

Management Tools: Jira, Rally, Bitbucket

CI/CD Tools: Jenkins, Docker, Kubernetes, Gitlab

IDEs: PyCharm, Visual Studio Code

Bug Tracking tools: Jira, Firebug, Bugzilla

Version Control: Git, GitHub, GitLab

Testing/Logging: Splunk, Postman

Message Brokers: RabbitMQ, Kafka, Zookeeper

Frameworks & Data Stores: Spark, Kafka, Elasticsearch, PostgreSQL/Redshift

PROFESSIONAL EXPERIENCE

Confidential, TX

Data Engineer

Responsibilities:

  • Wrote Bash and Python scripts to automate data pipeline deployment, data collection, processing, and storage.
  • Worked on Python components for the extract-transform-load (ETL) process, including producer-consumer models to enable parallel processing.
  • Managed key data engineering components, which include the following:
  • Data pipelines deployed in production, QA, and development environments, focusing on traceability, repeatability, quality, and support by updating configuration management tools and their usage/process;
  • Automated pipelines that export data from Snowflake and S3 sources, process it with in-house models, and index it in Elasticsearch for integration with API teams;
  • Amazon Web Services (AWS) EC2 instances, shutting down idle resources as necessary to generate cost savings; and
  • Metadata kept up to date for production data, including migration of datasets to the latest in-house developed platform.
  • Developed solutions on AWS technologies (EMR, EC2, and S3), as well as enterprise data processes for the production environment.
  • Modified existing pipelines and metadata in line with new requirements from emerging technologies and current industry standards through proactive coordination with teams across the enterprise.
  • Used Pool to process DataFrame chunks in parallel with both lambda and pre-defined functions.
  • Utilized Elasticsearch with Kafka topics to enable near real-time data collection and indexing (a minimal sketch follows this list).
  • Created a unit-test/regression-test framework for existing, pre-developed code, including an in-house Scrapy-based scraper.
  • Utilized pytest, the Python unit-test framework, for all Python applications.
  • Helped with interactive API documentation for specific Python SDK methods to support custom requirements.
  • Worked on test cases for OCR-based document scraping and used a Firefox plug-in to search within rendered documents.
  • Added support for Amazon AWS S3 and RDS to host files and the database in the Amazon cloud.
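
Illustrative sketch of the near real-time Kafka-to-Elasticsearch indexing mentioned above, assuming the kafka-python and elasticsearch (8.x) client libraries; the broker address, topic, and index names are hypothetical placeholders:

    # Consume JSON events from a Kafka topic and index each one into
    # Elasticsearch in near real time.
    import json

    from elasticsearch import Elasticsearch
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "events",                              # hypothetical topic
        bootstrap_servers=["localhost:9092"],  # hypothetical broker
        value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    )
    es = Elasticsearch("http://localhost:9200")

    for message in consumer:
        # Each Kafka message becomes one Elasticsearch document.
        es.index(index="events", document=message.value)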

Confidential, Chevy Chase, MD

Data Engineer

Responsibilities:

  • Involved in most aspects of SDLC, such as requirements analysis, design, implementation, testing, quality control and deployment.
  • Created a Spark API by converting Pig scripts and components of the ETL process.
  • Responsible for loading unstructured and semi-structured data into Hadoop by creating static and dynamic partitions.
  • Wrote Python and shell scripts for DI, error handling, and mailing systems (DUSTc).
  • Used Pandas to organize data as time series and in tabular form for easy timestamp-based data manipulation and retrieval.
  • Created functional test environment for scenarios/scripts, performed testing, and supported user acceptance testing of developed solution.
  • Copied files in parallel between clusters in Hadoop using Kafka.
  • Actively involved in tuning long-running PL/SQL processes to run faster and more accurately.
  • Used MongoDB for data storage and wrote non-SQL queries for retrieving and updating the data.
  • Worked on batch processing and stateful transformations in Spark Streaming using a Lambda architecture (a PySpark sketch of the streaming side follows this list).
  • Worked on the design and development of CSG enhancements using AWS APIs.
  • Involved in code migration of a quality monitoring tool from AWS EC2 to AWS Lambda to reduce the costs incurred by reserved EC2 instances.
  • Developed microservices using EMR, Lambda, API Gateway, DynamoDB, and RDS according to the scenario.
  • Used cURL and Postman as testing tools for connections to DataPower, emulating a client; developed test scripts release-wise from a Worksoft Certify perspective.
  • Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, IAM, Elastic Load Balancing, Auto Scaling, CloudFront, CloudWatch, and other services in the AWS family.
  • Actively involved in the development of a big data application in Scala, as Scala combines functional and object-oriented programming.
  • Used Spark to process data before ingesting it into HBase; both batch and real-time Spark jobs were created using Scala.
  • Consumed APIs created by other teams that maintain different data of the customers and created new Orchestration Service APIs for filtering and altering the data as per the requirements.
  • Participated in developing a Big Data Hadoop application using Talend on the AWS (Amazon Web Services) cloud.
  • Real-time experience with streaming technologies such as Apache Kafka.
  • Involved in integration using message channels like Kafka to communicate with MDM.
  • Optimized issues in Jenkins while automating all kinds of testing in Jenkins.
  • Developed and enhanced Hibernate techniques with features such as lazy loading, batch fetching, and eager fetching to boost application performance.
  • Deployed applications into continuous integration environments such as Kubernetes to integrate and deploy code on CI environments for development testing and finally into production.
  • Utilized knowledge of message queuing, stream processing, and highly scalable 'big data' data stores.
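
Illustrative sketch of the batch/stateful streaming pattern described above; the original jobs were written in Scala, so this is only a PySpark approximation, and the broker, topic, and window sizes are hypothetical placeholders:

    # Read events from Kafka and maintain stateful windowed counts per key
    # with Structured Streaming.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("stateful-stream").getOrCreate()

    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")  # hypothetical broker
        .option("subscribe", "events")                         # hypothetical topic
        .load()
        .selectExpr("CAST(value AS STRING) AS value", "timestamp")
    )

    # Stateful aggregation: running counts per value in 5-minute windows,
    # with a watermark so old state can eventually be dropped.
    counts = (
        events.withWatermark("timestamp", "10 minutes")
        .groupBy(F.window("timestamp", "5 minutes"), F.col("value"))
        .count()
    )

    query = counts.writeStream.outputMode("update").format("console").start()
    query.awaitTermination()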

Confidential

Data Engineer

Responsibilities:

  • Transitioned functional requirements in an Agile environment and analyzed the complete software development lifecycle, including performance analysis, design, development, and testing.
  • Migrate existing data from Mainframe / Teradata / SQL Server to Hadoop and perform ETL operations on it.
  • Reduced the latency of Python scripts by introducing lambda functions for column-based transformations, along with other performance and optimization techniques.
  • Used Python modules such as requests, urllib, and urllib2 for web crawling.
  • Contributed on the data engineering side to developing REST APIs using Swagger and a microservices style of architecture, with Kafka as the message broker and MongoDB as the backend database.
  • Used PL/SQL programming in Analysis, Design and Implementation of Business Applications using the Oracle Relational Database Management System (RDBMS).
  • Established a microservices architecture using Docker and Kubernetes.
  • Involved in defining integration using Message channels like Kafka and Spring Cloud Streams.
  • Developed Python packages implementing a large-load ETL process that moves data from an existing Oracle database into a new PostgreSQL cluster.
  • Developed code using Pair Programming and the Test-Driven Development (TDD) process.
  • Utilized Kubernetes and Docker for the runtime environment of the CI/CD system to build, test, and deploy.
  • Worked on developing an ETL pipeline over S3 parquet files on the data lake using AWS Glue (a minimal Glue job sketch follows this list).
  • Responsible for implementing a Continuous Delivery pipeline with Docker, Jenkins, GitHub, and AWS AMIs.
  • Configured and integrated Git into the continuous integration (CI) environment along with Jenkins, and wrote scripts to containerize applications using Ansible with Docker and orchestrate them using Kubernetes.
  • Handled server-related issues, ensured code quality, and handled new requirements, changes, and patch movements.
  • Tested both batch and online job performance through SQL and provided application recommendations.
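
Illustrative sketch of the AWS Glue ETL pipeline over S3 parquet files mentioned above; this is a minimal Glue PySpark job skeleton, and the S3 paths and dropped column are hypothetical placeholders:

    # Read parquet from the raw zone, apply a simple transformation, and
    # write parquet back to the curated zone of the data lake.
    import sys

    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    frame = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://example-data-lake/raw/"]},  # placeholder
        format="parquet",
    )

    cleaned = frame.drop_fields(["unused_column"])  # placeholder transformation

    glue_context.write_dynamic_frame.from_options(
        frame=cleaned,
        connection_type="s3",
        connection_options={"path": "s3://example-data-lake/curated/"},  # placeholder
        format="parquet",
    )
    job.commit()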

Confidential

Python Developer

Responsibilities:

  • Enhanced existing web application by using Python, Django, AWS, MySQL, and MongoDB.
  • Compared data between the existing SQL Server and Teradata systems across various data metrics when data transformation or data loading takes place.
  • Worked on data quality process set-up on AWS for the entire Financial Auto Loans division.
  • Deployment of the web application using the Linux server with Bash scripts.
  • Developed Spark scripts using Python and Bash commands as per the requirements.
  • Conduct user requirement analysis to design and program applications and deliver support for system enhancements.
  • Implemented NoSQL databases such as MongoDB and wrote non-SQL queries (a pymongo sketch follows this list).
  • Involved in using Spring Integration message channels for logging events.
  • Implemented Python IoC (Inversion of Control) with the Django framework and handled security using Spring Security.
  • Utilized Jenkins for continuous integration and code quality inspection, and built a local repository mirror with source code management using GitHub.
  • Utilized pytest, the Python unit-test framework, for all Python applications.
  • Optimized thread-safe blocks to improve multithreaded access and process valid transactions with speed and performance.
  • Wrote MySQL pipelines for the Search Entity, Retrieve Entity, Create Entity, and Update Entity web services.
  • Used Jasper Reports to generate detailed monthly analysis reports.
  • Investigated issues and defects to determine problem root cause and formulate corrective action recommendations.
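
Illustrative sketch of the MongoDB non-SQL query work mentioned above, using pymongo; the connection string, database, collection, and field names are hypothetical placeholders:

    # Basic insert, find, and update operations against a MongoDB collection.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # hypothetical connection
    loans = client["appdb"]["loans"]                   # hypothetical db/collection

    # Insert a document.
    loans.insert_one({"customer_id": 42, "status": "active", "balance": 1200.50})

    # Query: active loans with a balance above a threshold.
    for doc in loans.find({"status": "active", "balance": {"$gt": 1000}}):
        print(doc["customer_id"], doc["balance"])

    # Update: mark a loan as closed.
    loans.update_one({"customer_id": 42}, {"$set": {"status": "closed"}})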

Confidential

Python Developer

Responsibilities:

  • Worked with the Visual Studio IDE for Python development and refactoring.
  • Worked with the quality assurance team to set up scripts, test cases, and automation protocols.
  • Actively participated in building process databases and table structures following the Spring Framework for web applications.
  • Worked in the development of applications, especially in the UNIX environment and familiar with commands.
  • Resolved version control problems with SVN for improved tracking of source code changes and provided a reliable revert process.
  • Followed coding practices seen in standard Struts Actions and services using persistence frameworks.
  • Experience with the cloud infrastructure that AWS provides.
  • Worked on developing SQL and stored procedures on MySQL.
  • Used SQL*Loader to load data from legacy systems into Oracle databases using control files.
  • Used Oracle External Tables feature to read the data from flat files into Oracle staging tables.
  • Reviewed SQL queries on inner, left, & right joins in Tableau web and desktop by connecting live/dynamic and static datasets.
  • Provided highly durable and available data using S3 data stores, versioning, and lifecycle policies, and created AMIs of mission-critical production servers for backup (a boto3 sketch of the S3 setup follows this list).
  • Performed quality assurance and post-production deployment support.
  • Created low-level design documents, unit test plans, and production support documents.
  • Performed external and internal quality assurance through peer code review and brainstorming to identify possible gaps.
  • Supported multiple teams, including onsite, to ensure seamless coordination.
  • Prepared project requirements and design documentation for CRs as per changes in business requirements.
  • Understood, consolidated, and analyzed user requirements to arrive at the most feasible and optimal solutions.
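
Illustrative sketch of the S3 durability setup mentioned above, using boto3 to enable versioning and attach a lifecycle policy; the bucket name and retention periods are hypothetical placeholders:

    # Enable versioning and add a lifecycle rule that archives and then
    # expires noncurrent object versions.
    import boto3

    s3 = boto3.client("s3")
    bucket = "example-backup-bucket"  # hypothetical bucket

    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )

    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-old-versions",
                    "Status": "Enabled",
                    "Filter": {"Prefix": ""},
                    "NoncurrentVersionTransitions": [
                        {"NoncurrentDays": 30, "StorageClass": "GLACIER"}
                    ],
                    "NoncurrentVersionExpiration": {"NoncurrentDays": 365},
                }
            ]
        },
    )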
