Data Engineer Resume
TX
SUMMARY
- Skilled in using AWS services to design and deploy applications based on given requirements.
- Implemented Extract, Transform, and Load (ETL) strategies by creating packages and scripts from Artifactory and extracting data from sources such as Pickle files.
- Analyzed and filtered DataFrames based on requirements, using lambda functions for row- and column-based transformations with Pandas as a primary library (illustrative sketch after this list).
- Bring development expertise with Big Data Hadoop clusters (HDFS, MapReduce), Hive, Pig, and Python.
- Experienced in data warehouse programming using star or snowflake schemas depending on business needs.
- Expertise with cloud infrastructure such as AWS S3 and EC2; experience working with the pandas and NumPy libraries.
- Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, and advanced data processing.
- Work with common CM tools (JIRA, Confluence, Git) and their usage/processes, ensuring traceability, repeatability, quality, and support.
- Worked with Hive, Python, Scala, and the Struts web framework; experienced with stream processing, e.g., Kafka and Spark Streaming.
- Experience developing solutions on top of AWS technologies: EMR, EC2, S3, Redshift, DynamoDB, and Kinesis.
- Strengths in implementing testing tools such as cURL and Postman.
- Over 4 years of experience with SOA and microservice architectures.
- Designed a real-time analytics and ingestion platform using Storm and Kafka; wrote a Storm topology to accept events from a Kafka producer and emit them into Cassandra.
- Analytical database programming in Oracle and Microsoft SQL Server.
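The following is a minimal sketch of the kind of Pandas lambda transformation and filtering described above; the DataFrame, column names, and filter condition are hypothetical placeholders rather than project data.

```python
import pandas as pd

# Hypothetical DataFrame standing in for extracted source data.
df = pd.DataFrame({"amount": [120.0, 87.5, 430.2], "currency": ["USD", "USD", "EUR"]})

# Column-based transformation: apply a lambda to a single Series.
df["amount_cents"] = df["amount"].apply(lambda x: int(round(x * 100)))

# Row-based transformation: apply a lambda across each row (axis=1).
df["label"] = df.apply(lambda row: f"{row['currency']}:{row['amount']:.2f}", axis=1)

# Filter rows based on a requirement, e.g. keep only USD records.
usd_only = df[df["currency"] == "USD"]
print(usd_only)
```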
TECHNICAL SKILLS
Development methodology: Agile, Waterfall, Scrum
Programming Languages: Python, C, C++
Cloud Technologies: AWS - EC2, Elastic Load Balancer, IAM, S3, RDS, Elastic Beanstalk, DynamoDB, API Gateway, Lambda, Kafka
Management Tools: Jira, Rally, Bitbucket
CI/CD Tools: Jenkins, Docker, Kubernetes, Gitlab
IDEs: PyCharm, Visual Studio Code
Bug Tracking tools: Jira, Firebug, Bugzilla
Version Control: Git, GitHub, GitLab
Testing/Logging: Postman, Splunk
Message Brokers: RabbitMQ, Kafka, Zookeeper
Frameworks/Data Stores: Spark, Kafka, Elasticsearch, PostgreSQL/Redshift
PROFESSIONAL EXPERIENCE
Confidential, TX
Data Engineer
Responsibilities:
- Wrote Bash and Python scripts to automate data pipeline deployment and to handle data collection, processing, and storage.
- Worked on Python components for the extract-transform-load (ETL) process, including producer-consumer models to enable parallel processing.
- Managed key data engineering components, including the following:
- Data pipelines deployed in production, QA, and development environments, with a focus on traceability, repeatability, quality, and support through updated configuration management tools and processes.
- Automated pipelines to export data from Snowflake and S3 sources, process it with in-house modeling, and index it in Elasticsearch, coordinating with API teams on integration.
- Amazon Web Services (AWS) EC2 instances, shutting down resources as necessary to generate cost savings; and
- Metadata kept up to date for production data, including migration of datasets to the latest in-house developed platform.
- Developed solutions on AWS technologies (EMR, EC2, and S3), as well as enterprise data processes for the production environment.
- Modified existing pipelines and metadata in line with new requirements driven by emerging technologies and current industry standards, through proactive coordination with teams across the enterprise.
- Used multiprocessing Pool to parallelize DataFrame processing with both lambda and predefined functions (illustrative sketch after this list).
- Used Elasticsearch with Kafka topics to enable near-real-time data collection and indexing.
- Created unit-test and regression-test frameworks for working and pre-developed code for Scrappy, an in-house tool.
- Utilized pytest, the Python unit-test framework, for all Python applications.
- Helped build interactive API documentation for specific Python SDK methods to support custom requirements.
- Worked on test cases for OCR-based document scraping and used a Firefox plug-in for search within rendered documents.
- Added support for Amazon S3 and RDS to host files and the database in the AWS cloud.
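A minimal sketch of the Pool-based parallel DataFrame processing mentioned above, using multiprocessing.Pool with worker processes; the frame, column names, and chunk count are illustrative assumptions.

```python
from multiprocessing import Pool

import numpy as np
import pandas as pd


def enrich(chunk: pd.DataFrame) -> pd.DataFrame:
    """Predefined transformation applied to one chunk of the DataFrame."""
    chunk = chunk.copy()
    chunk["total"] = chunk["price"] * chunk["qty"]
    return chunk


if __name__ == "__main__":
    # Hypothetical frame; in the pipeline this came from the extracted sources.
    df = pd.DataFrame({
        "price": np.random.rand(1_000_000),
        "qty": np.random.randint(1, 10, 1_000_000),
    })

    # Split the frame into chunks and transform them in parallel worker processes.
    chunks = np.array_split(df, 8)
    with Pool(processes=8) as pool:
        result = pd.concat(pool.map(enrich, chunks), ignore_index=True)

    print(result.head())
```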
Confidential, Chevy Chase, MD
Data Engineer
Responsibilities:
- Involved in most aspects of the SDLC, including requirements analysis, design, implementation, testing, quality control, and deployment.
- Created a Spark API by converting Pig scripts and components of the ETL process.
- Responsible for loading unstructured and semi-structured data into Hadoop by creating static and dynamic partitions.
- Wrote Python and shell scripts for DI and other error-handling and mailing systems (DUSTc).
- Used Pandas to structure the data in time-series and tabular form for easy timestamp-based manipulation and retrieval.
- Created a functional test environment for scenarios/scripts, performed testing, and supported user acceptance testing of the developed solution.
- Copied files in parallel between various Hadoop clusters using Kafka.
- Actively involved in tuning long-running PL/SQL processes to run faster and more accurately.
- Used MongoDB for data storage and wrote NoSQL queries for retrieving and updating the data.
- Worked on batch processing and stateful transformations in Spark Streaming using a Lambda architecture (illustrative sketch after this list).
- Worked on designing and developing enhancements to CSG using AWS APIs.
- Involved in migrating a quality-monitoring tool from AWS EC2 to AWS Lambda to reduce costs incurred by reserved EC2 instances.
- Developed microservices using EMR, Lambda, API Gateway, DynamoDB, and RDS, depending on the scenario.
- Used cURL and Postman as testing tools to emulate client connections to DataPower, and developed release-wise test scripts from a Worksoft Certify perspective.
- Hands-on experience with Amazon EC2, S3, RDS, IAM, Elastic Load Balancing, Auto Scaling, CloudFront, CloudWatch, and other AWS services.
- Actively involved in developing a big data application in Scala, which combines functional and object-oriented programming.
- Used Spark to process data before ingesting it into HBase; created both batch and real-time Spark jobs in Scala.
- Consumed APIs created by other teams that maintain customer data and created new orchestration-service APIs for filtering and altering the data per requirements.
- Participated in developing a Big Data Hadoop application using Talend on AWS (Amazon Web Services).
- Real-time experience with streaming technologies such as Apache Kafka.
- Involved in integration using message channels such as Kafka to communicate with MDM.
- Resolved issues in Jenkins while automating all kinds of testing in Jenkins.
- Developed and enhanced Hibernate techniques with features such as lazy loading, batch fetching, and eager fetching to boost application performance.
- Deployed applications into continuous integration environments such as Kubernetes to integrate and deploy code for development testing and finally into production.
- Utilized knowledge of message queuing, stream processing, and highly scalable big data stores.
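A minimal sketch of stateful stream processing of the kind described above, written here with PySpark Structured Streaming rather than the original Scala/Spark Streaming code; the broker address, topic name, and window sizes are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("stateful-stream-sketch").getOrCreate()

# Read a Kafka topic as a stream (broker address and topic name are placeholders).
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# Keep the message key and Kafka timestamp for event-time processing.
parsed = events.select(
    F.col("key").cast("string").alias("device_id"),
    F.col("timestamp").alias("event_time"),
)

# Stateful windowed aggregation: running counts per device per 5-minute window,
# with a watermark bounding how much state is retained.
counts = (
    parsed.withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "5 minutes"), "device_id")
    .count()
)

# Emit incremental updates; a production job would write to a serving store instead.
query = counts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```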
Confidential
Data Engineer
Responsibilities:
- Translated functional requirements in an Agile environment and worked through the complete software development lifecycle, including performance analysis, design, development, and testing.
- Migrated existing data from Mainframe, Teradata, and SQL Server to Hadoop and performed ETL operations on it.
- Reduced the latency of Python scripts by introducing lambda functions for column-based transformations, along with other performance and optimization techniques.
- Used Python modules such as requests, urllib, and urllib2 for web crawling.
- Contributed from the data engineering side to developing REST APIs using Swagger in a microservices-style architecture, with Kafka as the message broker and MongoDB as the backend database.
- Used PL/SQL programming in the analysis, design, and implementation of business applications on the Oracle Relational Database Management System (RDBMS).
- Established a microservices architecture using Docker and Kubernetes.
- Involved in defining integration using message channels such as Kafka and Spring Cloud Stream.
- Developed Python packages implementing a large ETL load process that moved data from an existing Oracle database into a new PostgreSQL cluster.
- Developed code using pair programming and a test-driven development (TDD) process.
- Utilized Kubernetes and Docker for the runtime environment of the CI/CD system to build, test, and deploy.
- Developed an ETL pipeline over S3 Parquet files in the data lake using AWS Glue (illustrative sketch after this list).
- Responsible for implementing a continuous delivery pipeline with Docker, Jenkins, GitHub, and AWS AMIs.
- Configured and integrated Git into the continuous integration (CI) environment along with Jenkins, and wrote scripts to containerize applications using Ansible with Docker and orchestrate them using Kubernetes.
- Handled server-related issues, ensured code quality, and managed new requirements, changes, and patch movements.
- Performance-tested both batch and online jobs through SQL and provided application recommendations.
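A minimal sketch of a Glue ETL job over S3 Parquet files as described above; the bucket paths and field names are hypothetical, not the actual data-lake layout.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job boilerplate.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read Parquet files from the data-lake bucket (paths are placeholders).
source = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-data-lake/raw/"]},
    format="parquet",
)

# Example transformation: drop a junk field and rename a column.
cleaned = source.drop_fields(["_corrupt_record"]).rename_field("cust_id", "customer_id")

# Write the curated output back to S3 as Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://example-data-lake/curated/"},
    format="parquet",
)

job.commit()
```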
Confidential
Python Developer
Responsibilities:
- Enhanced an existing web application using Python, Django, AWS, MySQL, and MongoDB.
- Compared data between the existing SQL Server and Teradata systems across various data metrics whenever data transformation or data loading took place.
- Worked on setting up the data quality process on AWS for the entire Financial Auto Loans division.
- Deployed the web application on a Linux server using Bash scripts.
- Developed Spark scripts using Python and Bash commands as per requirements.
- Conduct user requirement analysis to design and program applications and deliver support for system enhancements.
- Implemented NoSQL databases such as MongoDB and wrote NoSQL queries (illustrative sketch after this list).
- Involved in using a Spring Integration messaging channel for logging events.
- Implemented IoC (Inversion of Control) patterns in Python with the Django framework and handled security using Spring Security.
- Utilized Jenkins for continuous integration and code quality inspection, and worked on building a local repository mirror and source code management using GitHub.
- Utilized pytest, the Python unit-test framework, for all Python applications.
- Optimized thread-safe blocks to improve multithreaded access and process valid transactions with speed and performance.
- Wrote MySQL pipelines for the Search Entity, Retrieve Entity, Create Entity, and Update Entity web services.
- Used JasperReports to generate detailed monthly analysis reports.
- Investigated issues and defects to determine problem root cause and formulate corrective action recommendations.
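A minimal sketch of the MongoDB usage described above via pymongo; the connection URI, database, collection, and document fields are hypothetical examples.

```python
from pymongo import MongoClient

# Connect to MongoDB (URI, database, and collection names are placeholders).
client = MongoClient("mongodb://localhost:27017")
applications = client["loans_db"]["applications"]

# Insert a document.
applications.insert_one({"applicant": "A-1001", "status": "pending", "amount": 25000})

# Query: find pending applications above a threshold.
for doc in applications.find({"status": "pending", "amount": {"$gt": 10000}}):
    print(doc["applicant"], doc["amount"])

# Update: mark an application as approved.
applications.update_one({"applicant": "A-1001"}, {"$set": {"status": "approved"}})
```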
Confidential
Python Developer
Responsibilities:
- Worked with the Visual Studio IDE for Python development and refactoring.
- Worked with the quality assurance team to set up scripts, test cases, and automation protocols.
- Actively participated in building process databases and table structures, following the Spring Framework for web applications.
- Developed applications, especially in the UNIX environment, with strong familiarity with its commands.
- Resolved version control problems with SVN for improved tracking of source code changes and provided a reliable revert system.
- Followed coding practices used in standard Struts actions and services built on persistence frameworks.
- Experience with the cloud infrastructure that AWS provides.
- Worked on developing SQL and stored procedures on MySQL.
- Used SQL*Loader to load data from legacy systems into Oracle databases using control files.
- Used the Oracle external tables feature to read data from flat files into Oracle staging tables.
- Reviewed SQL queries using inner, left, and right joins in Tableau Web and Desktop by connecting live/dynamic and static datasets.
- Provided highly durable and available data using S3 storage, versioning, and lifecycle policies, and created AMIs of mission-critical production servers for backup (illustrative sketch at the end of this list).
- Performed quality assurance and post-production deployment support.
- Created low-level design documents, unit test plans, and production support documents.
- Performed external and internal quality assurance through peer code review and brainstorming to identify possible gaps.
- Supported multiple teams, including onsite, to ensure seamless coordination.
- Prepared project requirements and design documentation for CRs as business requirements changed.
- Understood, consolidated, and analyzed user requirements to arrive at the most feasible and optimal solutions.
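A minimal boto3 sketch of the S3 versioning and lifecycle-policy setup mentioned above; the bucket name, transition days, and expiration window are illustrative assumptions.

```python
import boto3

s3 = boto3.client("s3")
bucket = "example-backup-bucket"  # placeholder bucket name

# Enable versioning so overwritten or deleted objects remain recoverable.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Lifecycle policy: archive objects to Glacier after 90 days and expire
# noncurrent versions after a year.
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-and-expire",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "NoncurrentVersionExpiration": {"NoncurrentDays": 365},
            }
        ]
    },
)
```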