Data Engineer Resume
TX
SUMMARY
- Skilled in using AWS services to design and deploy applications based on given requirements.
- Implemented Extract, Transform, and Load (ETL) strategies by creating packages and scripts from Artifactory and extracting data from sources such as Pickle files.
- Analyzed and filtered DataFrames based on requirements, using lambda functions for row- and column-based transformations with Pandas as a primary library (illustrative sketch after this list).
- Bring development expertise with Big Data Hadoop clusters (HDFS, MapReduce), Hive, Pig, and Python.
- Experienced in data warehouse programming using star or snowflake schemas depending on business needs.
- Expertise with cloud infrastructure such as AWS S3 and EC2; experience working with the pandas and NumPy libraries.
- Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, and advanced data processing.
- Work with common CM tools (JIRA, Confluence, Git) and their usage/processes, ensuring traceability, repeatability, quality, and support.
- Worked with Hive, Python, Scala, and the Struts web framework; experienced with stream processing, e.g., Kafka and Spark Streaming.
- Experience developing solutions on top of AWS technologies: EMR, EC2, S3, Redshift, DynamoDB, and Kinesis.
- Strengths in implementing testing tools such as cURL and Postman.
- Over 4 years of experience with SOA and microservice architectures.
- Designed a real-time analytics and ingestion platform using Storm and Kafka; wrote a Storm topology to accept events from a Kafka producer and emit them into Cassandra.
- Analytical database programming in Oracle and Microsoft SQL Server.
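The following is a minimal sketch of the kind of Pandas lambda transformation and filtering described above; the DataFrame, column names, and filter condition are hypothetical placeholders rather than project data.

```python
import pandas as pd

# Hypothetical DataFrame standing in for extracted source data.
df = pd.DataFrame({"amount": [120.0, 87.5, 430.2], "currency": ["USD", "USD", "EUR"]})

# Column-based transformation: apply a lambda to a single Series.
df["amount_cents"] = df["amount"].apply(lambda x: int(round(x * 100)))

# Row-based transformation: apply a lambda across each row (axis=1).
df["label"] = df.apply(lambda row: f"{row['currency']}:{row['amount']:.2f}", axis=1)

# Filter rows based on a requirement, e.g. keep only USD records.
usd_only = df[df["currency"] == "USD"]
print(usd_only)
```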
TECHNICAL SKILLS
Development methodology: Agile, Waterfall, Scrum
Programming Languages: Python, C, C++
Cloud Technologies: AWS - EC2, Elastic Load Balancer, IAM, S3, RDS, Elastic Beanstalk, DynamoDB, API Gateway, Lambda, Kafka
Management Tools: Jira, Rally, Bitbucket
CI/CD Tools: Jenkins, Docker, Kubernetes, Gitlab
IDEs: PyCharm, Visual Studio Code
Bug Tracking tools: Jira, Firebug, Bugzilla
Version Control: Git, GitHub, GitLab
Testing/Logging: Postman, Splunk
Message Brokers: RabbitMQ, Kafka, Zookeeper
Frameworks/Data Stores: Spark, Kafka, Elasticsearch, PostgreSQL/Redshift
PROFESSIONAL EXPERIENCE
Confidential, TX
Data Engineer
Responsibilities:
- Wrote Bash and Python scripts to automate data pipeline deployment and to handle data collection, processing, and storage.
- Worked on Python components for the extract-transform-load (ETL) process, including producer-consumer models to enable parallel processing.
- Managed key data engineering components, including the following:
- Data pipelines deployed in production, QA, and development environments, with a focus on traceability, repeatability, quality, and support through updated configuration management tools and processes.
- Automated pipelines to export data from Snowflake and S3 sources, process it with in-house modeling, and index it in Elasticsearch, coordinating with API teams on integration.
- Amazon Web Services (AWS) EC2 instances, shutting down resources as necessary to generate cost savings; and
- Metadata kept up to date for production data, including migration of datasets to the latest in-house developed platform.
- Developed solutions on AWS technologies (EMR, EC2, and S3), as well as enterprise data processes for the production environment.
- Modified existing pipelines and metadata in line with new requirements driven by emerging technologies and current industry standards, through proactive coordination with teams across the enterprise.
- Used multiprocessing Pool to parallelize DataFrame processing with both lambda and predefined functions (illustrative sketch after this list).
- Used Elasticsearch with Kafka topics to enable near-real-time data collection and indexing.
- Created unit-test and regression-test frameworks for working and pre-developed code for Scrappy, an in-house tool.
- Utilized pytest, the Python unit-test framework, for all Python applications.
- Helped build interactive API documentation for specific Python SDK methods to support custom requirements.
- Worked on test cases for OCR-based document scraping and used a Firefox plug-in for search within rendered documents.
- Added support for Amazon S3 and RDS to host files and the database in the AWS cloud.
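A minimal sketch of the Pool-based parallel DataFrame processing mentioned above, using multiprocessing.Pool with worker processes; the frame, column names, and chunk count are illustrative assumptions.

```python
from multiprocessing import Pool

import numpy as np
import pandas as pd


def enrich(chunk: pd.DataFrame) -> pd.DataFrame:
    """Predefined transformation applied to one chunk of the DataFrame."""
    chunk = chunk.copy()
    chunk["total"] = chunk["price"] * chunk["qty"]
    return chunk


if __name__ == "__main__":
    # Hypothetical frame; in the pipeline this came from the extracted sources.
    df = pd.DataFrame({
        "price": np.random.rand(1_000_000),
        "qty": np.random.randint(1, 10, 1_000_000),
    })

    # Split the frame into chunks and transform them in parallel worker processes.
    chunks = np.array_split(df, 8)
    with Pool(processes=8) as pool:
        result = pd.concat(pool.map(enrich, chunks), ignore_index=True)

    print(result.head())
```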
Confidential, Chevy Chase, MD
Data Engineer
Responsibilities:
- Involved in most aspects of the SDLC, including requirements analysis, design, implementation, testing, quality control, and deployment.
- Created a Spark API by converting Pig scripts and components of the ETL process.
- Responsible for loading unstructured and semi-structured data into Hadoop by creating static and dynamic partitions.
- Wrote Python and shell scripts for DI and other error-handling and mailing systems (DUSTc).
- Used Pandas to structure the data in time-series and tabular form for easy timestamp-based manipulation and retrieval.
- Created a functional test environment for scenarios/scripts, performed testing, and supported user acceptance testing of the developed solution.
- Copied files in parallel between various Hadoop clusters using Kafka.
- Actively involved in tuning long-running PL/SQL processes to run faster and more accurately.
- Used MongoDB for data storage and wrote NoSQL queries for retrieving and updating the data.
- Worked on batch processing and stateful transformations in Spark Streaming using a Lambda architecture (illustrative sketch after this list).
- Worked on designing and developing enhancements to CSG using AWS APIs.
- Involved in migrating a quality-monitoring tool from AWS EC2 to AWS Lambda to reduce costs incurred by reserved EC2 instances.
- Developed microservices using EMR, Lambda, API Gateway, DynamoDB, and RDS, depending on the scenario.
- Used cURL and Postman as testing tools to emulate client connections to DataPower, and developed release-wise test scripts from a Worksoft Certify perspective.
- Hands-on experience with Amazon EC2, S3, RDS, IAM, Elastic Load Balancing, Auto Scaling, CloudFront, CloudWatch, and other AWS services.
- Actively involved in developing a big data application in Scala, which combines functional and object-oriented programming.
- Used Spark to process data before ingesting it into HBase; created both batch and real-time Spark jobs in Scala.
- Consumed APIs created by other teams that maintain customer data and created new orchestration-service APIs for filtering and altering the data per requirements.
- Participated in developing a Big Data Hadoop application using Talend on AWS (Amazon Web Services).
- Real-time experience with streaming technologies such as Apache Kafka.
- Involved in integration using message channels such as Kafka to communicate with MDM.
- Resolved issues in Jenkins while automating all kinds of testing in Jenkins.
- Developed and enhanced Hibernate techniques with features such as lazy loading, batch fetching, and eager fetching to boost application performance.
- Deployed applications into continuous integration environments such as Kubernetes to integrate and deploy code for development testing and finally into production.
- Utilized knowledge of message queuing, stream processing, and highly scalable big data stores.
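A minimal sketch of stateful stream processing of the kind described above, written here with PySpark Structured Streaming rather than the original Scala/Spark Streaming code; the broker address, topic name, and window sizes are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("stateful-stream-sketch").getOrCreate()

# Read a Kafka topic as a stream (broker address and topic name are placeholders).
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# Keep the message key and Kafka timestamp for event-time processing.
parsed = events.select(
    F.col("key").cast("string").alias("device_id"),
    F.col("timestamp").alias("event_time"),
)

# Stateful windowed aggregation: running counts per device per 5-minute window,
# with a watermark bounding how much state is retained.
counts = (
    parsed.withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "5 minutes"), "device_id")
    .count()
)

# Emit incremental updates; a production job would write to a serving store instead.
query = counts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```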
Confidential
Data Engineer
Responsibilities:
- Translated functional requirements in an Agile environment and worked through the complete software development lifecycle, including performance analysis, design, development, and testing.
- Migrated existing data from Mainframe, Teradata, and SQL Server to Hadoop and performed ETL operations on it.
- Reduced the latency of Python scripts by introducing lambda functions for column-based transformations, along with other performance and optimization techniques.
- Used Python modules such as requests, urllib, and urllib2 for web crawling.
- Contributed from the data engineering side to developing REST APIs using Swagger in a microservices-style architecture, with Kafka as the message broker and MongoDB as the backend database.
- Used PL/SQL programming in the analysis, design, and implementation of business applications on the Oracle Relational Database Management System (RDBMS).
- Established a microservices architecture using Docker and Kubernetes.
- Involved in defining integration using message channels such as Kafka and Spring Cloud Stream.
- Developed Python packages implementing a large ETL load process that moved data from an existing Oracle database into a new PostgreSQL cluster.
- Developed code using pair programming and a test-driven development (TDD) process.
- Utilized Kubernetes and Docker for the runtime environment of the CI/CD system to build, test, and deploy.
- Developed an ETL pipeline over S3 Parquet files in the data lake using AWS Glue (illustrative sketch after this list).
- Responsible for implementing a continuous delivery pipeline with Docker, Jenkins, GitHub, and AWS AMIs.
- Configured and integrated Git into the continuous integration (CI) environment along with Jenkins, and wrote scripts to containerize applications using Ansible with Docker and orchestrate them using Kubernetes.
- Handled server-related issues, ensured code quality, and managed new requirements, changes, and patch movements.
- Performance-tested both batch and online jobs through SQL and provided application recommendations.
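A minimal sketch of a Glue ETL job over S3 Parquet files as described above; the bucket paths and field names are hypothetical, not the actual data-lake layout.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job boilerplate.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read Parquet files from the data-lake bucket (paths are placeholders).
source = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-data-lake/raw/"]},
    format="parquet",
)

# Example transformation: drop a junk field and rename a column.
cleaned = source.drop_fields(["_corrupt_record"]).rename_field("cust_id", "customer_id")

# Write the curated output back to S3 as Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://example-data-lake/curated/"},
    format="parquet",
)

job.commit()
```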
Confidential
Python Developer
Responsibilities:
- Enhanced an existing web application using Python, Django, AWS, MySQL, and MongoDB.
- Compared data between the existing SQL Server and Teradata systems across various data metrics whenever data transformation or data loading took place.
- Worked on setting up the data quality process on AWS for the entire Financial Auto Loans division.
- Deployed the web application on a Linux server using Bash scripts.
- Developed Spark scripts using Python and Bash commands as per requirements.
- Conduct user requirement analysis to design and program applications and deliver support for system enhancements.
- Implemented NoSQL databases such as MongoDB and wrote NoSQL queries (illustrative sketch after this list).
- Involved in using a Spring Integration messaging channel for logging events.
- Implemented IoC (Inversion of Control) patterns in Python with the Django framework and handled security using Spring Security.
- Utilized Jenkins for continuous integration and code quality inspection, and worked on building a local repository mirror and source code management using GitHub.
- Utilized pytest, the Python unit-test framework, for all Python applications.
- Optimized thread-safe blocks to improve multithreaded access and process valid transactions with speed and performance.
- Wrote MySQL pipelines for the Search Entity, Retrieve Entity, Create Entity, and Update Entity web services.
- Used JasperReports to generate detailed monthly analysis reports.
- Investigated issues and defects to determine problem root cause and formulate corrective action recommendations.
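A minimal sketch of the MongoDB usage described above via pymongo; the connection URI, database, collection, and document fields are hypothetical examples.

```python
from pymongo import MongoClient

# Connect to MongoDB (URI, database, and collection names are placeholders).
client = MongoClient("mongodb://localhost:27017")
applications = client["loans_db"]["applications"]

# Insert a document.
applications.insert_one({"applicant": "A-1001", "status": "pending", "amount": 25000})

# Query: find pending applications above a threshold.
for doc in applications.find({"status": "pending", "amount": {"$gt": 10000}}):
    print(doc["applicant"], doc["amount"])

# Update: mark an application as approved.
applications.update_one({"applicant": "A-1001"}, {"$set": {"status": "approved"}})
```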
Confidential
Python Developer
Responsibilities:
- Worked with the Visual Studio IDE for Python development and refactoring.
- Worked with the quality assurance team to set up scripts, test cases, and automation protocols.
- Actively participated in building process databases and table structures, following the Spring Framework for web applications.
- Developed applications, especially in the UNIX environment, with strong familiarity with its commands.
- Resolved version control problems with SVN for improved tracking of source code changes and provided a reliable revert system.
- Followed coding practices used in standard Struts actions and services built on persistence frameworks.
- Experience with the cloud infrastructure that AWS provides.
- Worked on developing SQL and stored procedures on MySQL.
- Used SQL*Loader to load data from legacy systems into Oracle databases using control files.
- Used the Oracle external tables feature to read data from flat files into Oracle staging tables.
- Reviewed SQL queries using inner, left, and right joins in Tableau Web and Desktop by connecting live/dynamic and static datasets.
- Provided highly durable and available data using S3 storage, versioning, and lifecycle policies, and created AMIs of mission-critical production servers for backup (illustrative sketch at the end of this list).
- Performed quality assurance and post-production deployment support.
- Created low-level design documents, unit test plans, and production support documents.
- Performed external and internal quality assurance through peer code review and brainstorming to identify possible gaps.
- Supported multiple teams, including onsite, to ensure seamless coordination.
- Prepared project requirements and design documentation for CRs as business requirements changed.
- Understood, consolidated, and analyzed user requirements to arrive at the most feasible and optimal solutions.
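A minimal boto3 sketch of the S3 versioning and lifecycle-policy setup mentioned above; the bucket name, transition days, and expiration window are illustrative assumptions.

```python
import boto3

s3 = boto3.client("s3")
bucket = "example-backup-bucket"  # placeholder bucket name

# Enable versioning so overwritten or deleted objects remain recoverable.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Lifecycle policy: archive objects to Glacier after 90 days and expire
# noncurrent versions after a year.
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-and-expire",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "NoncurrentVersionExpiration": {"NoncurrentDays": 365},
            }
        ]
    },
)
```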