
Data Engineer Resume


TX

SUMMARY:

  • Around 7 years of experience as a Python developer, with proven expertise in using new tools and technical developments to drive improvements throughout the entire Software Development Life Cycle.
  • Experience working with various Python Integrated Development Environments (IDEs): PyCharm, Atom, Spyder, Eclipse, Visual Studio Code, Notepad++, and Sublime Text.
  • Worked on chatbot development to provide relevant product information to customers using Jarvis AI frameworks.
  • Designed and developed an API (RESTful web service) for the chatbot integration.
  • Built CI/CD pipelines using Docker, Jenkins, and Marathon.
  • Proficient in developing RESTful web services in Python using XML and JSON.
  • Developed applications with a RESTful architecture using Node.js and Python as backend languages; used NumPy for numerical analysis.
  • Experienced in using big data platforms like Databricks, PySpark, AWS Glue, and EMR.
  • Worked on big data tools like HiveQL, PySpark, Jupyter Notebooks, AWS Athena, and AWS Glue.
  • Enhanced configuration management using Puppet to assist with automated, repeatable, and consistent configuration and application deployments.
  • Designed, implemented, and maintained solutions using Docker, AWS, Redshift, Kubernetes, Jenkins, Git, and Puppet for microservices and continuous deployment.
  • Expertise with cloud platforms like Amazon AWS (S3, EC2).
  • Hands-on experience in the design, development, testing, and implementation of web applications using Python 3, Django 1.7/1.8, HTML, XML, CSS, JavaScript, Bootstrap, jQuery, JSON, AngularJS, and Node.js.
  • Hands-on experience working with quant/data Python libraries (pandas/NumPy).
  • Performed various parsing techniques using PySpark APIs to cleanse data ingested from Kafka (see the sketch after this list).
  • Developed RESTful endpoints to cache application-specific data in in-memory data stores like Redis.
  • Worked on the Spinnaker platform for multi-cloud continuous delivery (bake, test, and deploy/container pipelines) using Packer, Terraform, Kubernetes, AWS, and GCP.
  • Used SQLAlchemy as an Object Relational Mapper (ORM) for writing database queries.
  • Developed custom consumers and producers for Apache Kafka in Go (Golang) for a car monitoring system.
  • Designed the real-time analytics and ingestion platform using Storm and Kafka. Wrote Storm topology to accept the events from Kafka producer and emit into Cassandra DB.
  • Experience in developing web applications following the Model View Controller (MVC) architecture using server-side frameworks such as Django, Flask, Web.py, Bottle, and Pyramid.
  • Built back-end applications with Python/Django; worked with Docker and Jenkins.
  • Extensive experience in using Microsoft BI Studio products for implementation of ETL methodology in data extraction, transformation, and loading.
  • Proficient in relational and NoSQL databases: MS SQL Server, MySQL, Oracle, MongoDB, and Amazon DynamoDB.
  • Good at writing SQL queries, stored procedures, functions, packages, tables, views, and triggers.
  • Experienced in various types of testing such as Unit testing, Integration testing, User acceptance testing, Functional testing.
  • Developed a fully automated continuous integration system using Git, DB2 LUW and custom tools developed in Python and Bash.
  • Experience working with machine learning techniques such as segmentation networks, transfer learning, and deep learning.
  • Experience on Amazon Web Services, deploying with CodeCommit and CodeDeploy to EC2 instances running various flavors such as Amazon Linux AMI, Red Hat Enterprise Linux, SUSE Linux, Ubuntu Server, and Microsoft Windows Server 2012.
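
A minimal PySpark Structured Streaming sketch of the Kafka cleansing pattern noted above; the broker address and the "events" topic are hypothetical, and the spark-sql-kafka package is assumed to be on the classpath.

    # Read raw messages from Kafka, cast to string, trim, and drop empty rows.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("kafka-cleanse").getOrCreate()

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker
           .option("subscribe", "events")                        # assumed topic
           .load())

    # Kafka delivers values as bytes; cast to string before cleansing.
    cleansed = (raw.selectExpr("CAST(value AS STRING) AS line")
                .withColumn("line", F.trim(F.col("line")))
                .filter(F.col("line") != ""))

    # Stream the cleansed rows to the console for inspection.
    query = cleansed.writeStream.format("console").start()
    query.awaitTermination()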

TECHNICAL SKILLS:

Languages: Python 2.x/3.x, Java, R, C

Database: MySQL, Oracle

Automation Testing: Selenium

Scripting Languages: CSS, AJAX, Shell, PHP, JavaScript, jQuery

Servers: Apache Tomcat, OpenStack, IBM WebSphere

IDEs/Tools: PyCharm, Jupyter, PySpark, Eclipse, Spyder

Project Management Tools: Jira, GitHub, Slack

MS Office: Excel, Word, PowerPoint, etc.

Cloud services: AWS S3, EC2, Athena, Amazon EMR

WORK EXPERIENCE:

Confidential

Data Engineer

Responsibilities:

  • Built database models, views, and APIs using Python for interactive web-based solutions.
  • Used Rational Rose to develop UML use-case, class, and object diagrams for OOA/OOD.
  • Loaded data into JSON files using Python to test Django websites; used Python scripts to update database content and manipulate files.
  • Experience working with Amazon Web Services such as AppSync, Amplify, DynamoDB, Elasticsearch, Lambda, API Gateway, and Step Functions.
  • Hands-on experience with the Microsoft Azure cloud platform and its integration with Python.
  • Knowledge of web services and APIs using RESTful and GraphQL approaches in Python.
  • Developed a web-based application using the Django framework and Golang.
  • Generated Python Django forms to maintain records of online users.
  • Used Django APIs to access the database.
  • Wrote Python OOD code with attention to quality, logging, monitoring, debugging, and code optimization.
  • Skilled in using collections in Python for manipulating and looping through different user-defined objects.
  • Wrote Python modules to view and connect to the Apache Cassandra instance.
  • Created a unit test/regression test framework for working and new code.
  • Installed and maintained the Tomcat and Apache HTTP web servers on UNIX.
  • Utilized standard Python modules such as csv, itertools, and pickle for development.
  • Developed efficient AngularJS and React components for the client-side web application.
  • Responsible for designing, developing, testing, deploying and maintaining the web application.
  • Designed and developed the UI for the website with HTML, XHTML, CSS, JavaScript, and AJAX.
  • Involved in working with Python OpenStack APIs.
  • Developed Spark code using Python for faster data processing.
  • Responsible for managing large databases using MySQL.
  • Wrote and executed various MySQL database queries from Python using the MySQL connector and MySQLdb packages (see the sketch after this list).
  • Used Python modules such as NumPy and matplotlib to generate complex graphical data and create histograms.
  • Developed and designed automation framework using Python and Shell scripting.
  • Involved in debugging and troubleshooting issues, fixing many bugs in two of the main applications that are the primary sources of data for customers and the internal customer service team.
  • Implemented SOAP/RESTful web services in JSON format.
  • Debugged applications tracked in JIRA, following Agile methodology.
  • Attended day-to-day meetings with developers and users and performed QA testing on the application.
  • Experience in using containers like Docker.
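
A minimal sketch of the Python-to-MySQL query pattern referenced above, assuming the mysql-connector-python package; the connection details and the users table are hypothetical.

    import mysql.connector

    # Hypothetical connection details, for illustration only.
    conn = mysql.connector.connect(
        host="localhost", user="appuser", password="secret", database="appdb"
    )
    cur = conn.cursor()
    # Parameterized query: the connector escapes values, avoiding SQL injection.
    cur.execute("SELECT id, name FROM users WHERE active = %s", (True,))
    for row in cur.fetchall():
        print(row)
    cur.close()
    conn.close()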

Environment: Python 2.7/3.3, Django 1.4, HTML, CSS, AJAX, Tomcat, Apache HTTP Server, AngularJS, JSON, REST, XML, JavaScript, OOD, Shell Scripting, GitHub, MySQL, Cassandra, JIRA.

Confidential, TX

Data Engineer

Responsibilities:

  • Worked on designing and developing the Real-Time Tax Computation Engine using Oracle, StreamSets, Kafka, Spark Structured Streaming, and MySQL.
  • Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
  • Involved in ingestion, transformation, manipulation, and computation of data using StreamSets, Kafka, MySQL, and Spark.
  • Involved in data ingestion into MySQL using a Kafka-MySQL pipeline, for full and incremental loads, from a variety of sources such as web servers, RDBMS, and data APIs.
  • Worked on Spark data sources, DataFrames, Spark SQL, and Streaming using Scala (see the sketch after this list).
  • Worked extensively with AWS components such as Elastic MapReduce (EMR), Elastic Compute Cloud (EC2), and Simple Storage Service (S3).
  • Experience in developing Spark applications using Scala and SBT.
  • Experience in integrating the Spark-MySQL and JDBC connectors to save data processed in Spark to MySQL.
  • Responsible for creating tables and automated MySQL pipelines that load data into tables from Kafka topics.
  • Performed a POC to compare the time taken for Change Data Capture (CDC) of Oracle data across Stream, StreamSets, and Dbvisit.
  • Expertise in using different file formats such as text files, CSV, Parquet, and JSON.
  • Experience in writing custom compute functions using Spark SQL and performing interactive querying.
  • Responsible for masking and encrypting sensitive data on the fly.
  • Responsible for creating multiple applications that read data from different Oracle instances into Kafka topics using Stream.
  • Responsible for setting up a MySQL cluster on AWS EC2 instances.
  • Experience in real-time streaming of data using Spark with Kafka.
  • Responsible for creating a Kafka cluster using multiple brokers.
  • Experience working with Vagrant boxes to set up local Kafka and StreamSets pipelines.
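
A minimal PySpark sketch of the DataFrame/Spark SQL pattern described above (the project itself used Scala); the table and column names are hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("tax-engine-sketch").getOrCreate()

    # Toy transaction data standing in for the Kafka/StreamSets feed.
    df = spark.createDataFrame(
        [("TX", 100.0), ("TX", 50.0), ("NJ", 75.0)],
        ["state", "amount"],
    )
    df.createOrReplaceTempView("transactions")

    # Declarative aggregation through the Spark SQL API.
    totals = spark.sql(
        "SELECT state, SUM(amount) AS total FROM transactions GROUP BY state"
    )
    totals.show()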

Environment: Oracle, StreamSets, Kafka, Spark Structured Streaming, MySQL, AWS, JDBC, text files, CSV, Spark-MySQL connector.

Confidential - NJ

Python Developer

Responsibilities:

  • Understood business needs, analyzed functional specifications, and mapped them to the design and development of MapReduce programs and algorithms.
  • Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
  • Customized Flume interceptors to encrypt and mask customer-sensitive data as per requirements.
  • Generated recommendations using item-based collaborative filtering in Apache Spark.
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Built a web portal using JavaScript that makes REST API calls to Elasticsearch and retrieves the row key.
  • Used Kibana, an open-source browser-based analytics and search dashboard for Elasticsearch.
  • Used Amazon Web Services (AWS) like EC2 and S3 for small data sets.
  • Performed data imports from various sources to the Cassandra cluster using Java APIs and Sqoop.
  • Developed iterative algorithms using Spark Streaming in Scala for near real-time dashboards.
  • Installed and configured Hadoop and the Hadoop stack on a 40-node cluster.
  • Involved in customizing the partitioner in MapReduce to route key-value pairs from mappers to reducers in XML format according to requirements.
  • Configured Flume for efficiently collecting, aggregating, and moving large amounts of log data.
  • Involved in creating Hive tables, loading data into them, and writing Hive queries to analyze the data.
  • Implemented AWS services to provide a variety of computing and networking services to meet the needs of applications.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Designed and built the reporting application, which uses Spark SQL to fetch and generate reports on HBase table data.
  • Worked on batch processing of data sources using Apache Spark and Elasticsearch.
  • Extracted the needed data from the server into HDFS and bulk-loaded the cleaned data into HBase.
  • Used different file formats such as text files, Sequence files, Avro, Record Columnar (RCFile), and ORC.
  • Strong experience in implementing data warehouse solutions in Amazon Web Services (AWS) Redshift; worked on various projects to migrate data from on-premise databases to AWS Redshift, RDS, and S3.
  • Involved in ETL, data integration, and migration.
  • Responsible for creating Hive UDFs that helped spot market trends.
  • Optimized Hadoop MapReduce code and Hive/Pig scripts for better scalability, reliability, and performance.
  • Experience in storing analyzed results back into the Cassandra cluster.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying (see the sketch after this list).
  • Developed and tested code for unit and functional testing.
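
A minimal sketch of a custom aggregate function with Spark SQL, assuming PySpark 3.x with pyarrow installed; the ticker data and the trend metric are hypothetical stand-ins for the market-trend UDFs mentioned above.

    import pandas as pd
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf

    spark = SparkSession.builder.appName("agg-sketch").getOrCreate()

    @pandas_udf("double")
    def trend(prices: pd.Series) -> float:
        # Average period-over-period change: a toy "market trend" aggregate.
        return float(prices.diff().mean())

    df = spark.createDataFrame(
        [("AAA", 10.0), ("AAA", 12.0), ("AAA", 15.0), ("BBB", 8.0), ("BBB", 6.0)],
        ["ticker", "price"],
    )
    df.groupBy("ticker").agg(trend("price").alias("avg_change")).show()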

Environment: Apache Spark, HBase, Kibana, AWS, Cassandra, Flume, Oozie, ETL tools, Hive/Pig.

Confidential

Python Developer

Responsibilities:

  • Involved in the analysis, specification, design, implementation, and testing phases of the Software Development Life Cycle (SDLC) and used Agile methodology for developing the application.
  • Worked as an application developer experienced with controllers, views, and models in Django.
  • Used SaltStack to configure and manage the infrastructure.
  • Developed RESTful web services using a Python REST API framework.
  • Implemented the application using inversion of control (IoC) with the Django framework and handled security using the framework's security features.
  • Tested entire front-end and back-end modules using Python on the Django web framework.
  • Responsible for handling the integration of database systems.
  • Developed server-side automation using Node.js scripting, connecting different types of SQL and NoSQL stores from Node.js.
  • Used an object-relational mapping (ORM) solution to map the data representation from the MVC model to the Oracle relational data model with an SQL-based schema.
  • Implemented performance tuning and improved the performance of stored procedures and queries.
  • Installed and configured PyBuilder for application builds and deployment.
  • Used the Selenium library to write a fully functioning test automation process that simulated submitting different web requests from multiple browsers to the web application (see the sketch after this list).
  • Developed and deployed SOAP-based web services on Tomcat Server.
  • Used Jenkins for continuous integration and code quality inspection, and worked on building a local repository mirror and source code management using GitHub.
  • Used an IDE tool to develop the application and JIRA for bug and issue tracking.
  • Wrote unit tests using unittest, resolving bugs and other defects using Firebug.
  • Used JIRA to assign, track, report, and audit issues.
  • Used Git to coordinate team development.
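
A minimal sketch of the Selenium-plus-unittest automation style described above, assuming the selenium package with a local geckodriver; the target URL and assertion are hypothetical.

    import unittest
    from selenium import webdriver

    class SmokeTest(unittest.TestCase):
        def setUp(self):
            # Firefox via geckodriver; any WebDriver-backed browser works.
            self.driver = webdriver.Firefox()

        def test_homepage_title(self):
            # Simulate a browser request and assert on the rendered page.
            self.driver.get("https://example.com")
            self.assertIn("Example", self.driver.title)

        def tearDown(self):
            self.driver.quit()

    if __name__ == "__main__":
        unittest.main()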

Environment: Python, Django Web Framework, HTML, CSS, NoSQL, JavaScript, jQuery, Sublime Text, JIRA, Git, PyBuilder, unittest, Firebug, Web Services.

Confidential

Python Data Engineer

Responsibilities:

  • Developed data pipelines using Python for medical image pre-processing, training, and testing.
  • Developed an artificial intelligence platform that helps data scientists train, test, and develop A.I. models on Amazon SageMaker.
  • Used pandas, OpenCV, NumPy, Seaborn, TensorFlow, Keras, Matplotlib, scikit-learn, and NLTK in Python for developing data pipelines and various machine learning algorithms.
  • Designed and engineered REST APIs and packages that abstract feature extraction and complex prediction/forecasting algorithms on time series data.
  • Developed a Python application for Google Analytics aggregation and reporting, and used Django configuration to manage URLs and application parameters.
  • Developed pre-processing pipelines for DICOM and non-DICOM images.
  • Developed and presented analytical insights on medical and image data.
  • Implemented AWS Lambdas to drive real-time monitoring dashboards from system logs.
  • Cleansed data toward a normal distribution by applying techniques such as missing value treatment, outlier treatment, and hypothesis testing.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python (see the sketch after this list).
  • Created several types of data visualizations using Python and Tableau.
  • Collected data needs and requirements by interacting with other departments.
  • Worked on different data formats such as JSON and XML.
  • Performed preliminary data analysis using descriptive statistics and handled anomalies such as removing duplicates and imputing missing values.
  • Developed various graphing methods to visualize and understand the data, such as scatter plots, pie plots, bar charts, box plots, and histograms.
  • Involved in the development of web services using REST APIs for sending and receiving data from the external interface in JSON format.
  • Configured EC2 instances, configured IAM users and roles, and created an S3 data pipeline using the Boto API to load data from internal data sources.
  • Developed REST APIs in Python with the Flask framework and integrated various data sources including Java, JDBC, RDBMS, shell scripting, spreadsheets, and text files.
  • Implemented Agile methodology for building an internal application.
  • Developed A.I./machine learning algorithms such as classification, regression, and deep learning using Python.
  • Conducted statistical analysis on healthcare data using Python and various tools.
  • Experience with cloud-hosted version control platforms like GitHub.
  • Worked closely with data scientists to understand data requirements for the experiments.
  • Deep experience in using DevOps technologies like Jenkins, Docker, and Kubernetes.
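
A minimal pandas/NumPy sketch of the cleaning steps referenced above (de-duplication, mean imputation, min-max scaling); the column names and values are hypothetical.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "age": [34, 34, np.nan, 51],
        "bmi": [22.1, 22.1, 27.5, np.nan],
    })

    df = df.drop_duplicates()                     # remove duplicate records
    df = df.fillna(df.mean(numeric_only=True))    # impute missing values with the column mean
    df = (df - df.min()) / (df.max() - df.min())  # min-max scale features to [0, 1]
    print(df)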

Environment: pandas, OpenCV, NumPy, Seaborn, TensorFlow, Keras, Matplotlib, scikit-learn, NLTK, DevOps (Jenkins, Docker, Kubernetes), GitHub, machine learning algorithms, Agile, REST API architecture, Python graphing libraries, data formats.
