
Data Engineer Resume


Chicago, IL

SUMMARY

  • 7+ years of professional experience in application development in Python, Amazon Web Services (AWS), Big Data, and ETL technologies.
  • Strong experience in software development in Python (NumPy, scikit-learn, Matplotlib, Pandas, HERE API, Selenium) and IDEs such as VS Code, PyCharm, and Jupyter Notebook.
  • Hands-on experience in functional testing with the Pytest and PyUnit frameworks.
  • Experience in Big Data processing using Hadoop and its ecosystem (MapReduce, Hive, Sqoop, and Spark).
  • Experience with AWS EC2, configuring servers for auto scaling and single-node applications.
  • Good working experience with Spark (Spark Streaming, Spark SQL), PySpark, Kafka, Snowflake, and Teradata.
  • Experience in development and design of various scalable systems using Hadoop technologies.
  • Hands-on experience developing Spark applications using RDD transformations, Spark Core, Spark Streaming, and Spark SQL.
  • Strong experience and knowledge of real-time data analytics using Spark Streaming.
  • Working knowledge and experience of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  • Excellent working experience in Big Data integration and analytics based on Hadoop, Spark, Kafka, and web mining technologies.
  • Experienced in designing and developing applications in Spark using PySpark to compare the performance of Spark with Hive and SQL/Oracle.
  • Expert in understanding data and in designing and implementing enterprise platforms such as Hadoop data lakes and large data warehouses.
  • Hands-on experience with NoSQL databases including HBase, MongoDB, and Cassandra, and their integration with Hadoop clusters.
  • Experience in designing, developing, and deploying projects on the GCP suite, including BigQuery, Dataflow, Dataproc, Google Cloud Storage, Composer, and Looker.
  • Good experience in designing the data flow for consolidating four legacy data warehouses into an AWS data lake.
  • Expertise in converting existing AWS infrastructure to serverless architecture (AWS Lambda), deployed via Jenkins and AWS CloudFormation.
  • Strong knowledge and experience implementing Big Data workloads on Amazon Elastic MapReduce (EMR), processing and managing the Hadoop framework on dynamically scalable Amazon EC2 instances.
  • Used NoSQL databases including HBase, MongoDB, and Cassandra.
  • Good experience with all major Hadoop distributions (Cloudera, Hortonworks, etc.).
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Good experience in creating and designing data ingest pipelines using technologies such as Apache Kafka.
  • Good working experience using Sqoop to import data into HDFS from RDBMS and vice versa.
  • Improved the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and YARN.
  • Efficient in writing MapReduce programs and using the Apache Hadoop API to analyze structured and unstructured data.
  • Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries (see the sketch after this list).
  • Debugged Pig and Hive scripts and optimized and debugged MapReduce jobs.
  • Experience with job workflow scheduling and monitoring tools such as Oozie and Airflow.
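
As a small, hedged illustration of the Spark-and-Hive work summarized above: the sketch below reads a Hive external table, expresses a HiveQL-style aggregation as DataFrame transformations, and writes a partitioned, bucketed managed table. The database, table, and column names (raw_db.credit_events, curated_db.credit_summary, event_date, customer_id, amount) are hypothetical placeholders.

```python
# Minimal PySpark sketch (assumed names): curate a Hive external table into a
# partitioned, bucketed managed table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("hive-curation-sketch")
    .enableHiveSupport()          # needed to read/write Hive metastore tables
    .getOrCreate()
)

# Read the raw external table registered in the Hive metastore.
raw_df = spark.table("raw_db.credit_events")

# HiveQL-style aggregation expressed as DataFrame transformations.
summary_df = (
    raw_df
    .filter(F.col("event_date").isNotNull())
    .groupBy("event_date", "customer_id")
    .agg(
        F.count("*").alias("event_count"),
        F.sum("amount").alias("total_amount"),
    )
)

# Persist as a managed table, partitioned by date and bucketed by customer
# for partition pruning and better join performance.
(
    summary_df.write
    .mode("overwrite")
    .partitionBy("event_date")
    .bucketBy(32, "customer_id")
    .sortBy("customer_id")
    .saveAsTable("curated_db.credit_summary")
)
```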

TECHNICAL SKILLS

Languages: Python

Python Packages & Libraries: NumPy, SciPy, Matplotlib, Pandas, scikit-learn, Beautiful Soup, Folium, PyTorch, PyUnit, Pytest, Google Cloud client libraries

Frameworks/Tools: Flask, Django, Robot Framework

Databases and Cloud: MySQL, PostgreSQL, MongoDB, AWS

Web Technologies: HTML, CSS, Bootstrap

Tools & IDEs: Visual Studio Code, Jupyter Notebook, PyCharm, Git, Office Suite

Operating Systems: Linux, Mac, Windows

PROFESSIONAL EXPERIENCE

Confidential - Chicago, IL

Data Engineer

Responsibilities:

  • Developed data loading strategies and performed transformations for analyzing datasets using the Hortonworks distribution of the Hadoop ecosystem.
  • Implemented solutions for ingesting data from various sources utilizing Big Data technologies such as Hadoop, Kafka, the MapReduce framework, and Hive.
  • Developed PySpark applications using Spark SQL, DataFrames, and transformations through the Python APIs to apply business requirements to Hive external tables and load the final transformed data into Hive managed tables.
  • Worked with Hadoop technologies such as MapReduce, Hive, HDFS, and Spark.
  • Involved in ingesting large volumes of credit data from multiple provider data sources into AWS S3; created modular, independent components for AWS S3 connections and data reads.
  • Implemented Data warehouse solutions in AWS Redshift by migrating the data to Redshift from S3.
  • Developed Spark code using Python to run in the EMR clusters.
  • Created user-defined functions (UDFs) in PySpark to automate business logic in the applications.
  • Automated jobs and data pipelines using AWS Lambda and configured performance metrics in AWS CloudWatch (see the first sketch below).
  • Worked with Apache Hadoop ecosystem components such as HDFS, Hive, Kafka, and MapReduce.
  • Designed AWS Glue pipelines to ingest, process, and store data, interacting with different services in AWS.
  • Used Amazon EMR to process Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2), with Amazon Simple Storage Service (S3) as storage.
  • Executed programs through the Python API for Apache Spark (PySpark).
  • Helped DevOps engineers deploy code and debug issues.
  • Wrote Hadoop jobs to analyze data in text, sequence, and Parquet file formats using Hive and Spark.
  • Analyzed the Hadoop cluster and different Big Data components, including Hive and Spark.
  • Populated database tables in Cassandra (on AWS) and AWS Redshift.
  • Developed Spark code using Python and Spark-SQL for faster testing and data processing.
  • Created Hive external tables, loaded data into them, and queried the data using HQL.
  • Developed ETL modules and data workflows for solution accelerators using PySpark and Spark SQL.
  • Used Spark SQL to process large volumes of structured data.
  • Extracted data from MySQL and AWS Redshift into HDFS using Kinesis.
  • Developed a PySpark application to create reporting tables with different masking rules in both Hive and MySQL and made them available to newly built fetch APIs (see the second sketch below).
  • Wrote extensive Spark code in PySpark to extract and manipulate information from numerous record formats.
  • Supported Kafka integrations, including topics, producers, consumers, Schema Registry, Kafka Control Center, KSQL, and streaming applications.
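
The first sketch below is a hedged illustration of the Lambda automation mentioned in this list: a handler that starts a Glue job and publishes a custom CloudWatch metric through boto3. The job name, argument keys, and metric namespace are hypothetical placeholders rather than the project's actual configuration.

```python
# Hedged AWS Lambda sketch: trigger a Glue ETL job and emit a CloudWatch metric.
import boto3

glue = boto3.client("glue")
cloudwatch = boto3.client("cloudwatch")

def lambda_handler(event, context):
    # Start a (hypothetical) Glue job that ingests the day's files.
    run = glue.start_job_run(
        JobName="credit-ingest-job",
        Arguments={"--source_prefix": event.get("source_prefix", "s3://example-bucket/raw/")},
    )

    # Record that a run was triggered so it can be graphed and alerted on.
    cloudwatch.put_metric_data(
        Namespace="DataPipelines",
        MetricData=[{
            "MetricName": "GlueJobTriggered",
            "Value": 1,
            "Unit": "Count",
        }],
    )

    return {"job_run_id": run["JobRunId"]}
```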

Environment: Big Data, Spark, Hive, Pig, Python, Hadoop, AWS, Databases, AWS RedShift, Agile, SQL, HQL, Impala, CloudWatch, AWS Kinesis, Kafka
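
The second sketch is a minimal PySpark masking example in the spirit of the reporting-table work above. Table and column names (curated_db.customers, ssn, email, reporting_db.customer_report) are hypothetical; a JDBC write would follow the same pattern for the MySQL copy.

```python
# Hedged PySpark sketch: build a masked reporting table from a curated Hive table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = (
    SparkSession.builder
    .appName("masking-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

@F.udf(returnType=StringType())
def mask_all_but_last4(value):
    # Keep only the last four characters; mask the rest.
    if value is None:
        return None
    return "*" * max(len(value) - 4, 0) + value[-4:]

source_df = spark.table("curated_db.customers")

report_df = (
    source_df
    .withColumn("ssn_masked", mask_all_but_last4(F.col("ssn")))
    .withColumn("email_masked", F.regexp_replace("email", r"(^[^@]{2})[^@]*", r"$1***"))
    .drop("ssn", "email")
)

# Write the masked table to Hive; df.write.jdbc(...) would be the analogous
# step for the MySQL copy consumed by the fetch APIs.
report_df.write.mode("overwrite").saveAsTable("reporting_db.customer_report")
```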

Confidential - Jacksonville, FL

Data Engineer

Responsibilities:

  • Involved in the project life cycle, including design, development, implementation, and verification and validation.
  • Worked in an AWS environment for development and deployment of custom Hadoop applications.
  • Engaged in solving and supporting real business issues using Hadoop distributed file system and open-source expertise.
  • Responsible for data governance rules and standards to maintain consistency of business element names across the different data layers.
  • Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
  • Performed detailed analysis of business problems and technical environments and used this information to design the solution and maintain the data architecture.
  • Built a program with Python and Apache Beam and executed it in Cloud Dataflow to run data validation between raw source files and BigQuery tables (see the first sketch below).
  • Built data pipelines that enable faster, better, data-informed decision making within the business.
  • Used REST APIs with Python to ingest data from external sites into BigQuery (see the second sketch below).
  • Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
  • Performed data transformations in Hive and used partitions and buckets for performance improvements.
  • Optimized Hive queries to extract customer information from HDFS.
  • Worked extensively with Elastic MapReduce (EMR) and set up environments on Amazon EC2 instances.
  • Led the architecture and design of data processing, warehousing, and analytics initiatives.
  • Designed and developed ETL processes in Databricks to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
  • Extensively utilized Python frameworks and libraries such as Pandas, Spark, PyUnit, Matplotlib, and NumPy.
  • Created a Python application using Python scripting for data processing, MySQL as the database, and Matplotlib to visualize sales data, track progress, and identify trends.
  • Fetched Twitter feeds by keyword using the Tweepy Python Twitter library, stored the Twitter data as JSON, and represented it with Matplotlib visualizations, generating graphical reports for business decisions.
  • Participate in the design, build and deployment of NoSQL implementations in MongoDB.
  • Added support for AWS S3 and RDS to host static/media files and the database into Amazon Cloud.
  • Created custom Docker container images and tagged and pushed the images.
  • Integrated with Google Cloud Platform for large-scale computing workloads.
  • Worked with RDBMS like Oracle and MySQL databases to query and read data.
  • Implemented continuous integration using Jenkins and involved in the deployment of application with Ansible automation engine.
  • Performed unit, integration, and GUI testing using Pytest, and web application testing using the Selenium Python bindings.
  • Logged user stories and acceptance criteria in JIRA for features by evaluating output requirements and formats.
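
The first sketch below is a hedged take on the Beam/Dataflow validation described in this list: compare the row count of a raw file in GCS with the row count of the matching BigQuery table. The project, bucket, dataset, and table names are hypothetical placeholders.

```python
# Hedged Apache Beam sketch: row-count validation between a raw GCS file and BigQuery.
import logging

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def compare_counts(raw_count, bq_count):
    # Runs once, since both inputs are single global counts.
    logging.info("raw=%s bigquery=%s match=%s", raw_count, bq_count, raw_count == bq_count)
    return raw_count == bq_count


def run():
    options = PipelineOptions(
        runner="DataflowRunner",              # "DirectRunner" for local testing
        project="my-gcp-project",
        region="us-central1",
        temp_location="gs://my-bucket/tmp",
    )
    with beam.Pipeline(options=options) as p:
        raw_count = (
            p
            | "ReadRawFile" >> beam.io.ReadFromText("gs://my-bucket/raw/events.csv", skip_header_lines=1)
            | "CountRawRows" >> beam.combiners.Count.Globally()
        )
        bq_count = (
            p
            | "ReadBQCount" >> beam.io.ReadFromBigQuery(
                query="SELECT COUNT(*) AS n FROM `my-gcp-project.analytics.events`",
                use_standard_sql=True,
            )
            | "ExtractCount" >> beam.Map(lambda row: row["n"])
        )
        _ = raw_count | "Compare" >> beam.Map(compare_counts, bq_count=beam.pvalue.AsSingleton(bq_count))


if __name__ == "__main__":
    run()
```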

Environment: Python, Hadoop, PostgreSQL, T-SQL, MongoDB, Docker, Oracle 11g/10i, Ansible, MySQL, Google Cloud, Amazon AWS S3, JIRA.
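
The second sketch shows one plausible shape for the REST-to-BigQuery ingestion bullet: pull JSON records from an endpoint with requests and stream them into a table with the google-cloud-bigquery client. The URL and table ID are hypothetical, and the response is assumed to be a list of flat JSON records matching the table schema.

```python
# Hedged sketch: ingest records from a REST API into BigQuery via streaming inserts.
import requests
from google.cloud import bigquery

API_URL = "https://api.example.com/v1/orders"      # hypothetical endpoint
TABLE_ID = "my-gcp-project.analytics.orders"       # project.dataset.table (placeholder)

def ingest():
    response = requests.get(API_URL, params={"page_size": 500}, timeout=30)
    response.raise_for_status()
    rows = response.json()                          # assumed: list of flat JSON records

    client = bigquery.Client()
    errors = client.insert_rows_json(TABLE_ID, rows)  # streaming insert
    if errors:
        raise RuntimeError(f"BigQuery insert errors: {errors}")

if __name__ == "__main__":
    ingest()
```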

Confidential - Richmond, VA

Data Engineer

Responsibilities:

  • Designed the database model, APIs, and views using Python to build an interactive web-based application.
  • Thoroughly used the Python libraries Beautiful Soup, NumPy, Pandas DataFrames, Matplotlib, python-twitter, and urllib2.
  • Developed new functionality with OOP principles, performance, scalability, and robustness in mind, deployed within Docker containers.
  • Used Python libraries such as Beautiful Soup, NumPy, and SQLAlchemy, and wrote Python scripts to parse JSON documents and load the data into the database (see the first sketch below).
  • Designed and managed API system deployment using a fast HTTP server and the Amazon AWS architecture.
  • Worked in a DevOps group running Jenkins in a Docker container with EC2 agents in an AWS cloud configuration; gained familiarity with supporting technologies such as Kubernetes and Mesos.
  • Built Cassandra queries to perform CRUD operations (create, read, update, delete) and used Bootstrap to manage and organize the HTML page layout.
  • Used Ansible, and Docker for managing the application environments.
  • Used regular expressions for faster search results in combination with Angular 2 built-in and custom pipes.
  • Developed backend web services using Node.js and stored dependencies using Node Package Manager (NPM).
  • Managed, developed, and designed a dashboard control panel for customers and Administrators using Django, Oracle DB, PostgreSQL, and VMWare API calls.
  • Used SOAP and RESTful APIs for information extraction.
  • Added support for AWS S3 and RDS to host static/media files and the database into Amazon Cloud.
  • Automated the existing scripts for performance calculations using NumPy and SQLAlchemy.
  • Used Git source control to manage simultaneous development.
  • Used Locust for load testing at 10,000 requests per second and performed performance testing per client requirements (see the second sketch below).
  • Created Django forms to record online user data, used Pytest to write test cases, and wrote test summary reports.
  • Experience with continuous integration and automation using Jenkins.
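
The first sketch below is a hedged version of the JSON-to-database loading mentioned in this list, using SQLAlchemy. The connection string, table, and JSON layout are illustrative assumptions.

```python
# Hedged SQLAlchemy sketch: parse a JSON document and load the records into a table.
import json

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Customer(Base):
    __tablename__ = "customers"          # hypothetical table
    id = Column(Integer, primary_key=True)
    name = Column(String(120))
    city = Column(String(80))

def load(path="customers.json"):
    # Placeholder DSN; swap in the real PostgreSQL/MySQL connection string.
    engine = create_engine("postgresql+psycopg2://user:password@localhost/app_db")
    Base.metadata.create_all(engine)     # create the table if it does not exist

    with open(path) as fh:
        records = json.load(fh)          # assumed: list of {"name": ..., "city": ...}

    with Session(engine) as session:
        session.add_all([Customer(name=r["name"], city=r["city"]) for r in records])
        session.commit()

if __name__ == "__main__":
    load()
```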

Environment: Python, PyQuery, HTML5, CSS3, Angular 2, Shell Scripting, JSON, REST, Apache Web Server, Django, Celery, Flask, SQL, UNIX, Windows, PostgreSQL, Python libraries such as NumPy and SQLAlchemy, AWS.
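
The second sketch is a minimal Locust file in the spirit of the load-testing bullet above. The endpoints are hypothetical, and the actual request rate depends on the user count and spawn rate passed on the command line.

```python
# Hedged Locust sketch: simulate dashboard and form traffic against a web app.
from locust import HttpUser, task, between

class DashboardUser(HttpUser):
    wait_time = between(0.5, 1.5)        # seconds between tasks per simulated user

    @task(3)
    def view_dashboard(self):
        self.client.get("/dashboard/")   # hypothetical endpoint

    @task(1)
    def submit_record(self):
        self.client.post("/api/records/", json={"name": "load-test"})

# Example run (placeholder host and scale):
#   locust -f locustfile.py --host https://staging.example.com -u 2000 -r 200
```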

Confidential

Software Engineer

Responsibilities:

  • Participated in the development of application architecture and blueprints to define application components, platforms, interfaces, and development tools.
  • Developed an application for finding the frequent patterns in a transactional database.
  • Involved in development of the front end and back end of the application.
  • Used PyQt for front-end development.
  • Used the FP-Growth (Frequent Pattern Growth) algorithm to find frequent patterns and derive association rules from the transactional database (see the sketch after this list).
  • Implemented Python modules for unit testing using PyUnit and created a test harness to enable comprehensive testing.
  • Worked on the development of the front end and back end of an employee performance evaluation system.
  • Involved in the project life cycle, including design, development, implementation, and verification and validation.
  • Wrote and executed various MySQL database queries from Python using the MySQL Connector/Python and MySQLdb packages.
  • Implemented modules to compare data from a SQL database and NoSQL MongoDB.
  • Implemented modules to test Python functions using PyUnit and scraped records from a website using Beautiful Soup.
  • Implemented Python modules to analyze data using NumPy.
  • Exchanged data with the SQL database and MongoDB.
  • Worked with JSON based REST Web services and Amazon Web Services (AWS).
  • Involved in the development of web services using SOAP for sending and receiving data from the external interface in XML format.
  • Generated web forms to record data and used JavaScript for input validation.
  • Parsed XML files using Python to extract data from the database.
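
The sketch below illustrates the frequent-pattern idea from this project using the mlxtend implementation of FP-Growth rather than the from-scratch implementation described above; the transactions are made-up examples.

```python
# Hedged FP-Growth sketch with mlxtend: mine frequent itemsets and association rules.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules

transactions = [
    ["bread", "milk", "eggs"],
    ["bread", "butter"],
    ["milk", "eggs", "butter"],
    ["bread", "milk", "butter"],
    ["bread", "milk", "eggs", "butter"],
]

# One-hot encode the transactions into the boolean DataFrame mlxtend expects.
encoder = TransactionEncoder()
onehot = pd.DataFrame(encoder.fit_transform(transactions), columns=encoder.columns_)

# Mine frequent itemsets, then derive association rules from them.
frequent_itemsets = fpgrowth(onehot, min_support=0.4, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)

print(frequent_itemsets)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```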

Environment: Python, MySQL, NumPy, SciPy, Pandas, PEP, PIP, Jenkins, JSON, Git, JavaScript, AJAX, RESTful web services, PyUnit.
