
Big Data Engineer Resume

Wooster, OH

OBJECTIVE:

Versatile professional offering over 5 years of experience, with core strengths in the Banking and Insurance industries

PROFILE SUMMARY:

  • Designed and implemented data ingestion techniques for real-time data coming from various data sources
  • Key Big Data competencies: Spark (Spark Core, Spark SQL, Spark Streaming), data collection and exploration in Python, data visualization, Hive performance tuning, and productionizing Big Data applications
  • Working knowledge of the PySpark APIs
  • Experience in loading data from different data sources into HDFS and automating data ingestion and transformation jobs
  • Performed data extraction and data wrangling using the Pandas and NumPy modules in Python (see the sketch after this list)
  • Programmed in Hive, Spark SQL, and Python to streamline incoming data and build data pipelines that surface useful insights
  • Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, Storm, Spark, Kafka, and Flume
  • Possess good communication, interpersonal, and analytical skills and a go-getter personality
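
A minimal sketch of the Pandas/NumPy data-wrangling pattern referenced above; the file name, column names, and derived fields are hypothetical and only illustrate the approach.

    import numpy as np
    import pandas as pd

    # Hypothetical input: the file name and columns are illustrative only.
    df = pd.read_csv("transactions.csv", parse_dates=["txn_date"])

    # Basic wrangling: de-duplicate, fill missing amounts, derive a month column.
    df = df.drop_duplicates(subset=["txn_id"])
    df["amount"] = df["amount"].fillna(0.0)
    df["txn_month"] = df["txn_date"].dt.to_period("M").astype(str)

    # Simple exploration: log-scale the skewed amounts and summarize by month.
    df["log_amount"] = np.log1p(df["amount"])
    print(df.groupby("txn_month")["amount"].agg(["count", "sum", "mean"]))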

KEY SKILLS:

  • Data Wrangling
  • Data Visualization
  • Big Data
  • Spark
  • Hive
  • PySpark
  • Kafka
  • Scala
  • NoSQL
  • AWS
  • Retail Banking
  • Insurance
  • Performance Engineering

TECHNICAL SKILLS:

Analytical Tools: SQL, Jupyter Notebook, Tableau, Zeppelin, Graph Database, Talend

Programming: Python; data manipulation with NumPy and Pandas; visualization with Matplotlib and Plotly

Big Data: Hadoop, HDFS, MapReduce, Pig, Hive, Sqoop, HBase, Spark (Spark Core, Spark SQL, Spark Streaming), PySpark

NoSQL: Cassandra, MongoDB

Methodologies: Agile and Waterfall models

Others: TWS, Shell Script

PROFESSIONAL EXPERIENCE:

Confidential

Big Data Engineer, Wooster, OH

Responsibilities:

  • Defined the requirements for data lakes and data pipelines
  • Created tables in Hive and integrated data between Hive and Spark
  • Worked extensively with different file formats such as Parquet, Avro, and ORC
  • Used Sqoop to transfer data between RDBMS and HDFS
  • Worked extensively with Impala
  • Developed Python scripts to collect data from source systems and store it on HDFS for analytics
  • Created partitioned and bucketed Hive tables to improve performance
  • Created Hive tables with user-defined functions
  • Wrote Unix shell scripts to perform Hadoop ETL functions such as running Sqoop jobs, creating external/internal Hive tables, and initiating HQL scripts
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios
  • Worked extensively on the Spark Core and Spark SQL modules using Scala and Python (see the sketch after this section)
  • Hands-on experience with AWS Simple Storage Service, storing the transformed data in S3
  • Working knowledge of Amazon Redshift and the NoSQL database Cassandra
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data in various formats such as text, zip, XML, and JSON
  • Designed and implemented data ingestion techniques for real-time data coming from various source systems
  • Defined data layouts and rules after consultation with ETL teams
  • Worked in a fast-paced Agile environment and participated in daily stand-up/Scrum meetings

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Impala, Spark, Kafka, Flume, Linux, Scala, Python, Maven, Oracle 11g/10g
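
A minimal PySpark sketch of the kind of ingestion job described above, assuming a Hive-enabled SparkSession; the database, table, path, and bucket names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hive-enabled session; all names below are illustrative, not from the actual project.
    spark = (SparkSession.builder
             .appName("ingest_transactions")
             .enableHiveSupport()
             .getOrCreate())

    # Read raw data landed on HDFS (Parquet here; Avro and ORC follow the same pattern).
    raw = spark.read.parquet("hdfs:///data/raw/transactions")

    # Light transformation, then load into a partitioned Hive table stored as Parquet.
    cleaned = (raw.dropDuplicates(["txn_id"])
                  .withColumn("txn_date", F.to_date("txn_ts")))

    (cleaned.write
            .mode("overwrite")
            .format("parquet")
            .partitionBy("txn_date")
            .saveAsTable("analytics.transactions"))

    # Persist the transformed data to S3 as well (assumes S3A credentials are configured).
    cleaned.write.mode("overwrite").parquet("s3a://example-bucket/curated/transactions/")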

Confidential

Big Data Engineer

Responsibilities:

  • Performed data analysis using open-source tools
  • Designed and implemented data ingestion techniques for real-time data coming from various source systems
  • Developed Hive UDFs for manipulating the data according to business requirements (see the sketch after this section)
  • Installed, configured, and maintained Apache Hadoop clusters for application development and major components of the Hadoop ecosystem: Hive, Pig, HBase, Sqoop, Flume, Oozie, and Zookeeper
  • Developed ETL processes (Data Stage Open Studio) to load data from multiple data sources to HDFS using Flume and Sqoop
  • Imported and exported data into HDFS and Hive using Sqoop
  • Created internal and external Hive tables and defined static and dynamic partitions for optimized performance
  • Handled Hive queries using Spark SQL integrated with the Spark environment
  • Defined data layouts and rules after consultation with ETL teams
  • Wrote Hive queries for data analysis to meet business requirements
  • Managed and reviewed Hadoop log files
  • Performed unit testing using the JUnit testing framework and used Log4j to monitor error logs
  • Worked in a fast-paced Agile environment and participated in daily stand-up/Scrum meetings

Environment: Hadoop, HDFS, Pig, Sqoop, HBase, Hive, Shell Scripting
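
The Hive UDFs mentioned above are typically packaged as Java JARs; as a lightweight stand-in, this hedged sketch registers a Python function through Spark SQL and applies it in a Hive-style query. The session setup, table, and column names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType

    # Hive-enabled session; database, table, and column names are illustrative only.
    spark = (SparkSession.builder
             .appName("hive_spark_sql_udf")
             .enableHiveSupport()
             .getOrCreate())

    # A simple function standing in for the business-rule manipulations mentioned above.
    def mask_account(acct_no):
        return "XXXX" + acct_no[-4:] if acct_no else None

    spark.udf.register("mask_account", mask_account, StringType())

    # Run a Hive query through Spark SQL using the registered function.
    result = spark.sql("""
        SELECT customer_id,
               mask_account(account_no) AS masked_account,
               SUM(amount)              AS total_amount
        FROM   analytics.transactions
        GROUP  BY customer_id, mask_account(account_no)
    """)
    result.show(10)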

Confidential

Python Developer

Responsibilities:

  • Involved in the design, development, testing, deployment, and maintenance of the website
  • Performed data analysis using Python libraries
  • Designed and developed Python code per user requirements
  • Debugged software defects
  • Fixed critical gaps in the system

Environment: Python, DOM, HTML, CSS, SQL, PL/SQL, Oracle, and Windows

Confidential

Python Developer

Responsibilities:

  • Involved in various phases of the Software Development Life Cycle (SDLC), such as requirements gathering, modelling, analysis, design, and development
  • Designed, developed, tested, deployed, and maintained the website
  • Designed and developed a data management system using MySQL
  • Rewrote existing Python/Django modules to deliver data in specific formats
  • Wrote Python scripts to parse XML documents and load the data into the database (see the sketch after this section)
  • Generated use case diagrams, activity flow diagrams, class diagrams, and object diagrams during design

Environment: Python, Shell scripting, PL/SQL, Oracle, SVN, Quality Center, Windows, Perl.
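
A minimal sketch of the XML-parsing-and-load pattern referenced above; the XML layout and table schema are invented for illustration, and the standard-library sqlite3 module stands in for the MySQL database used on the project.

    import sqlite3
    import xml.etree.ElementTree as ET

    # Illustrative XML layout; the real documents and schema are not described here.
    SAMPLE_XML = """
    <customers>
      <customer id="101"><name>Alice</name><city>Columbus</city></customer>
      <customer id="102"><name>Bob</name><city>Wooster</city></customer>
    </customers>
    """

    # sqlite3 stands in for the MySQL database used on the project.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")

    # Parse the XML and collect one row per customer element.
    root = ET.fromstring(SAMPLE_XML)
    rows = [(int(c.get("id")), c.findtext("name"), c.findtext("city"))
            for c in root.findall("customer")]

    # Load the parsed rows into the database with a parameterized insert.
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()
    print(conn.execute("SELECT * FROM customers").fetchall())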
