Big Data Engineer Resume
Wooster, OH
OBJECTIVE:
Versatile Professional: Offering over five years of experience, with core strengths in the banking and insurance industries
PROFILE SUMMARY:
- Designed and implemented data ingestion techniques for real-time data coming from various data sources
- Key Big Data Competencies:
- Spark: Spark Core, Spark SQL, Spark Streaming
- Data Collection and exploration (Python) + Data Visualization
- Hive performance tuning
- Productionizing Big Data Applications
- Working knowledge of the PySpark APIs
- Experience loading data from different data sources into HDFS and automating data ingestion and transformation jobs
- Performed data extraction and data wrangling using the Pandas and NumPy modules in Python (see the sketch after this summary)
- Programmed in Hive, Spark SQL, and Python to streamline incoming data and build data pipelines that yield useful insights
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, Storm, Spark, Kafka, and Flume
- Possess good communication, interpersonal, and analytical skills and a go-getter personality
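A minimal sketch of the Pandas/NumPy wrangling described above; the file, column names, and business logic are hypothetical placeholders, not project artifacts:

```python
import numpy as np
import pandas as pd

# Load a raw extract from a source system (hypothetical CSV).
df = pd.read_csv("policy_transactions.csv", parse_dates=["txn_date"])

# Basic wrangling: dedupe, fill gaps, derive a feature.
df = df.drop_duplicates(subset=["txn_id"])
df["premium"] = df["premium"].fillna(0.0)
df["log_premium"] = np.log1p(df["premium"])

# Monthly aggregate for exploration/visualization.
summary = df.groupby(df["txn_date"].dt.to_period("M"))["premium"].sum()
print(summary.head())
```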
KEY SKILLS:
- Data Wrangling
- Data Visualization
- Big Data
- Spark
- Hive
- PySpark
- Kafka
- Scala
- NoSQL
- AWS
- Retail Banking
- Insurance
- Performance Engineering
TECHNICAL SKILLS:
Analytical Tools: SQL, Jupyter Notebook, Tableau, Zeppelin, Graph Database, Talend
Programming: Python (data manipulation), NumPy, Pandas, Matplotlib, Plotly
Big Data: Hadoop, HDFS, MapReduce, Pig, Hive, Sqoop, HBase; Spark (Spark Core, Spark SQL, Spark Streaming), PySpark
NoSQL: Cassandra, MongoDB
Methodologies: Agile and Waterfall models
Others: TWS, Shell Script
PROFESSIONAL EXPERIENCE:
Confidential
Big Data Engineer, Wooster, OH
Responsibilities:
- Defined requirements for data lakes/pipelines
- Created tables in Hive and integrated data between Hive and Spark
- Worked extensively with different file formats such as Parquet, Avro, and ORC
- Used Sqoop to transfer data between RDBMS and HDFS
- Worked extensively on Impala
- Developed Python scripts to collect data from source systems and store it in HDFS for analytics
- Created partitioned and bucketed Hive tables to improve performance (see the first sketch below)
- Created Hive tables with user-defined functions
- Worked extensively with Unix shell scripts to perform Hadoop ETL functions: running Sqoop jobs, creating external/internal Hive tables, and initiating HQL scripts
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios
- Worked extensively on the Spark Core and Spark SQL modules of Spark using Scala and Python
- Hands-on experience with Amazon S3 (Simple Storage Service), storing transformed data in S3 (see the second sketch below)
- Working knowledge of Amazon Redshift and the NoSQL database Cassandra
- Loaded and transformed large sets of structured, semi-structured, and unstructured data in formats such as text, zip, XML, and JSON
- Designed and implemented data ingestion techniques for real-time data coming from various source systems
- Defined data layouts and rules after consultation with ETL teams
- Worked in a fast-paced Agile environment and participated in daily stand-up/Scrum meetings
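A minimal PySpark sketch of the partitioned/bucketed Hive table work above, assuming a SparkSession with Hive support; the table and column names are hypothetical:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-ddl-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Partitioned and bucketed Hive table, stored as Parquet.
spark.sql("""
    CREATE TABLE IF NOT EXISTS txns (
        txn_id STRING,
        amount DOUBLE
    )
    PARTITIONED BY (txn_date STRING)
    CLUSTERED BY (txn_id) INTO 32 BUCKETS
    STORED AS PARQUET
""")

# Query the table through Spark SQL.
spark.sql("SELECT txn_date, SUM(amount) AS total FROM txns GROUP BY txn_date").show()
```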
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Impala, Spark, Kafka, Flume, Linux, Scala, Python, Maven, Oracle 11g/10g
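A second sketch, for the Spark-to-S3 landing step described in this role, assuming Spark is configured with S3 (s3a) credentials; the bucket and paths are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("s3-load-sketch").getOrCreate()

# Read semi-structured JSON from HDFS.
raw = spark.read.json("hdfs:///data/raw/claims/")

# Light transformation: filter open claims and stamp a load date.
curated = (raw.filter(F.col("status") == "OPEN")
              .withColumn("load_dt", F.current_date()))

# Write partitioned Parquet to S3.
(curated.write.mode("overwrite")
        .partitionBy("load_dt")
        .parquet("s3a://example-bucket/curated/claims/"))
```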
Confidential
Big Data Engineer
Responsibilities:
- Performed data analysis using open-source tools
- Designed and implemented data ingestion techniques for real-time data coming from various source systems
- Developed Hive UDFs to manipulate data according to business requirements (see the sketch below)
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with major Hadoop ecosystem components: Hive, Pig, HBase, Sqoop, Flume, Oozie, and Zookeeper
- Developed ETL processes (Data Stage Open Studio) to load data from multiple data sources into HDFS using Flume and Sqoop
- Imported and exported data between HDFS and Hive using Sqoop
- Created internal and external Hive tables and defined static and dynamic partitions for optimized performance
- Knowledge of handling Hive queries using Spark SQL integrated with the Spark environment
- Defined data layouts and rules after consultation with ETL teams
- Wrote Hive queries for data analysis to meet business requirements
- Experience in managing and reviewing Hadoop log files
- Performed unit testing using the JUnit framework and used Log4j to monitor error logs
- Worked in a fast-paced Agile environment and participated in daily stand-up/Scrum meetings
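The project's Hive UDFs were business-specific; below is a minimal PySpark analogue (a Python UDF registered for use in Spark SQL, not the original Hive UDF code), with hypothetical names and logic:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = (SparkSession.builder
         .appName("udf-sketch")
         .enableHiveSupport()
         .getOrCreate())

def normalize_code(code):
    # Hypothetical business rule: uppercase and strip a legacy prefix.
    if code is None:
        return None
    return code.upper().removeprefix("LEG-")

# Register the function so it can be called from Spark SQL / Hive queries.
spark.udf.register("normalize_code", normalize_code, StringType())
spark.sql("SELECT normalize_code(product_code) FROM products").show()
```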
Environment: Hadoop, HDFS, Pig, Sqoop, HBase, Hive, Shell Scripting
Confidential
Python Developer
Responsibilities:
- Involved in the design, development, testing, deployment, and maintenance of the website
- Performed data analysis using Python libraries
- Designed and developed Python code per user requirements
- Debugged software and fixed defects
- Fixed critical gaps in the system
Environment: Python, DOM, HTML, CSS, SQL, PL/SQL, Oracle, Windows
Confidential
Python Developer
Responsibilities:
- Involved in various phases of the Software Development Life Cycle (SDLC), including requirements gathering, modeling, analysis, design, and development
- Designed, developed, tested, deployed, and maintained the website
- Designed and developed a data management system using MySQL
- Rewrote existing Python/Django modules to deliver data in the required format
- Wrote Python scripts to parse XML documents and load the data into the database (see the sketch below)
- Generated use case diagrams, activity flow diagrams, class diagrams, and object diagrams during design
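A minimal sketch of the XML-to-database loading described above, using sqlite3 as a self-contained stand-in for the project database; the file, element, and table names are hypothetical:

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect("app.db")
conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT, email TEXT)")

# Parse the XML and collect one row per <user> element.
root = ET.parse("users.xml").getroot()
rows = [(u.findtext("name"), u.findtext("email")) for u in root.iter("user")]

conn.executemany("INSERT INTO users (name, email) VALUES (?, ?)", rows)
conn.commit()
conn.close()
```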
Environment: Python, Shell scripting, PL/SQL, Oracle, SVN, Quality Center, Windows, Perl
