Big Data Engineer Resume
Wooster, OH
OBJECTIVE:
Versatile Professional: Offering over five years of experience, with core strengths in the banking and insurance industries
PROFILE SUMMARY:
- Designed and implemented data ingestion techniques for real-time data coming from various data sources
- Key Big Data Competencies:
- Spark: Spark Core, Spark SQL, Spark Streaming
- Data Collection and exploration (Python) + Data Visualization
- Hive performance tuning
- Productionizing Big Data Applications
- Working knowledge of the PySpark APIs
- Experience loading data from different data sources into HDFS and automating data ingestion and transformation jobs
- Performed data extraction and data wrangling using the Pandas and NumPy modules in Python (see the sketch after this summary)
- Programmed in Hive, Spark SQL, and Python to streamline incoming data and build data pipelines that yield useful insights
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, Storm, Spark, Kafka, and Flume
- Possess good communication, interpersonal, and analytical skills and a go-getter personality
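A minimal sketch of the Pandas/NumPy wrangling described above; the file, column names, and business logic are hypothetical placeholders, not project artifacts:

```python
import numpy as np
import pandas as pd

# Load a raw extract from a source system (hypothetical CSV).
df = pd.read_csv("policy_transactions.csv", parse_dates=["txn_date"])

# Basic wrangling: dedupe, fill gaps, derive a feature.
df = df.drop_duplicates(subset=["txn_id"])
df["premium"] = df["premium"].fillna(0.0)
df["log_premium"] = np.log1p(df["premium"])

# Monthly aggregate for exploration/visualization.
summary = df.groupby(df["txn_date"].dt.to_period("M"))["premium"].sum()
print(summary.head())
```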
KEY SKILLS:
- Data Wrangling
- Data Visualization
- Big Data
- Spark
- Hive
- PySpark
- Kafka
- Scala
- NoSQL
- AWS
- Retail Banking
- Insurance
- Performance Engineering
TECHNICAL SKILLS:
Analytical Tools: SQL, Jupyter Notebook, Tableau, Zeppelin, Graph Database, Talend
Programming: Python (data manipulation), NumPy, Pandas, Matplotlib, Plotly
Big Data: Hadoop, HDFS, MapReduce, Pig, Hive, Sqoop, HBase; Spark (Spark Core, Spark SQL, Spark Streaming), PySpark
NoSQL: Cassandra, MongoDB
Methodologies: Agile and Waterfall models
Others: TWS, Shell Script
PROFESSIONAL EXPERIENCE:
Confidential
Big Data Engineer, Wooster, OH
Responsibilities:
- Defined requirements for data lakes/pipelines
- Created tables in Hive and integrated data between Hive and Spark
- Worked extensively with different file formats such as Parquet, Avro, and ORC
- Used Sqoop to transfer data between RDBMS and HDFS
- Worked extensively on Impala
- Developed Python scripts to collect data from source systems and store it in HDFS for analytics
- Created partitioned and bucketed Hive tables to improve performance (see the first sketch below)
- Created Hive tables with user-defined functions
- Worked extensively with Unix shell scripts to perform Hadoop ETL functions: running Sqoop jobs, creating external/internal Hive tables, and initiating HQL scripts
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios
- Worked extensively on the Spark Core and Spark SQL modules of Spark using Scala and Python
- Hands-on experience with Amazon S3 (Simple Storage Service), storing transformed data in S3 (see the second sketch below)
- Working knowledge of Amazon Redshift and the NoSQL database Cassandra
- Loaded and transformed large sets of structured, semi-structured, and unstructured data in formats such as text, zip, XML, and JSON
- Designed and implemented data ingestion techniques for real-time data coming from various source systems
- Defined data layouts and rules after consultation with ETL teams
- Worked in a fast-paced Agile environment and participated in daily stand-up/Scrum meetings
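A minimal PySpark sketch of the partitioned/bucketed Hive table work above, assuming a SparkSession with Hive support; the table and column names are hypothetical:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-ddl-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Partitioned and bucketed Hive table, stored as Parquet.
spark.sql("""
    CREATE TABLE IF NOT EXISTS txns (
        txn_id STRING,
        amount DOUBLE
    )
    PARTITIONED BY (txn_date STRING)
    CLUSTERED BY (txn_id) INTO 32 BUCKETS
    STORED AS PARQUET
""")

# Query the table through Spark SQL.
spark.sql("SELECT txn_date, SUM(amount) AS total FROM txns GROUP BY txn_date").show()
```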
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Impala, Spark, Kafka, Flume, Linux, Scala, Python, Maven, Oracle 11g/10g
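A second sketch, for the Spark-to-S3 landing step described in this role, assuming Spark is configured with S3 (s3a) credentials; the bucket and paths are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("s3-load-sketch").getOrCreate()

# Read semi-structured JSON from HDFS.
raw = spark.read.json("hdfs:///data/raw/claims/")

# Light transformation: filter open claims and stamp a load date.
curated = (raw.filter(F.col("status") == "OPEN")
              .withColumn("load_dt", F.current_date()))

# Write partitioned Parquet to S3.
(curated.write.mode("overwrite")
        .partitionBy("load_dt")
        .parquet("s3a://example-bucket/curated/claims/"))
```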
Confidential
Big Data Engineer
Responsibilities:
- Performed data analysis using open-source tools
- Designed and implemented data ingestion techniques for real-time data coming from various source systems
- Developed Hive UDFs to manipulate data according to business requirements (see the sketch below)
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with major Hadoop ecosystem components: Hive, Pig, HBase, Sqoop, Flume, Oozie, and Zookeeper
- Developed ETL processes (Data Stage Open Studio) to load data from multiple data sources into HDFS using Flume and Sqoop
- Imported and exported data between HDFS and Hive using Sqoop
- Created internal and external Hive tables and defined static and dynamic partitions for optimized performance
- Knowledge of handling Hive queries using Spark SQL integrated with the Spark environment
- Defined data layouts and rules after consultation with ETL teams
- Wrote Hive queries for data analysis to meet business requirements
- Experience in managing and reviewing Hadoop log files
- Performed unit testing using the JUnit framework and used Log4j to monitor error logs
- Worked in a fast-paced Agile environment and participated in daily stand-up/Scrum meetings
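The project's Hive UDFs were business-specific; below is a minimal PySpark analogue (a Python UDF registered for use in Spark SQL, not the original Hive UDF code), with hypothetical names and logic:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = (SparkSession.builder
         .appName("udf-sketch")
         .enableHiveSupport()
         .getOrCreate())

def normalize_code(code):
    # Hypothetical business rule: uppercase and strip a legacy prefix.
    if code is None:
        return None
    return code.upper().removeprefix("LEG-")

# Register the function so it can be called from Spark SQL / Hive queries.
spark.udf.register("normalize_code", normalize_code, StringType())
spark.sql("SELECT normalize_code(product_code) FROM products").show()
```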
Environment: Hadoop, HDFS, Pig, Sqoop, HBase, Hive, Shell Scripting
Confidential
Python Developer
Responsibilities:
- Involved in the design, development, testing, deployment, and maintenance of the website
- Performed data analysis using Python libraries
- Designed and developed Python code per user requirements
- Debugged software and fixed defects
- Fixed critical gaps in the system
Environment: Python, DOM, HTML, CSS, SQL, PL/SQL, Oracle, Windows
Confidential
Python Developer
Responsibilities:
- Involved in various phases of the Software Development Life Cycle (SDLC), including requirements gathering, modeling, analysis, design, and development
- Designed, developed, tested, deployed, and maintained the website
- Designed and developed a data management system using MySQL
- Rewrote existing Python/Django modules to deliver data in the required format
- Wrote Python scripts to parse XML documents and load the data into the database (see the sketch below)
- Generated use case diagrams, activity flow diagrams, class diagrams, and object diagrams during design
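A minimal sketch of the XML-to-database loading described above, using sqlite3 as a self-contained stand-in for the project database; the file, element, and table names are hypothetical:

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect("app.db")
conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT, email TEXT)")

# Parse the XML and collect one row per <user> element.
root = ET.parse("users.xml").getroot()
rows = [(u.findtext("name"), u.findtext("email")) for u in root.iter("user")]

conn.executemany("INSERT INTO users (name, email) VALUES (?, ?)", rows)
conn.commit()
conn.close()
```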
Environment: Python, Shell scripting, PL/SQL, Oracle, SVN, Quality Center, Windows, Perl
