Big Data Developer Resume
SUMMARY:
- Versatile professional with over 4 years of technology and business experience, core competencies in Big Data, and domain strengths in retail banking, product development, and telecommunications
- Designed and documented the architecture and development process for converting database and data warehouse models into Hadoop-based systems
- Experience loading data from different data sources into HDFS and automating data ingestion and transformation jobs
- Performed data extraction and data wrangling using the Pandas and NumPy modules in Python (a sketch follows this list)
- Developed predictive analytics to generate business insights
- Queried data from SQL data sources to build visualizations and dashboards in Tableau
- Created Hive tables with dynamic partitioning and bucketing, tuned their performance, and queried them using HiveQL
- Good knowledge of Spark's in-memory capabilities and its modules: Spark Core, Spark SQL, Spark Streaming, and MLlib
- Experience developing Spark jobs using the PySpark API (a sketch follows this list)
- Programmed in Hive, Spark SQL, and Python to streamline incoming data and build data pipelines that yield useful insights
- Working knowledge of streaming applications and workflow scheduling (an Airflow sketch follows this list)
- Able to work under pressure and adapt to constantly changing work environments
- Good communication, analytical, and organizational skills; able to multi-task efficiently
- Self-motivated team player, equally comfortable working independently
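
A minimal sketch of the Pandas/NumPy data wrangling described above. The file name, column names, and derived fields are hypothetical placeholders, not details from an actual engagement.

    import numpy as np
    import pandas as pd

    # Load a raw extract (hypothetical file and columns)
    df = pd.read_csv("transactions.csv", parse_dates=["txn_date"])

    # Typical wrangling: deduplicate, fill gaps, derive fields
    df = df.drop_duplicates(subset="txn_id")
    df["amount"] = df["amount"].fillna(0.0)
    df["log_amount"] = np.log1p(df["amount"])

    # Aggregate for downstream analytics
    daily = df.groupby(df["txn_date"].dt.date)["amount"].agg(["count", "sum"])
    print(daily.head())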
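A minimal PySpark sketch of the Spark SQL work referenced above, assuming a configured Hive metastore; the table, columns, and output path are illustrative only.

    from pyspark.sql import SparkSession

    # Spark session with Hive support (assumes an available Hive metastore)
    spark = (SparkSession.builder
             .appName("pipeline-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Run Spark SQL over a Hive table (hypothetical table and columns)
    insights = spark.sql("""
        SELECT customer_id, SUM(amount) AS total_spend
        FROM sales.transactions
        GROUP BY customer_id
        HAVING SUM(amount) > 1000
    """)

    insights.write.mode("overwrite").parquet("/data/output/top_customers")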
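And a bare-bones Apache Airflow sketch of workflow scheduling (Airflow 2.x import paths); the DAG id, schedule, and task bodies are placeholders.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator  # Airflow 2.x path

    def ingest():
        # Placeholder: pull data from source systems into HDFS
        pass

    def transform():
        # Placeholder: run Hive/Spark transformations
        pass

    with DAG(
        dag_id="daily_ingestion",       # hypothetical DAG id
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        ingest_task >> transform_task
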
TECHNICAL SKILLS:
Programming: Python, Scala, R
Python Libraries: NumPy, Pandas, Matplotlib, Plotly, scikit-learn
RDBMS: MySQL, Oracle, PostgreSQL
NoSQL: MongoDB, Cassandra, HBase
Methodologies: Agile
Big Data: Hadoop, HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Apache Airflow, Kafka, Oozie
Spark: Spark Core, Spark SQL, Spark Streaming, PySpark, Scala
Analytical Tools: SQL, Jupyter Notebook, Tableau, Zeppelin
Others: AWS, TWS, Shell Script
PROFESSIONAL EXPERIENCE:
Confidential
Big Data Developer
Responsibilities:
- Designed and implemented data ingestion techniques for data from various source systems
- Developed Python scripts to collect data from source systems and store it in HDFS for analytics
- Created tables in Hive and integrated data between Hive and Spark
- Created partitioned and bucketed Hive tables to improve performance (a DDL sketch follows this list)
- Created Hive tables with user-defined functions
- Worked extensively on the Spark Core and Spark SQL modules using Python
- Performed extensive studies of different technologies and captured metrics by running different algorithms
- Defined data layouts and rules in consultation with ETL teams
- Worked in an Agile environment and participated in daily stand-up/Scrum meetings
- Assisted the business by streamlining applications to reduce elapsed time and, in turn, time to market
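
A sketch of the partitioned and bucketed Hive DDL mentioned above, issued through PySpark to stay in one language; the database, table, columns, and bucket count are hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Partition by load date, bucket by customer id for scan/join performance
    spark.sql("""
        CREATE TABLE IF NOT EXISTS analytics.transactions (
            txn_id BIGINT,
            customer_id BIGINT,
            amount DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        CLUSTERED BY (customer_id) INTO 32 BUCKETS
        STORED AS ORC
    """)

    # Dynamic partitioning on insert (note: some Spark versions cannot write
    # to bucketed Hive tables, and the insert must run through Hive itself)
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT INTO analytics.transactions PARTITION (load_date)
        SELECT txn_id, customer_id, amount, load_date
        FROM staging.transactions_raw
    """)
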
Environment: Hadoop, MapReduce, HDFS, Hive, Spark, SQL, Sqoop, Python, Airflow, NoSQL
Confidential
Big Data Developer
Responsibilities:
- Ingested data between Oracle and Hive in both directions using Sqoop
- Ingested data from mainframe sources into Hive for analytics consumption
- Built proofs of concept on Hive file formats
- Created Hive tables with UDFs, using dynamic partitioning and bucketing to improve performance
- Worked on Spark core and Spark SQL modules using PySpark API
- Worked with Spark RDDs and DataFrames to query and fetch Hive data (a sketch follows this list)
- Worked with multiple data formats and Hadoop file formats such as Avro, Parquet, ORC, and JSON
- Involved in code review and bug fixing for improving the performance
- Worked in an Agile environment and participated in Scrum meetings
- Completed big data analytics proofs of concept
- Completed data ingestion and prepared data for consumption by analytics applications
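
A short PySpark sketch of the RDD/DataFrame work over Hive data and the Hadoop file formats listed above; table names and paths are illustrative, and Avro output assumes the external spark-avro package is on the classpath.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Pull a Hive table into a DataFrame
    df = spark.table("analytics.transactions")

    # Equivalent RDD view for row-level functional transforms
    high_value = df.rdd.filter(lambda row: row.amount > 1000).map(lambda row: row.txn_id)

    # Persist in different Hadoop file formats (illustrative paths)
    df.write.mode("overwrite").parquet("/data/out/txn_parquet")
    df.write.mode("overwrite").orc("/data/out/txn_orc")
    df.write.mode("overwrite").format("avro").save("/data/out/txn_avro")  # needs spark-avro
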
Environment: Hadoop, MapReduce, HDFS, Hive, Spark, Pig, Sqoop, SQL, Python, Oracle, Kafka, Oozie, Tableau
Confidential
Python Developer
Responsibilities:
- Involved in the design, development, testing, deployment, and maintenance of the website
- Performed data analysis using Python libraries
- Designed and developed Python code per user requirements
- Debugged software and improved performance
- Helped create data lakes in the Big Data environment as part of new strategic initiatives
- Automated Hive table creation and data ingestion processes in Python (a sketch follows this list)
- Fixed critical gaps in the system
- Migrated data from an existing traditional data warehouse to the Hadoop platform
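
A minimal sketch of automating Hive table creation from Python, as described above, here driving the beeline CLI via subprocess; the JDBC URL, table names, and schemas are hypothetical.

    import subprocess

    # Hypothetical JDBC URL and table definitions
    HIVE_JDBC = "jdbc:hive2://hive-server:10000/default"
    TABLES = {
        "web_events": "event_id BIGINT, page STRING, ts TIMESTAMP",
        "users": "user_id BIGINT, name STRING",
    }

    def create_table(name, schema):
        ddl = f"CREATE TABLE IF NOT EXISTS {name} ({schema}) STORED AS ORC"
        # beeline -u <jdbc-url> -e "<sql>" runs one statement and exits
        subprocess.run(["beeline", "-u", HIVE_JDBC, "-e", ddl], check=True)

    for name, schema in TABLES.items():
        create_table(name, schema)
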
Environment: Python, HTML, CSS, SQL, PL/SQL, Oracle, and Windows
