Big Data Developer Resume
SUMMARY:
- Versatile professional with over 4 years of technology and business experience, core competencies in Big Data, and domain strengths in retail banking, product development, and telecommunications
- Designed and documented the architecture and development process for converting database and data warehouse models into Hadoop-based systems
- Experience loading data from different data sources into HDFS and automating data ingestion and transformation jobs
- Performed data extraction and data wrangling using the Pandas and NumPy modules in Python (a sketch follows this list)
- Developed predictive analytics to generate business insights
- Queried data from SQL data sources to build visualizations and dashboards in Tableau
- Created Hive tables with dynamic partitioning and bucketing, tuned their performance, and queried them using HiveQL
- Good knowledge of Spark's in-memory capabilities and its modules: Spark Core, Spark SQL, Spark Streaming, and MLlib
- Experience developing Spark jobs using the PySpark API (a sketch follows this list)
- Programmed in Hive, Spark SQL, and Python to streamline incoming data and build data pipelines that yield useful insights
- Working knowledge of streaming applications and workflow scheduling (an Airflow sketch follows this list)
- Able to work under pressure and adapt to constantly changing work environments
- Good communication, analytical, and organizational skills; able to multi-task efficiently
- Self-motivated team player, equally comfortable working independently
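
A minimal sketch of the Pandas/NumPy data wrangling described above. The file name, column names, and derived fields are hypothetical placeholders, not details from an actual engagement.

    import numpy as np
    import pandas as pd

    # Load a raw extract (hypothetical file and columns)
    df = pd.read_csv("transactions.csv", parse_dates=["txn_date"])

    # Typical wrangling: deduplicate, fill gaps, derive fields
    df = df.drop_duplicates(subset="txn_id")
    df["amount"] = df["amount"].fillna(0.0)
    df["log_amount"] = np.log1p(df["amount"])

    # Aggregate for downstream analytics
    daily = df.groupby(df["txn_date"].dt.date)["amount"].agg(["count", "sum"])
    print(daily.head())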
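A minimal PySpark sketch of the Spark SQL work referenced above, assuming a configured Hive metastore; the table, columns, and output path are illustrative only.

    from pyspark.sql import SparkSession

    # Spark session with Hive support (assumes an available Hive metastore)
    spark = (SparkSession.builder
             .appName("pipeline-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Run Spark SQL over a Hive table (hypothetical table and columns)
    insights = spark.sql("""
        SELECT customer_id, SUM(amount) AS total_spend
        FROM sales.transactions
        GROUP BY customer_id
        HAVING SUM(amount) > 1000
    """)

    insights.write.mode("overwrite").parquet("/data/output/top_customers")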
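And a bare-bones Apache Airflow sketch of workflow scheduling (Airflow 2.x import paths); the DAG id, schedule, and task bodies are placeholders.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator  # Airflow 2.x path

    def ingest():
        # Placeholder: pull data from source systems into HDFS
        pass

    def transform():
        # Placeholder: run Hive/Spark transformations
        pass

    with DAG(
        dag_id="daily_ingestion",       # hypothetical DAG id
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        ingest_task >> transform_task
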
TECHNICAL SKILLS:
Programming: Python, Scala, R
Python Libraries: NumPy, Pandas, Matplotlib, Plotly, scikit-learn
RDBMS: MySQL, Oracle, PostgreSQL
NoSQL: MongoDB, Cassandra, HBase
Methodologies: Agile
Big Data: Hadoop, HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Apache Airflow, Kafka, Oozie
Spark: Spark Core, Spark SQL, Spark Streaming, PySpark, Scala
Analytical Tools: SQL, Jupyter Notebook, Tableau, Zeppelin
Others: AWS, TWS, Shell Script
PROFESSIONAL EXPERIENCE:
Confidential
Big Data Developer
Responsibilities:
- Designed and implemented data ingestion techniques for data from various source systems
- Developed Python scripts to collect data from source systems and store it in HDFS for analytics
- Created tables in Hive and integrated data between Hive and Spark
- Created partitioned and bucketed Hive tables to improve performance (a DDL sketch follows this list)
- Created Hive tables with user-defined functions
- Worked extensively on the Spark Core and Spark SQL modules using Python
- Performed extensive studies of different technologies and captured metrics by running different algorithms
- Defined data layouts and rules in consultation with ETL teams
- Worked in an Agile environment and participated in daily stand-up/Scrum meetings
- Assisted the business by streamlining applications to reduce elapsed time and, in turn, time to market
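
A sketch of the partitioned and bucketed Hive DDL mentioned above, issued through PySpark to stay in one language; the database, table, columns, and bucket count are hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Partition by load date, bucket by customer id for scan/join performance
    spark.sql("""
        CREATE TABLE IF NOT EXISTS analytics.transactions (
            txn_id BIGINT,
            customer_id BIGINT,
            amount DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        CLUSTERED BY (customer_id) INTO 32 BUCKETS
        STORED AS ORC
    """)

    # Dynamic partitioning on insert (note: some Spark versions cannot write
    # to bucketed Hive tables, and the insert must run through Hive itself)
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT INTO analytics.transactions PARTITION (load_date)
        SELECT txn_id, customer_id, amount, load_date
        FROM staging.transactions_raw
    """)
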
Environment: Hadoop, MapReduce, HDFS, Hive, Spark, SQL, Sqoop, Python, Airflow, NoSQL
Confidential
Big Data Developer
Responsibilities:
- Ingested data between Oracle and Hive in both directions using Sqoop
- Ingested data from mainframe sources into Hive for analytics consumption
- Built proofs of concept on Hive file formats
- Created Hive tables with UDFs, using dynamic partitioning and bucketing to improve performance
- Worked on Spark core and Spark SQL modules using PySpark API
- Worked with Spark RDDs and DataFrames to query and fetch Hive data (a sketch follows this list)
- Worked with multiple data formats and Hadoop file formats such as Avro, Parquet, ORC, and JSON
- Involved in code review and bug fixing for improving the performance
- Worked in an Agile environment and participated in Scrum meetings
- Completed big data analytics proofs of concept
- Completed data ingestion and prepared data for consumption by analytics applications
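
A short PySpark sketch of the RDD/DataFrame work over Hive data and the Hadoop file formats listed above; table names and paths are illustrative, and Avro output assumes the external spark-avro package is on the classpath.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Pull a Hive table into a DataFrame
    df = spark.table("analytics.transactions")

    # Equivalent RDD view for row-level functional transforms
    high_value = df.rdd.filter(lambda row: row.amount > 1000).map(lambda row: row.txn_id)

    # Persist in different Hadoop file formats (illustrative paths)
    df.write.mode("overwrite").parquet("/data/out/txn_parquet")
    df.write.mode("overwrite").orc("/data/out/txn_orc")
    df.write.mode("overwrite").format("avro").save("/data/out/txn_avro")  # needs spark-avro
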
Environment: Hadoop, MapReduce, HDFS, Hive, Spark, Pig, Sqoop, SQL, Python, Oracle, Kafka, Oozie, Tableau
Confidential
Python Developer
Responsibilities:
- Involved in the design, development, testing, deployment, and maintenance of the website
- Performed data analysis using Python libraries
- Designed and developed Python code per user requirements
- Debugged software and improved performance
- Helped create data lakes in the Big Data environment as part of new strategic initiatives
- Automated Hive table creation and data ingestion processes in Python (a sketch follows this list)
- Fixed critical gaps in the system
- Migrated data from an existing traditional data warehouse to the Hadoop platform
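
A minimal sketch of automating Hive table creation from Python, as described above, here driving the beeline CLI via subprocess; the JDBC URL, table names, and schemas are hypothetical.

    import subprocess

    # Hypothetical JDBC URL and table definitions
    HIVE_JDBC = "jdbc:hive2://hive-server:10000/default"
    TABLES = {
        "web_events": "event_id BIGINT, page STRING, ts TIMESTAMP",
        "users": "user_id BIGINT, name STRING",
    }

    def create_table(name, schema):
        ddl = f"CREATE TABLE IF NOT EXISTS {name} ({schema}) STORED AS ORC"
        # beeline -u <jdbc-url> -e "<sql>" runs one statement and exits
        subprocess.run(["beeline", "-u", HIVE_JDBC, "-e", ddl], check=True)

    for name, schema in TABLES.items():
        create_table(name, schema)
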
Environment: Python, HTML, CSS, SQL, PL/SQL, Oracle, and Windows
