Big Data Developer Resume
EXPERIENCE SUMMARY:
- Astute IT professional with 15+ years of experience in the IT industry.
- Around 5 years of experience in the Big Data Hadoop ecosystem - Hadoop, Spark, machine learning, Scala, YARN, Pig, Hive, HBase, Akka, Oozie, Flume, ZooKeeper.
- Around 6 years of Core Java, Python, and web services experience in traditional and distributed environments.
- Experience with messaging and complex event processing systems using Kafka and Spark Streaming.
- Extensively worked on Spark Core, Spark Streaming, Spark SQL, and Spark ML modules using both Scala and Java.
- Experience importing and exporting data between HDFS and relational database systems/log sources using Sqoop and Flume.
- Good knowledge of Hadoop analytics using R.
- Collected log data from various sources and integrated it into HDFS using Flume.
- Developed Oozie workflows for scheduling and orchestrating ETL processes.
- Extensively worked with Kerberos-secured clusters on the Hadoop platform.
- Worked with different data formats such as ORC, Parquet, and Avro.
- Experience in multiple Java technologies such as JDBC, Servlets, Quartz Scheduler, EJBs, JNDI, JMS, Guava APIs, and Apache Commons.
- Experience implementing MVC, Singleton, Session Facade, DAO, DTO, Front Controller, Business Delegate, and Factory Method patterns.
- Experience developing and consuming REST and SOAP web services.
- Worked on building continuous integration and test-driven development environments.
- Worked on Apache NiFi to streamline data flows.
- Good knowledge of Spark MLlib and Python.
- Good knowledge of AWS cloud services: QuickSight, S3, Redshift, EMR, Athena, and EC2.
- Worked on development, enhancement, and production support across Big Data, Oracle, PeopleSoft, Core Java, and REST and SOAP web services.
- Created Oracle PL/SQL procedures, functions, and triggers.
- Oracle PL/SQL trainer for High Tech Account employees.
- Experience with high-availability and high-traffic applications.
- Worked on PeopleSoft Development.
- Oracle 10g OCA DBA Certified. OCP DBA (Exam).
- Excellent knowledge of UNIX and shell scripting, and AutoSys job creation.
- Experience analyzing logs using Splunk queries.
- Experienced in release management and release process improvements.
- Strong analytical skills, disciplined delivery standards, and dedication to building high-quality software systems.
TECHNICAL SKILLS:
Hardware: Red Hat Linux Servers
Operating System: Unix/Linux.
Languages/Tools: Cloudera, Hive, FitNesse, Spark, Kafka, Scala, Core Java, HBase, Python, Spring Web Services, Oozie, AWS, Kerberos.
Special Software: FitNesse, Jira, Bitbucket, Control-M.
PROFESSIONAL EXPERIENCE:
Confidential
Big Data Developer
Responsibilities:
- Preparing project estimates and test plans, allocating tasks to team members, and handling application development and maintenance.
- Participating in architectural engagements.
- Managing content and providing support.
- Responsible for quality assurance and support.
- Conducting feasibility studies and proofs of concept.
- Analyzing the application's business processes and identifying testing needs.
- Creating an automated regression test bed.
- Reviewing test designs and monitoring team members' activities.
- Contributing to process improvements.
- Implementing automation and performance testing tools.
- Providing support for existing and new applications.
- Implementing enhancements in Execution Plan Builder and Enrichment applications.
- Integrating Execution Plan Builder with the upgraded Model Registry-2.
- Implementing Scala style checks and updating code to maintain Scala style standards.
- Creating FitNesse test cases to test end-to-end application flows.
- Designing and developing One-to-One, One-to-Many, Logical, and Mapping phases for data ingestion and enrichment.
- Implementing a data mart with analytical reporting capabilities on HDFS.
- Converting legacy applications to Big Data systems.
- Storing ticketing archival data in the data lake and generating reports.
- Developing a Kafka publisher to move Sales, Product, and Usage events from RabbitMQ to Kafka.
- Transforming business requirements into data-driven solutions and helping customers make decisions based on data (big-data-driven decision making).
- Developing a Spark Streaming subscriber to move data from Kafka topics to HDFS.
- Developing Spark programs for data validation and business logic, and working on Spark tuning.
- Analyzing large datasets and training models with various machine learning algorithms to provide strategic direction.
- Creating Hive external tables/views on top of events data for reporting purposes.
- Participating in discussions with the business to plan report generation and execution.
- Tuning Hive performance through various parameters and configurations.
- Generating Hive reports for various applications.
- Analyzing and applying appropriate file formats (ORC, Parquet) and compressions (Snappy, mc4) for data at various levels.
- Implementing Java APIs to store and validate duplicate events using HBase (see the HBase sketch after this list).
- Storing and accessing Kafka offsets in HBase tables.
- Developing analytics and building datasets using PySpark.
- Creating Python scripts to connect to Hive through ODBC and run Hive queries.
- Implementing scheduled Spark jobs to concatenate small ORC files.
- Handling data engineering, Big Data, and Splunk support applications.
- Maintaining applications across environments for activities such as Cloudera upgrades.
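A minimal sketch of the HBase-backed duplicate-event check mentioned above, shown in Scala for brevity (the production APIs were written in Java). The table, column family, and event ID below are illustrative assumptions, not actual project values:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object EventDeduplicator {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()                  // picks up hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    // Hypothetical table and column names used only for illustration.
    val table = connection.getTable(TableName.valueOf("events_dedup"))
    val family = Bytes.toBytes("d")

    val eventId = "evt-12345"                               // would normally come from the incoming message
    val rowKey = Bytes.toBytes(eventId)

    // Treat the event as a duplicate if a row with its ID already exists.
    val isDuplicate = table.exists(new Get(rowKey))
    if (!isDuplicate) {
      val put = new Put(rowKey)
      put.addColumn(family, Bytes.toBytes("seen_at"), Bytes.toBytes(System.currentTimeMillis()))
      table.put(put)
      // ...hand the event to downstream processing here
    }

    table.close()
    connection.close()
  }
}
```

The same table layout can also hold consumed Kafka offsets, keyed for example by topic and partition.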
Confidential
Application Developer
Responsibilities:
- Creating a Big Data pipeline using Kafka and Spark to ingest events from the VGS system into HDFS for analysis.
- Converting existing Oracle PL/SQL procedures to Spark jobs.
- Converting legacy applications to Big Data systems.
- Implementing a data mart with analytical reporting capabilities on HDFS.
- Storing ticketing archival data in the data lake and generating reports.
- Developing a Kafka publisher to move Sales, Product, and Usage events from RabbitMQ to Kafka.
- Developing a Spark Streaming subscriber to move data from Kafka topics to HDFS (see the streaming sketch after this list).
- Developing Spark programs for data validation and business logic, and working on Spark tuning.
- Transforming business requirements into data-driven solutions and helping customers make decisions based on data (big-data-driven decision making).
- Analyzing large datasets and training models with various machine learning algorithms to provide strategic direction.
- Creating Hive external tables/views on top of events data for reporting purposes.
- Holding discussions with the business to plan report generation and execution.
- Working with architects to choose the right technology for each requirement.
- Tuning Hive performance through various parameters and configurations.
- Generating Hive reports for Business Objects.
- Working extensively with Parquet and ORC file formats.
- Using Snappy and mc4 compression extensively.
- Working in an environment secured with Kerberos authentication and SSL/SASL.
- Building Java APIs to store and validate duplicate events using HBase.
- Storing and accessing Kafka offsets in HBase tables.
- Developing analytics and building datasets using PySpark.
- Writing Python scripts to connect to Hive through ODBC and run Hive queries.
- Implementing scheduled Spark jobs to concatenate small ORC files (see the compaction sketch after this list).
- Handling data engineering, Big Data, and Splunk support applications.
- Serving as onsite technical lead for the project.
- Planning and executing Cloudera upgrades.
- Providing Big Data platform and application support.
- Receiving and documenting incident and service requests via web tickets, phone calls, or email.
- Granting user permissions and access to systems and applications.
- Gathering monthly support metrics as defined by the PU.
- Providing support for installing and upgrading application tools.
- Monitoring dashboards and reviewing system health daily.
- Monitoring and optimizing distributed Big Data applications.
- Providing daily support and troubleshooting for test and production Splunk and Big Data applications.
- Monitoring backups and fixing issues as they arise.
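A minimal sketch of the Kafka-to-HDFS subscriber pattern referenced above, written here with Spark Structured Streaming in Scala (the original jobs may have used the DStream API). Broker addresses, topic names, and HDFS paths are illustrative placeholders:

```scala
import org.apache.spark.sql.SparkSession

object KafkaToHdfsSubscriber {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hdfs-subscriber")
      .getOrCreate()

    // Read raw events from Kafka; broker and topic names are placeholders.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
      .option("subscribe", "sales-events,usage-events")
      .option("startingOffsets", "latest")
      .load()
      .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value", "topic", "timestamp")

    // Land the events on HDFS as Parquet, partitioned by topic; the checkpoint
    // directory lets the job recover its offsets after a restart.
    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/landing/events")
      .option("checkpointLocation", "hdfs:///checkpoints/kafka-to-hdfs")
      .partitionBy("topic")
      .start()

    query.awaitTermination()
  }
}
```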
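And a minimal sketch of the small-ORC-file compaction idea, assuming placeholder input/output paths and target file count; a real scheduled job would typically swap the compacted output back in place of the original directory:

```scala
import org.apache.spark.sql.SparkSession

object OrcCompactionJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("orc-small-file-compaction").getOrCreate()

    // Paths and the target file count are placeholders for illustration.
    val inputPath   = "hdfs:///data/warehouse/events_orc"
    val outputPath  = "hdfs:///data/warehouse/events_orc_compacted"
    val targetFiles = 16

    // Read the many small ORC files and rewrite them as a handful of larger ones.
    spark.read.orc(inputPath)
      .coalesce(targetFiles)
      .write
      .mode("overwrite")
      .orc(outputPath)

    spark.stop()
  }
}
```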
Confidential, Monmouth Junction, NJ
Big data developer
Responsibilities:
- Worked on Spark Core and Spark SQL for analyzing the raw data coming from eNodeBs.
- Involved in designing the application.
- Fixed bugs and implemented enhancements.
- Implemented a data lake and data ingestion using Big Data technologies.
- Supported SIT and SVT.
- Created a Big Data pipeline to move data from sources such as eNodeBs, LMS, and customer mobile logs into HDFS through approaches like Flume, Sqoop, and server copy.
- Retrieved data from HDFS and Hive using Spark SQL with Scala.
- Implemented Spark SQL to access Hive tables in Spark for faster data processing (see the sketch after this list).
- Implemented Flume data consumption APIs using Java.
- Worked on Apache NiFi to streamline the data flow.
- Developed a solution using Spark and Spark SQL in Scala for analyzing intermediate data from HDFS and MongoDB.
- Created Flume interfaces and configurations to load raw data coming from eNodeBs, MMECSL, UEAPP, and other publishers into HDFS and MongoDB.
- Used Sqoop to import and export data between Oracle Database and HDFS/Hive.
- Designed and created Hive databases, tables, and views; worked on Hive performance tuning and created UDFs.
- Performed incremental data movement using Sqoop and Oozie jobs.
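A minimal Scala sketch of the Spark SQL over Hive pattern mentioned above; the database, table, and column names are placeholder assumptions rather than the actual project schema:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.avg

object HiveTableAnalysis {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() lets Spark read tables registered in the existing Hive metastore.
    val spark = SparkSession.builder()
      .appName("hive-table-analysis")
      .enableHiveSupport()
      .getOrCreate()

    // Database, table, and column names are placeholders for illustration.
    val rawEvents = spark.sql(
      """SELECT enodeb_id, event_time, metric_value
        |FROM telemetry.raw_enodeb_events
        |WHERE event_date = '2016-01-01'""".stripMargin)

    // Aggregate with the DataFrame API and write the result back as a Hive table.
    rawEvents.groupBy("enodeb_id")
      .agg(avg("metric_value").as("avg_metric_value"))
      .write
      .mode("overwrite")
      .saveAsTable("telemetry.enodeb_daily_avg")

    spark.stop()
  }
}
```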
Confidential
Senior Application Developer
Responsibilities:
- Created a data pipeline to move data from different sources into HDFS through Kafka.
- Performed incremental data movement using Sqoop and Oozie jobs.
- Worked on Hive Optimization.
- Developed Spark Streaming APIs using Scala.
- Developed a Kafka producer service in Java using the Kafka producer APIs (see the producer sketch after this list).
- Designed and developed real-time event processing of data from multiple sources using Spark Streaming, integrating with Apache Kafka, ZooKeeper, Flume, and Akka.
- Responsible for data movement from existing relational databases, Teradata, and external websites to HDFS using Sqoop and Flume.
- Created Kafka producers and consumers using Java and Oracle PL/SQL.
- Developed Hive UDFs using Java.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Production cutover activities and migration of the changes to Production.
- Used PySpark to develop ML algorithms.
- Interacted with the customer daily for knowledge transfer, status reporting, and clarifications.
- Getting signoff from business users on completion of User Acceptance Testing phase of the project.
- Handled business discussions and onshore/offshore coordination.
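A minimal sketch of the Kafka producer pattern mentioned above, shown in Scala for brevity (the original service was written in Java). Broker addresses, topic, key, and payload are illustrative placeholders:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object EventProducerService {
  def main(args: Array[String]): Unit = {
    // Broker addresses and the topic name are placeholders for illustration.
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092,broker2:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("acks", "all")   // wait for full acknowledgement to avoid silent data loss

    val producer = new KafkaProducer[String, String](props)
    try {
      val record = new ProducerRecord[String, String](
        "sales-events", "order-42", """{"orderId":42,"amount":19.99}""")
      producer.send(record).get()   // block until the broker confirms the write
    } finally {
      producer.close()
    }
  }
}
```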
Confidential
Application Software Analyst
Responsibilities:
- Supporting Confidential Care applications like Mobile Genius, iLog, online applications and Service Modules.
- Working on the support and maintenance of the above applications and data fixes.
- Working with the application team on deployments of new enhancements to the application.
- Attending meetings with the business to report on system health and performance, current issues, and investigation status.
- Creating monitoring tools to raise alerts on any functionality breakages in the system.
- Working with the business to support new product installations and expansion projects for data setups and end-to-end golden data transactions.
- Adjusting scheduled job settings for daylight saving time changes.
- Interacting actively with the teams integrated with Mobile Genius, iLog, Online Apps, and SM applications.
- Production cutover activities and migration of the changes to Production.
Confidential, Sunnyvale, CA
Application Software Analyst
Responsibilities:
- Gathering requirements from the business and, drawing on overall application knowledge, providing valuable suggestions to the client.
- Creating complex trigger setups, which require in-depth CRM functional knowledge as well as PeopleSoft technical knowledge.
- Analyzing requirements and estimating effort for enhancements.
- Designing the final requirements by analyzing the existing system and the business objects involved.
- Reviewing code and suggesting changes to code and SQL for performance tuning.
- Technical and functional mentor to the team.
- Review and modification of test cases and monitoring of the results.
- Getting signoff from business users on completion of User Acceptance Testing phase of the project.
- Production cutover activities and migration of the changes to Production.
- Go Live and Post Production Support.
Confidential
Application Software Analyst
Responsibilities:
- Created new synchronous APIs to receive requests from Java using Integration Broker.
- Prepared technical and functional design documents for the same.
- Estimated effort and provided timelines for the same.
- Documented and executed test cases.
- Performed unit testing and load testing on the application.
- Tuned code and SQL for performance gains.
- Production cutover activities and migration of changes to Production.
- Production Support.
Confidential
Application Software Analyst
Responsibilities:
- Development.
- Support for integration and user acceptance testing.
- Cutover activities and production migration.
- Production support.
Confidential
Application Software Analyst
Responsibilities:
- Interacting with the client on requirements and designing them for PeopleSoft compatibility.
- Creating new Setup Pages and modifying the existing setups as per client requirements.
- Supporting PeopleSoft Finance module, at the client location.
- Enhancing and supporting PeopleSoft Finance modules such as Accounts Payable, Asset Management, and General Ledger.
- Conducting production cutover activities and migrating changes between environments.
- Getting signoff from business users on completion of User Acceptance Testing phase of the project.
- As a DBA, maintained the application and applied patches at regular intervals.