Sr. Big Data/Spark Developer Resume
Philadelphia, PA
SUMMARY:
- Currently working in a Big Data capacity, using the Hadoop ecosystem across internal and cloud-based platforms
- 6+ years of experience as a Big Data/Hadoop developer, with skills in analyzing, designing, developing, testing and deploying various software applications
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop
- Good knowledge of using Hibernate for mapping Java classes to database tables and of Hibernate Query Language (HQL)
- Hands on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS
- Experience in developing custom UDFs for Pig and Apache Hive to incorporate Java methods and functionality into Pig Latin and HiveQL
- Good experience in developing MapReduce jobs in Java/J2EE for data cleansing, transformations, pre-processing and analysis
- Good knowledge of Amazon Web Services (AWS) concepts like EMR and EC2 web services, which provide fast and efficient processing for Teradata big data analytics
- Experience in collecting log data and JSON data into HDFS using Flume and processing the data using Hive/Pig
- Strong exposure to Web 2.0 client technologies using JSP, JSTL, XHTML, HTML5, DOM, CSS3, JavaScript and AJAX
- Experience working with cloud platforms, setting up environments and applications on AWS, automation of code and infrastructure (DevOps) using Chef and Jenkins
- Extensive experience in developing Spark Streaming jobs, building RDDs (Resilient Distributed Datasets) and using Spark SQL as required
- Experience in developing Java MapReduce jobs for data cleaning and data manipulation as required by the business
- Strong knowledge of Hadoop ecosystem components including HDFS, Hive, Oozie, HBase, Pig, Sqoop, Zookeeper etc
- Extensive experience with advanced J2EE frameworks such as Spring, Struts, JSF and Hibernate
- Expertise in JavaScript, JavaScript MVC patterns, Object Oriented JavaScript Design Patterns and AJAX calls
- Installation, configuration and administration experience on Big Data platforms: Cloudera Manager (Cloudera) and MCS (MapR)
- Extensive experience in working with Oracle, MSSQL Server, DB2, MySQL
- Experience working with Hortonworks and Cloudera environments
- Good knowledge in implementing various data processing techniques using Apache HBase for handling the data and formatting it as required
- Excellent experience in installing and running various Oozie workflows and automating parallel job executions
- Experience with Spark, Spark SQL, Spark Streaming, Spark GraphX and Spark MLlib
- Extensive development experience in different IDEs like Eclipse, NetBeans, IntelliJ and STS
- Strong experience in core SQL and RESTful web services (RWS)
- Strong knowledge in NOSQL column-oriented databases like HBase and its integration with Hadoop cluster
- Good experience in Tableau for data visualization and analysis on large datasets, drawing various conclusions
- Experience in using Python and R for statistical analysis
- Good knowledge of coding using SQL, SQL*Plus, T-SQL, PL/SQL, Stored Procedures/Functions
- Worked on Bootstrap, AngularJS, NodeJS, Knockout, Ember and Java Persistence Architecture (JPA)
- Experienced in developing applications using all Java/J2EE technologies like Servlets, JSP, EJB, JDBC, JNDI, JMS, SOAP, REST, Grails etc
- Well versed in working with relational database management systems such as Oracle 12c, MS SQL Server and MySQL
- Experience with all stages of the SDLC and the Agile development model, from requirement gathering to deployment and production support
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
WORK EXPERIENCE:
Confidential, Philadelphia, PA
Sr. Big Data/Spark Developer
Responsibilities:
- Involved in analyzing business requirements and preparing detailed specifications that follow project guidelines required for project development
- Used Sqoop to import data from Relational Databases like MySQL, Oracle
- Involved in importing structured and unstructured data into HDFS
- Responsible for fetching real-time data using Kafka and processing using Spark and Scala
- Worked on Kafka to import real-time weblogs and ingested the data into Spark Streaming
- Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations (see the sketch after this list)
- Built and implemented a real-time streaming ETL pipeline using the Kafka Streams API
- Worked on Hive to implement Web Interfacing and stored the data in Hive tables
- Migrated Map Reduce programs into Spark transformations using Spark and Scala
- Experienced with Spark Context, Spark-SQL, Spark YARN
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables from Spark for faster data processing (see the Hive/Spark SQL sketch below)
- Loaded the data into Spark RDDs and performed in-memory computation to generate the output response
- Implemented data quality checks using Spark Streaming and flagged records as passable or bad
- Implemented Hive Partitioning and Bucketing on the collected data in HDFS
- Involved in data querying and summarization using Hive and Pig and created UDFs, UDAFs and UDTFs
- Implemented Sqoop jobs for large data exchanges between RDBMS and Hive clusters
- Extensively used Zookeeper as a backup server and job scheduler for Spark jobs
- Developed traits, case classes etc. in Scala
- Developed Spark scripts using Scala shell commands as per the business requirement
- Worked on Cloudera distribution and deployed on AWS EC2 Instances
- Experienced in loading the real-time data to the NoSQL database like Cassandra
- Well versed in data manipulation and compaction in Cassandra
- Experience in retrieving the data present in Cassandra cluster by running queries in CQL (Cassandra Query Language)
- Worked on connecting the Cassandra database to the Amazon EMR File System for storing the database in S3
- Implemented usage of Amazon EMR for processing Big Data across a Hadoop Cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3)
- Deployed the project on Amazon EMR with S3 connectivity for setting backup storage
- Well versed in using Elastic Load Balancing with Auto Scaling for EC2 servers
- Configured workflows that involve Hadoop actions using Oozie
- Used Python for pattern matching in build logs to format warnings and errors
- Coordinated with the SCRUM team in delivering agreed user stories on time for every sprint
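A minimal sketch of the Kafka Direct Stream ingestion into Spark Streaming described in the bullets above, with a simple passable/bad quality filter; the broker address, topic name, consumer group, field count and HDFS output path are placeholders, not values from the actual project:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object WeblogDirectStream {
  def main(args: Array[String]): Unit = {
    // 10-second micro-batches
    val ssc = new StreamingContext(new SparkConf().setAppName("WeblogDirectStream"), Seconds(10))

    // Consumer settings; broker list, group id and offset policy are assumptions
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "weblog-consumers",
      "auto.offset.reset" -> "latest")

    // Direct stream: Spark reads Kafka partitions itself, no receiver
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("weblogs"), kafkaParams))

    // Basic data-quality flag: keep only records with the expected field count (12 is assumed)
    val passable = stream.map(_.value)
      .filter(_.split("\t", -1).length == 12)

    // Write each micro-batch to a time-stamped HDFS directory (placeholder path)
    passable.foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///data/weblogs/clean/${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```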
Environment: Hadoop YARN, Spark SQL, Spark Streaming, AWS S3, AWS EMR, GraphX, Scala, Python, Kafka, Hive, Pig, Sqoop, Cassandra, Cloudera, Oracle 10g, Linux.
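As a companion to the Hive partitioning/bucketing and Spark SQL bullets above, a minimal sketch of a partitioned, bucketed Hive table created and queried through Spark SQL; the table name, columns, bucket count and partition value are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionedQuery {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark SQL read metastore-backed tables directly
    val spark = SparkSession.builder()
      .appName("HivePartitionedQuery")
      .enableHiveSupport()
      .getOrCreate()

    // Partitioned by date, bucketed by user_id (names and counts are illustrative)
    spark.sql(
      """CREATE TABLE IF NOT EXISTS web_events (
        |  user_id STRING, url STRING, response_ms INT)
        |PARTITIONED BY (event_date STRING)
        |CLUSTERED BY (user_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    // Filtering on the partition column lets Spark prune to the matching directories
    val daily = spark.sql(
      """SELECT url, COUNT(*) AS hits
        |FROM web_events
        |WHERE event_date = '2017-06-01'
        |GROUP BY url""".stripMargin)

    daily.show(20)
    spark.stop()
  }
}
```

Partition pruning on the date column is the usual payoff of this layout: only the requested partition directories are scanned, while bucketing helps joins and sampling on the bucketed key.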
Confidential, NJ
Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop
- Designed and developed a flattened view (merged and flattened dataset) by de-normalizing several datasets in Hive/HDFS, consisting of key attributes consumed by the business and other downstream consumers
- Worked on NoSQL (HBase) to support enterprise production, loading data into HBase using Impala and Sqoop
- Handled importing of data from various data sources, performed transformations using Hive, PIG, and loaded data into HDFS
- Worked on moving data between HDFS and relational database systems using Sqoop, including ongoing maintenance and troubleshooting
- Architected, designed and developed Hadoop ETL using Kafka
- Supported REST-based Hadoop ETL software in higher environments such as UAT and Production
- Worked in AWS EC2, configuring the servers for Auto scaling and Elastic load balancing
- Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating HIVE with existing applications
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop, using Spark Context, Spark-SQL, DataFrames and pair RDDs
- Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database
- Configured Hive metastore with MySQL, which stores the metadata for Hive tables
- Created tables in HBase to store variable data formats of PII data coming from different portfolios (an HBase write sketch follows this section)
- Involved in identifying job dependencies to design workflow for Oozie & YARN resource management
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation
- Worked on importing data from HDFS to a MySQL database and vice-versa using Sqoop
- Implemented MapReduce jobs in Hive by querying the available data
- Performance tuning of Hive queries, MapReduce programs for different applications
- Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data
- Used Cloudera Manager for installation and management of Hadoop Cluster
- Developed data pipelines using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis
- Worked on MongoDB, HBase (NoSQL) databases which differ from classic relational databases
- Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala programming (see the sketch after this list)
- Integrated Kafka with Spark Streaming for high-throughput, reliable processing
- Worked on Apache Flume for collecting and aggregating large amounts of log data and stored it on HDFS for further analysis
- Worked on tuning Hive and Pig scripts to improve performance and resolved performance issues in both
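A minimal sketch of the HiveQL-to-Spark conversion mentioned above, rewriting a simple GROUP BY aggregation as pair-RDD transformations in Scala; the input path, delimiter and column positions are assumptions for illustration only:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ClaimTotalsByState {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ClaimTotalsByState"))

    // Equivalent HiveQL:
    //   SELECT state, SUM(claim_amount) FROM claims GROUP BY state
    // The comma-delimited layout (id, state, claim_amount) is assumed
    val claims = sc.textFile("hdfs:///warehouse/claims/")   // placeholder path

    val totalsByState = claims
      .map(_.split(","))
      .filter(_.length >= 3)                     // drop malformed rows
      .map(f => (f(1), f(2).toDouble))           // build a (state, claim_amount) pair RDD
      .reduceByKey(_ + _)                        // partial sums per partition before the shuffle

    totalsByState.saveAsTextFile("hdfs:///warehouse/claims_by_state")  // placeholder path
    sc.stop()
  }
}
```

Using reduceByKey rather than a groupBy-style aggregation keeps the shuffle small, which is where most of the speed-up over the original Hive job typically comes from.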
Environment: HDFS, MapReduce, Pig, Hive, Sqoop, Oracle 12c, Flume, Oozie, HBase, Impala, Spark Streaming, YARN, Eclipse, Spring, PL/SQL, UNIX Shell Scripting, Cloudera, Bitbucket.
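The PII bullet above mentions landing portfolio data in HBase tables; the resume cites Impala and Sqoop loads, so the following is only an illustrative alternative route, writing rows from a Spark job with the plain HBase client API. The table name, column family, delimiter and field layout are assumptions:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object PortfolioToHBase {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PortfolioToHBase"))

    // Pipe-delimited input with (rowkey, portfolio, balance) fields is assumed
    val rows = sc.textFile("hdfs:///data/portfolios/").map(_.split("\\|"))

    rows.foreachPartition { part =>
      // One HBase connection per partition, not per record
      val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
      val table = conn.getTable(TableName.valueOf("customer_pii"))   // placeholder table name
      part.filter(_.length >= 3).foreach { f =>
        val put = new Put(Bytes.toBytes(f(0)))
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("portfolio"), Bytes.toBytes(f(1)))
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("balance"), Bytes.toBytes(f(2)))
        table.put(put)
      }
      table.close()
      conn.close()
    }
    sc.stop()
  }
}
```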
Confidential
Jr. Java Developer
Responsibilities:
- Involved in the analysis, design, implementation, and testing of the project
- Implemented the presentation layer with HTML, XHTML and JavaScript
- Developed web components using JSP, Servlets, and JDBC
- Designed tables and indexes
- Extensively worked on JUnit for testing the application code of server-client data transferring
- Developed and enhanced products in design and in alignment with business objectives
- Used SVN as a repository for managing/deploying application code
- Involved in the system integration and user acceptance tests successfully
- Developed front end using JSTL, JSP, HTML, and JavaScript
- Wrote complex SQL queries and stored procedures
- Involved in fixing bugs and unit testing with test cases using JUnit
- Actively involved in system testing
- Involved in implementing service layer using Spring IOC module
- Prepared the installation guide, customer guide and configuration document, which were delivered to the customer along with the product
Environment: Java, JSP, JSTL, HTML, JavaScript, Servlets, JDBC, MySQL, JUnit, Eclipse IDE.