Sr. Big Data/Spark Developer Resume
Philadelphia, PA
SUMMARY:
- Currently working in a Big Data capacity, using the Hadoop ecosystem across internal and cloud-based platforms
- 6+ years of experience as a Big Data/Hadoop developer, with skills in analyzing, designing, developing, testing and deploying various software applications
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop
- Good knowledge of using Hibernate for mapping Java classes to database tables and of Hibernate Query Language (HQL)
- Hands on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS
- Experience in developing custom UDFs for Pig and Apache Hive to incorporate Java methods and functionality into Pig Latin and HiveQL
- Good experience in developing MapReduce jobs in Java/J2EE for data cleansing, transformations, pre-processing and analysis
- Good knowledge of Amazon Web Services (AWS) concepts like EMR and EC2 web services, which provide fast and efficient processing for Teradata big data analytics
- Experience in collecting log data and JSON data into HDFS using Flume and processing the data using Hive/Pig
- Strong exposure to Web 2.0 client technologies using JSP, JSTL, XHTML, HTML5, DOM, CSS3, JavaScript and AJAX
- Experience working with cloud platforms, setting up environments and applications on AWS, automation of code and infrastructure (DevOps) using Chef and Jenkins
- Extensive experience in developing Spark Streaming jobs, building RDDs (Resilient Distributed Datasets) and using Spark SQL as required
- Experience in developing Java MapReduce jobs for data cleaning and data manipulation as required by the business
- Strong knowledge of Hadoop ecosystem components including HDFS, Hive, Oozie, HBase, Pig, Sqoop, Zookeeper etc
- Extensive experience with advanced J2EE frameworks such as Spring, Struts, JSF and Hibernate
- Expertise in JavaScript, JavaScript MVC patterns, Object Oriented JavaScript Design Patterns and AJAX calls
- Installation, configuration and administration experience on Big Data platforms: Cloudera Manager (Cloudera) and MCS (MapR)
- Extensive experience in working with Oracle, MSSQL Server, DB2, MySQL
- Experience working with Hortonworks and Cloudera environments
- Good knowledge in implementing various data processing techniques using Apache HBase for handling the data and formatting it as required
- Excellent experience in installing and running various Oozie workflows and automating parallel job executions
- Experience with Spark, Spark SQL, Spark Streaming, Spark GraphX and Spark MLlib
- Extensive development experience in different IDEs like Eclipse, NetBeans, IntelliJ and STS
- Strong experience in core SQL and RESTful web services (RWS)
- Strong knowledge in NOSQL column-oriented databases like HBase and its integration with Hadoop cluster
- Good experience in Tableau for data visualization and analysis on large datasets, drawing various conclusions
- Experience in using Python and R for statistical analysis
- Good knowledge of coding using SQL, SQL*Plus, T-SQL, PL/SQL, Stored Procedures/Functions
- Worked on Bootstrap, AngularJS, NodeJS, Knockout, Ember and Java Persistence Architecture (JPA)
- Experienced in developing applications using all Java/J2EE technologies like Servlets, JSP, EJB, JDBC, JNDI, JMS, SOAP, REST, Grails etc
- Well versed in working with relational database management systems such as Oracle 12c, MS SQL Server and MySQL
- Experience with all stages of the SDLC and the Agile development model, from requirement gathering to deployment and production support
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
WORK EXPERIENCE:
Confidential, Philadelphia, PA
Sr. Big Data/Spark Developer
Responsibilities:
- Involved in analyzing business requirements and preparing detailed specifications that follow project guidelines required for project development
- Used Sqoop to import data from Relational Databases like MySQL, Oracle
- Involved in importing structured and unstructured data into HDFS
- Responsible for fetching real-time data using Kafka and processing using Spark and Scala
- Worked on Kafka to import real-time weblogs and ingested the data into Spark Streaming
- Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations (see the sketch after this list)
- Built and implemented a real-time streaming ETL pipeline using the Kafka Streams API
- Worked on Hive to implement Web Interfacing and stored the data in Hive tables
- Migrated Map Reduce programs into Spark transformations using Spark and Scala
- Experienced with Spark Context, Spark-SQL, Spark YARN
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables from Spark for faster data processing (see the Hive/Spark SQL sketch below)
- Loaded the data into Spark RDDs and performed in-memory computation to generate the output response
- Implemented data quality checks using Spark Streaming and flagged records as passable or bad
- Implemented Hive Partitioning and Bucketing on the collected data in HDFS
- Involved in data querying and summarization using Hive and Pig and created UDFs, UDAFs and UDTFs
- Implemented Sqoop jobs for large data exchanges between RDBMS and Hive clusters
- Extensively used Zookeeper as a backup server and job scheduler for Spark jobs
- Developed traits, case classes etc. in Scala
- Developed Spark scripts using Scala shell commands as per the business requirement
- Worked on Cloudera distribution and deployed on AWS EC2 Instances
- Experienced in loading the real-time data to the NoSQL database like Cassandra
- Well versed in data manipulation and compaction in Cassandra
- Experience in retrieving the data present in Cassandra cluster by running queries in CQL (Cassandra Query Language)
- Worked on connecting the Cassandra database to the Amazon EMR File System for storing the database in S3
- Implemented usage of Amazon EMR for processing Big Data across a Hadoop Cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3)
- Deployed the project on Amazon EMR with S3 connectivity for setting backup storage
- Well versed in using Elastic Load Balancing with Auto Scaling for EC2 servers
- Configured workflows that involve Hadoop actions using Oozie
- Used Python for pattern matching in build logs to format warnings and errors
- Coordinated with the SCRUM team in delivering agreed user stories on time for every sprint
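A minimal sketch of the Kafka Direct Stream ingestion into Spark Streaming described in the bullets above, with a simple passable/bad quality filter; the broker address, topic name, consumer group, field count and HDFS output path are placeholders, not values from the actual project:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object WeblogDirectStream {
  def main(args: Array[String]): Unit = {
    // 10-second micro-batches
    val ssc = new StreamingContext(new SparkConf().setAppName("WeblogDirectStream"), Seconds(10))

    // Consumer settings; broker list, group id and offset policy are assumptions
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "weblog-consumers",
      "auto.offset.reset" -> "latest")

    // Direct stream: Spark reads Kafka partitions itself, no receiver
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("weblogs"), kafkaParams))

    // Basic data-quality flag: keep only records with the expected field count (12 is assumed)
    val passable = stream.map(_.value)
      .filter(_.split("\t", -1).length == 12)

    // Write each micro-batch to a time-stamped HDFS directory (placeholder path)
    passable.foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///data/weblogs/clean/${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```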
Environment: Hadoop YARN, Spark SQL, Spark Streaming, AWS S3, AWS EMR, GraphX, Scala, Python, Kafka, Hive, Pig, Sqoop, Cassandra, Cloudera, Oracle 10g, Linux.
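As a companion to the Hive partitioning/bucketing and Spark SQL bullets above, a minimal sketch of a partitioned, bucketed Hive table created and queried through Spark SQL; the table name, columns, bucket count and partition value are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionedQuery {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark SQL read metastore-backed tables directly
    val spark = SparkSession.builder()
      .appName("HivePartitionedQuery")
      .enableHiveSupport()
      .getOrCreate()

    // Partitioned by date, bucketed by user_id (names and counts are illustrative)
    spark.sql(
      """CREATE TABLE IF NOT EXISTS web_events (
        |  user_id STRING, url STRING, response_ms INT)
        |PARTITIONED BY (event_date STRING)
        |CLUSTERED BY (user_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    // Filtering on the partition column lets Spark prune to the matching directories
    val daily = spark.sql(
      """SELECT url, COUNT(*) AS hits
        |FROM web_events
        |WHERE event_date = '2017-06-01'
        |GROUP BY url""".stripMargin)

    daily.show(20)
    spark.stop()
  }
}
```

Partition pruning on the date column is the usual payoff of this layout: only the requested partition directories are scanned, while bucketing helps joins and sampling on the bucketed key.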
Confidential, NJ
Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop
- Designed and developed a flattened view (merged and flattened dataset) by de-normalizing several datasets in Hive/HDFS, consisting of key attributes consumed by the business and other downstream consumers
- Worked on NoSQL (HBase) to support enterprise production, loading data into HBase using Impala and Sqoop
- Handled importing of data from various data sources, performed transformations using Hive, PIG, and loaded data into HDFS
- Worked on moving data between HDFS and relational database systems using Sqoop, including ongoing maintenance and troubleshooting
- Architected, designed and developed Hadoop ETL using Kafka
- Supported REST-based Hadoop ETL software in higher environments such as UAT and Production
- Worked in AWS EC2, configuring the servers for Auto scaling and Elastic load balancing
- Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating HIVE with existing applications
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop, using Spark Context, Spark-SQL, DataFrames and pair RDDs
- Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database
- Configured Hive metastore with MySQL, which stores the metadata for Hive tables
- Created tables in HBase to store variable data formats of PII data coming from different portfolios (an HBase write sketch follows this section)
- Involved in identifying job dependencies to design workflow for Oozie & YARN resource management
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation
- Worked on importing data from HDFS to a MySQL database and vice-versa using Sqoop
- Implemented MapReduce jobs in Hive by querying the available data
- Performance tuning of Hive queries, MapReduce programs for different applications
- Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data
- Used Cloudera Manager for installation and management of Hadoop Cluster
- Developed data pipelines using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis
- Worked on MongoDB, HBase (NoSQL) databases which differ from classic relational databases
- Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala programming (see the sketch after this list)
- Integrated Kafka with Spark Streaming for high-throughput, reliable processing
- Worked on Apache Flume for collecting and aggregating large amounts of log data and stored it on HDFS for further analysis
- Worked on tuning Hive and Pig scripts to improve performance and resolved performance issues in both
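A minimal sketch of the HiveQL-to-Spark conversion mentioned above, rewriting a simple GROUP BY aggregation as pair-RDD transformations in Scala; the input path, delimiter and column positions are assumptions for illustration only:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ClaimTotalsByState {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ClaimTotalsByState"))

    // Equivalent HiveQL:
    //   SELECT state, SUM(claim_amount) FROM claims GROUP BY state
    // The comma-delimited layout (id, state, claim_amount) is assumed
    val claims = sc.textFile("hdfs:///warehouse/claims/")   // placeholder path

    val totalsByState = claims
      .map(_.split(","))
      .filter(_.length >= 3)                     // drop malformed rows
      .map(f => (f(1), f(2).toDouble))           // build a (state, claim_amount) pair RDD
      .reduceByKey(_ + _)                        // partial sums per partition before the shuffle

    totalsByState.saveAsTextFile("hdfs:///warehouse/claims_by_state")  // placeholder path
    sc.stop()
  }
}
```

Using reduceByKey rather than a groupBy-style aggregation keeps the shuffle small, which is where most of the speed-up over the original Hive job typically comes from.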
Environment: HDFS, MapReduce, Pig, Hive, Sqoop, Oracle 12c, Flume, Oozie, HBase, Impala, Spark Streaming, YARN, Eclipse, Spring, PL/SQL, UNIX Shell Scripting, Cloudera, Bitbucket.
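The PII bullet above mentions landing portfolio data in HBase tables; the resume cites Impala and Sqoop loads, so the following is only an illustrative alternative route, writing rows from a Spark job with the plain HBase client API. The table name, column family, delimiter and field layout are assumptions:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object PortfolioToHBase {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PortfolioToHBase"))

    // Pipe-delimited input with (rowkey, portfolio, balance) fields is assumed
    val rows = sc.textFile("hdfs:///data/portfolios/").map(_.split("\\|"))

    rows.foreachPartition { part =>
      // One HBase connection per partition, not per record
      val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
      val table = conn.getTable(TableName.valueOf("customer_pii"))   // placeholder table name
      part.filter(_.length >= 3).foreach { f =>
        val put = new Put(Bytes.toBytes(f(0)))
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("portfolio"), Bytes.toBytes(f(1)))
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("balance"), Bytes.toBytes(f(2)))
        table.put(put)
      }
      table.close()
      conn.close()
    }
    sc.stop()
  }
}
```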
Confidential
Jr. Java Developer
Responsibilities:
- Involved in the analysis, design, implementation, and testing of the project
- Implemented the presentation layer with HTML, XHTML and JavaScript
- Developed web components using JSP, Servlets, and JDBC
- Designed tables and indexes
- Extensively worked on JUnit for testing the application code of server-client data transferring
- Developed and enhanced products in design and in alignment with business objectives
- Used SVN as a repository for managing/deploying application code
- Involved in the system integration and user acceptance tests successfully
- Developed front end using JSTL, JSP, HTML, and JavaScript
- Wrote complex SQL queries and stored procedures
- Involved in fixing bugs and unit testing with test cases using JUnit
- Actively involved in system testing
- Involved in implementing service layer using Spring IOC module
- Prepared the installation guide, customer guide and configuration document, which were delivered to the customer along with the product
Environment: Java, JSP, JSTL, HTML, JavaScript, Servlets, JDBC, MySQL, JUnit, Eclipse IDE.