
Sr. Big Data/Cloud Developer Resume


Lake Forest, CA

PROFESSIONAL SUMMARY:

  • 8+ years of professional experience spanning the analysis, design, development, integration, deployment and maintenance of quality software applications using Java/J2EE and Hadoop technologies.
  • Hands-on experience with various Hadoop distributions (Apache, Hortonworks, Cloudera, MapR).
  • Experience working with Amazon EMR, Cloudera (CDH3, CDH4 and CDH5) and Hortonworks Hadoop distributions.
  • Expertise in Hadoop ecosystem tools including HDFS, YARN, MapReduce, Pig, Hive, Sqoop, Flume, Kafka, Spark, Zookeeper and Oozie.
  • Good knowledge of EMR (Elastic MapReduce) for performing big data operations in AWS.
  • Knowledge of Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
  • Excellent understanding of Spark and its benefits in Big Data Analytics.
  • Hands-on experience with stream processing frameworks such as Storm and Spark Streaming.
  • Experience designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Hands-on experience using Scala with Spark Streaming and batch processing to process both streaming and batch data.
  • Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Experience in data analysis using HiveQL, Pig Latin and custom MapReduce programs in Java.
  • Hands-on experience fetching live stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
  • Experienced with Hadoop big data technologies (HDFS and MapReduce programs), Hadoop ecosystem tools (HBase, Hive, Pig) and the NoSQL database MongoDB.
  • Experience querying and analyzing data in Cassandra for quick searching, sorting and grouping through CQL.
  • Applied Machine Learning and performed statistical analysis on the data.
  • Scraped and analyzed data using Machine Learning algorithms in Python and SQL.
  • Experience using the DataStax Spark Cassandra Connector to load data to and from Cassandra.
  • Experience writing applications with NoSQL databases such as HBase, Cassandra and MongoDB.
  • Extensive experience importing and exporting data using Flume and Kafka.
  • Experience configuring Zookeeper to coordinate servers in clusters and to maintain data consistency.
  • Expertise in loading data from different data sources (Teradata and DB2) into HDFS using Sqoop and into partitioned Hive tables.
  • Experience developing data pipelines that use Kafka to store data in HDFS.
  • Experience migrating data between HDFS and relational database systems using Sqoop according to client requirements.
  • Used Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
  • Worked with NiFi to manage the flow of data from sources to HDFS.
  • Good understanding of Linux/Linux kernel internals and debugging.
  • Good experience with source control repositories such as CVS, Git and SVN.
  • Experience working with scripting technologies such as Python and UNIX shell scripts.
  • Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC.
  • Experience working with the Spring and Hibernate frameworks in Java.
  • Experience developing web page interfaces using HTML, JSP and Java Swing.
  • Used Spring Core annotations for dependency injection (Spring DI), Spring MVC for REST APIs and Spring Boot for microservices.
  • Good understanding and working experience on Cloud based architectures.
  • Experience handling various file formats such as Avro, Parquet and SequenceFile.
  • Experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance.
  • Expert implementation knowledge of enterprise/web/client-server applications using Java and J2EE.
  • Expertise in Oracle ORMB and stored procedure concepts.
  • Experience expanding monolithic applications into a microservices architecture.
  • Good understanding of and experience with software development methodologies such as Agile and Waterfall; performed unit, regression, white-box and black-box testing.
  • Ability to work with onsite and offshore team members.

TECHNICAL SKILLS:

Big Data Technologies: HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Kafka, Flume, HBase, Cassandra, MongoDB, Spark, Solr, Ambari, Hue, Avro, Mahout, Impala, Oozie, NiFi and Zookeeper

Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache

Database: MySQL, Oracle 10g/11g, PL/SQL, MS SQL Server 2012

No-SQL Database: HBase, Cassandra and MongoDB

Programming Languages: C, C++, Java, JavaScript, Python, Scala

Frameworks: Struts, Spring, Hibernate, Spring Boot, Micro-services

Operating System: Windows 7/8/10, Vista, Ubuntu, Linux, UNIX, Mac OS

Cloud Platforms: AWS Cloud, Google Cloud

Application Servers: WebLogic, WebSphere, Tomcat

Architecture: Client-Server Architecture, Relational DBMS, OLAP, OLTP

Testing: Selenium WebDriver, JUnit

Modelling Tools: Visual Paradigm for UML, Rational Rose, StarUML

ETL/BI Tools: Talend, Informatica, Tableau

IDE Tools: NetBeans, Eclipse, Intellij, Visual Studio Code

Build Tools: Maven, Jenkins

Development Methodologies: Waterfall, Agile/Scrum

PROFESSIONAL EXPERIENCE:

Confidential, Lake Forest, CA

Sr. Big Data/Cloud Developer

Responsibilities:

  • Worked in an AWS environment for the development and deployment of custom Hadoop applications.
  • Maintained EC2 (Elastic Compute Cloud) and RDS (Relational Database Service) resources in Amazon Web Services.
  • Created S3 buckets and managed their policies; utilized S3 and Glacier for storage and backup on AWS.
  • Used Amazon Athena to analyze data in Amazon S3 with standard SQL.
  • Moved data from Amazon S3 buckets into the AWS Glue Data Catalog, then used AWS Glue jobs, which leverage the Apache Spark Python API (PySpark), to transform the cataloged data (see the PySpark sketch below this job entry).
  • Used AWS Glue jobs to move the transformed data into the Amazon Redshift data warehouse.
  • Designed the ETL process and created the high-level design document covering the logical data flows, source data extraction, database staging, extract creation, source archival, job scheduling and error handling.
  • Effectively migrated data from different source systems to build a secure data warehouse.
  • Built data analytics on Spark that increased business revenue.
  • Integrated tools such as Elasticsearch with existing source systems.
  • Implemented big data workflows to ingest data from various sources into Hadoop using Oozie; these workflows comprise heterogeneous jobs such as Hive, Sqoop and Python scripts.
  • Implemented the project using the Agile Scrum methodology and participated in daily stand-up meetings.

Environment: AWS, Elasticsearch, Hadoop, HDFS, Sqoop, Kafka, Hive, Oozie, Zookeeper, Spark-Core, Spark-SQL, Spark-Streaming, Scala, Python and Visual Studio Code.
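
Below is a minimal PySpark sketch of the Glue pattern described in this role (catalog source, ApplyMapping transform, Redshift load). The catalog database weblogs_db, table raw_events, connection redshift-conn and the column mappings are hypothetical placeholders, not the actual project artifacts.

    import sys
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions

    # Standard Glue job boilerplate: resolve arguments and build contexts
    args = getResolvedOptions(sys.argv, ['JOB_NAME', 'TempDir'])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args['JOB_NAME'], args)

    # Read the S3-backed table registered in the Glue Data Catalog
    source = glue_context.create_dynamic_frame.from_catalog(
        database="weblogs_db",        # hypothetical catalog database
        table_name="raw_events")      # hypothetical catalog table

    # Rename/cast columns as part of the transform step
    mapped = ApplyMapping.apply(
        frame=source,
        mappings=[("event_id", "string", "event_id", "string"),
                  ("ts", "string", "event_ts", "timestamp")])

    # Load the transformed data into Redshift through a catalog connection
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=mapped,
        catalog_connection="redshift-conn",                 # hypothetical connection
        connection_options={"dbtable": "analytics.events",  # hypothetical target table
                            "database": "dw"},
        redshift_tmp_dir=args['TempDir'])

    job.commit()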

Confidential

Sr. Hadoop/Spark Developer

Responsibilities:

  • Involved in the design and development phases of the software development life cycle (SDLC) using the Scrum methodology.
  • Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Worked with Elastic MapReduce (EMR) and set up a Hadoop environment on AWS EC2 instances.
  • Stored data as objects in Amazon S3 buckets and created a data pipeline integrating Kafka and Spark Streaming with the S3 data repository.
  • Worked with cloud services such as Amazon Web Services (AWS) and was involved in ETL, data integration and migration.
  • Responsible for developing a data pipeline on AWS to extract data from web logs and store it in HDFS.
  • Worked with NiFi to manage the movement of data from sources through automated data flows.
  • Worked extensively with AWS EC2 to access Hadoop cluster components.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
  • Worked with SparkContext, Spark SQL, DataFrames, RDDs and YARN.
  • Used the Spark Streaming APIs to perform transformations and actions on the fly, building a common learner data model that receives data from Kafka in near real time and persists it to Cassandra (see the streaming sketch below this job entry).
  • Used the Spark API over Hadoop YARN as the execution engine for data analytics with Hive.
  • Queried data using Spark SQL on top of the Spark engine, implementing Spark RDDs in Scala.
  • Implemented Kafka Java producers with custom partitioners, configured brokers and implemented high-level consumers to build the data platform.
  • Developed Scala scripts using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop.
  • Used the IntelliJ IDE for developing Scala scripts for Spark jobs.
  • Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files.
  • Ingested streaming data into Kafka with Apache NiFi.
  • Wrote Spark RDD transformations, actions, DataFrames and case classes for the required input data and performed the data transformations using Spark Core.
  • Developed Kafka consumer APIs in Scala for consuming data from Kafka topics.
  • Performed advanced procedures such as text analytics using the in-memory computing capabilities of Spark with Scala.
  • Good understanding of Cassandra architecture, replication strategy, gossip, snitches, etc.
  • Used Apache Kafka to aggregate web log data from multiple servers and make it available to downstream systems for data analysis and engineering.
  • Created Hive tables per requirements, defining internal or external tables with appropriate static/dynamic partitions and bucketing for efficiency.
  • Performed data modeling, connected to Cassandra from Spark and saved summarized DataFrames to Cassandra.
  • Worked within the Spark ecosystem using Spark SQL and Scala queries on different formats such as text and CSV files.
  • Used Kafka capabilities such as distribution, partitioning and the replicated commit log service for messaging systems by maintaining feeds.
  • Developed and deployed Apache NiFi flows across various environments and optimized NiFi data flows.
  • Analyzed large datasets with Tableau to find patterns and insights within structured and unstructured data for the business.
  • Collected log data from web servers and NoSQL databases (Cassandra) and pushed it to HDFS using Flume.
  • Proficient in NiFi and workflow schedulers that manage Hadoop jobs as a directed acyclic graph (DAG) of actions with control flows.
  • Worked with the CDH5 distribution and Cloudera Manager to manage and monitor Hadoop clusters.
  • Managed and reviewed Hadoop log files.
  • Integrated Kerberos with the Hadoop cluster to strengthen security against unauthorized access.
  • Used Jira for bug tracking and Bitbucket to check in and check out code changes.
  • Implemented the project using the Agile methodology and attended daily Scrum meetings.

Environment: AWS, Hadoop, Cloudera, YARN, HDFS, Sqoop, Cassandra, Spark-Core, Spark-SQL, Spark-Streaming, Java, Scala, Python, Apache Flume, Kafka, Hive, Kerberos, Tableau, NiFi, Zookeeper and IntelliJ.
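
Below is a minimal sketch of the Kafka-to-Cassandra streaming path behind the learner data model bullet, written here with PySpark Structured Streaming and the DataStax Spark Cassandra Connector (the project itself used the Spark Streaming and Scala APIs). The broker, topic, keyspace, table, checkpoint path and schema are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    # Spark session with the Cassandra connector on the classpath, e.g.
    # --packages com.datastax.spark:spark-cassandra-connector_2.12:3.x
    spark = (SparkSession.builder
             .appName("kafka-to-cassandra")
             .config("spark.cassandra.connection.host", "cassandra-host")  # hypothetical host
             .getOrCreate())

    schema = StructType([
        StructField("learner_id", StringType()),
        StructField("event_ts", TimestampType()),
        StructField("payload", StringType()),
    ])

    # Read the learner events topic from Kafka and parse the JSON value
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")   # hypothetical broker
              .option("subscribe", "learner-events")              # hypothetical topic
              .load()
              .select(from_json(col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    def write_to_cassandra(batch_df, batch_id):
        # Append each micro-batch to the Cassandra table via the connector
        (batch_df.write
         .format("org.apache.spark.sql.cassandra")
         .options(keyspace="learner_ks", table="learner_events")  # hypothetical keyspace/table
         .mode("append")
         .save())

    query = (events.writeStream
             .foreachBatch(write_to_cassandra)
             .option("checkpointLocation", "hdfs:///checkpoints/learner-events")  # hypothetical
             .start())
    query.awaitTermination()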

Confidential, Stamford, CT

Hadoop/Spark Developer

Responsibilities:

  • Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
  • Created batch and real-time pipelines using Spark as the main processing framework.
  • Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Spark, Hive and HBase.
  • Migrated an existing on-premises application to AWS, used AWS services such as EC2 and S3 for processing and storing small data sets, and maintained the Hadoop cluster on AWS EMR.
  • Collected JSON data from an HTTP source and developed Spark jobs that perform inserts and updates in Hive tables (see the PySpark sketch below this job entry).
  • Performed Cloudera Hadoop upgrades and patches and installed ecosystem products through Cloudera Manager, along with Cloudera Manager upgrades.
  • Developed optimal strategies for distributing web log data over the cluster, importing and exporting the stored web log data into HDFS and Hive using Sqoop.
  • Used Amazon CloudWatch to monitor and track resources on AWS.
  • Worked on migrating MapReduce programs into Spark transformations using Spark with Scala.
  • Worked with Apache Spark, which provides a fast and general engine for large-scale data processing, integrated with the functional programming language Scala.
  • Implemented Spark sample programs in Python using PySpark.
  • Designed a reporting application that uses Spark SQL to fetch data and generate reports on HBase.
  • Extensively used Spark SQL and PySpark APIs for querying and transforming data residing in Hive.
  • Responsible for developing the data pipeline using Sqoop, Flume and Pig to extract data from web logs and store it in HDFS.
  • Loaded DStream data into Spark RDDs and performed in-memory computation to generate output responses.
  • Handled continuous streaming data from different sources using Flume with HDFS as the destination.
  • Loaded data into HBase using both bulk and non-bulk loads.
  • Designed and developed ETL workflows in Java for processing data in HDFS/HBase, orchestrated with Oozie.
  • Used JIRA for bug tracking and CVS for version control.
  • Loaded generated HFiles into HBase for faster access to a large customer base without taking a performance hit.
  • Used Zookeeper to coordinate servers in clusters and maintain data consistency.
  • Used Oozie operational services for batch processing and scheduling workflows dynamically.
  • Worked with the Scrum team to deliver agreed user stories on time for every sprint.

Environment: AWS (EMR, EC2, S3), Cloudera, MapReduce, Pig, Hive, Sqoop, Flume, PySpark, Spark, Scala, Java, HBase, Apache Avro, Oozie, Zookeeper, Elasticsearch, Kafka, Python, JIRA, CVS and Eclipse.
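
Below is a minimal PySpark sketch of the JSON-to-Hive pattern referenced in this role, assuming a hypothetical HDFS landing path and an existing Hive table analytics.web_logs partitioned by log_date; the column names are illustrative only.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import to_date, col

    # Hive support is required so writes target tables in the Hive metastore
    spark = (SparkSession.builder
             .appName("weblog-json-to-hive")
             .enableHiveSupport()
             .getOrCreate())

    # Hypothetical HDFS landing directory for JSON collected from the HTTP source
    raw = spark.read.json("hdfs:///data/landing/weblogs/")

    clean = (raw.where(col("status").isNotNull())
                .select("host", "status", "bytes",
                        to_date(col("timestamp")).alias("log_date")))

    # Append into the partitioned table; the partition column must come last
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    clean.write.mode("append").insertInto("analytics.web_logs")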

Confidential, Boston, MA

Java/Hadoop Developer

Responsibilities:

  • Handled large amounts of data from different sources and was involved in HDFS maintenance and loading of structured and unstructured data.
  • Wrote MapReduce jobs using the Java API and Pig Latin.
  • Worked with Hadoop clusters on the Hortonworks distribution.
  • Launched and set up the Hadoop cluster, including configuring its different components.
  • Extracted and restructured data into MongoDB using the import and export command-line utilities.
  • Developed MapReduce programs to clean and aggregate data.
  • Created Hive tables and loaded and analyzed data using Hive queries.
  • Scheduled jobs using the Oozie workflow engine.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Wrote UDFs (user-defined functions) in Pig and Hive when needed (see the Pig UDF sketch below this job entry).
  • Hands-on experience with J2EE components in the Eclipse IDE.
  • Expert knowledge of MongoDB NoSQL data modeling, tuning and disaster recovery backups; used it for distributed storage and processing via CRUD operations.
  • Loaded data from the UNIX file system and FTP to HDFS.
  • Handled Avro data files using Avro tools and MapReduce.
  • Developed custom loaders and storage classes in Pig to work with data formats such as JSON, XML and CSV.
  • Implemented data serialization using Apache Avro.

Environment: Hortonworks, Apache Hadoop 1.0.1, HDFS, MapReduce, Java, Talend, Pig, Hive, Sqoop, J2EE, Flume, MongoDB, MySQL, Apache Avro, Python, UNIX, shell scripts and Eclipse.
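
Below is a minimal sketch of a Python (Jython) Pig UDF of the kind mentioned in this role; the function, schema and cleanup rule are hypothetical, and any Hive UDFs for the same work would have been written in Java.

    # udfs.py -- registered in a Pig script with:
    #   REGISTER 'udfs.py' USING jython AS myudfs;
    #   classified = FOREACH logs GENERATE myudfs.status_class(status);
    from pig_util import outputSchema   # decorator shipped with Apache Pig

    @outputSchema('status_class:chararray')
    def status_class(status):
        """Bucket an HTTP status code into a coarse class (hypothetical rule)."""
        if status is None:
            return 'UNKNOWN'
        code = int(status)
        if 200 <= code < 300:
            return 'OK'
        if 400 <= code < 500:
            return 'CLIENT_ERROR'
        if code >= 500:
            return 'SERVER_ERROR'
        return 'OTHER'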

Confidential

Hadoop/Java Developer

Responsibilities:

  • Monitored the health of MapReduce programs running on the cluster.
  • Good knowledge of the MapReduce framework, including the MR daemons, the sort and shuffle phase and tasks.
  • Used Cloudera Manager for management of the Hadoop cluster.
  • Exported results into relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Built custom Talend jobs to ingest, enrich and distribute data in the Cloudera Hadoop ecosystem.
  • Created MapReduce programs for refined queries on big data (see the Hadoop Streaming sketch below this job entry).
  • Implemented business logic by writing Pig and Hive UDFs for aggregate operations and retrieving their results.
  • Created cubes in Talend to produce different aggregations of the data and to visualize them.
  • Developed Flume agents for loading and filtering streaming data into HDFS.
  • Used HCatalog to access Hive table metadata from MapReduce and Pig Latin.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data from various sources; used HBase sinks to load the data and performed analytics with Tableau.
  • Developed Sqoop jobs to perform incremental imports into Hive tables.
  • Moved bulk data into HBase using MapReduce integration.
  • Worked with Oozie workflow engine to run multiple Hive jobs.

Environment: Cloudera, HDFS, MapReduce, Java, Pig, Hive, Sqoop, Tableau, Flume, Talend, HBase, MySQL, Apache Avro, Python, Oozie and Eclipse.
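
Below is a minimal Hadoop Streaming sketch of the kind of refined MapReduce query mentioned in this role, counting records per HTTP status code. The tab-separated input layout is hypothetical, and the actual jobs may well have used the Java MapReduce API rather than Streaming with Python; the two scripts would be submitted with the Hadoop Streaming jar via the -mapper and -reducer options.

    #!/usr/bin/env python
    # mapper.py -- emit (status, 1) for each tab-separated web-log record
    import sys

    for line in sys.stdin:
        fields = line.rstrip('\n').split('\t')
        if len(fields) >= 3:
            print('%s\t1' % fields[2])   # assume status code is the third column

    #!/usr/bin/env python
    # reducer.py -- sum the counts per status code (keys arrive sorted)
    import sys

    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip('\n').split('\t', 1)
        if key == current_key:
            count += int(value)
        else:
            if current_key is not None:
                print('%s\t%d' % (current_key, count))
            current_key, count = key, int(value)
    if current_key is not None:
        print('%s\t%d' % (current_key, count))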

Confidential

Java Backend Developer

Responsibilities:

  • Worked in complete SDLC phases like Requirements, Specification, Design, Implementation and Testing.
  • Designed and developed a system framework using J2EE technologies based on MVC architecture.
  • Used JUnit for testing UI frameworks.
  • Worked on developing profile view web pages (add, edit) using HTML, CSS, jQuery, JavaScript, AJAX, DHTML and JSP custom tags as part of front-end development.
  • Optimized XML parsers such as SAX and DOM for production data.
  • Built the application using Maven scripts.
  • Used Log4j for debugging.
  • Performed client-side validations using JavaScript.
  • Implemented the Struts MVC framework for developing the J2EE web application.
  • Developed Spring and Hibernate data layer components for the application.
  • Used SVN for version control.
  • Implemented session beans using EJB 2.0.
  • Implemented validations using JavaScript for the fields on Login screen and registration page.
  • Developed web-based applications using Google Web Toolkit (GWT) and J2EE servlet technology.
  • Good knowledge of JDBC connectivity.
  • Developed the DAO layer for the application using Hibernate and JDBC.
  • Designed and developed the application using Agile methodology and followed SCRUM.

Environment: HTML, CSS, jQuery, JavaScript, AngularJS, Java/J2EE, JDBC, Struts, Spring, Hibernate, JUnit, SVN, Maven, Ajax, Apache CXF, Jenkins, Log4j, EJB, Agile, Scrum and web services.

Confidential

Java/J2EE Developer

Responsibilities:

  • Responsible for programming and troubleshooting web applications using HTML, CSS, JavaScript, Java, JSP and SQL Server.
  • Developed and maintained the application UI using Eclipse.
  • Developed client-side validations using JavaScript and jQuery.
  • Involved in database design and developed SQL queries and stored procedures on MySQL.
  • Used Spring integration to communicate with business components and worked on Spring with Hibernate integration for ORM mappings.
  • Deployed the application on the WebSphere application server.
  • Used the Struts ActionServlet as a front controller to redirect control to the specific J2EE component as per the requirements.
  • Used Maven for project management and build automation, with continuous integration through Jenkins.
  • Involved in deploying the application on the Tomcat web server.
  • Helped the UI team integrate using Spring and RESTful services.
  • Responsible for performing code reviews and debugging.

Environment: HTML, CSS, JavaScript, jQuery, Java/J2EE, JSP, XML, MySQL, Tomcat, WebSphere, Spring, Hibernate and UNIX/Windows.
