
Big Data/Hadoop Developer Resume


New Jersey

SUMMARY

  • Around 7 years of experience in the design, analysis, and development of software applications using Big Data/Hadoop, Spark, and Java/JEE technologies.
  • Knowledge of Spark Core, Spark SQL, Spark Streaming, and machine learning using the Scala and Python programming languages.
  • Worked on Open Source Apache Hadoop, Cloudera Enterprise (CDH) and Hortonworks Data Platform (HDP)
  • Hands-on experience with major components in the Hadoop ecosystem, including Hadoop MapReduce, HDFS, Hive, Pig, Pentaho, HBase, ZooKeeper, TaskTracker, NameNode, DataNode, Sqoop, Oozie, Cassandra, Flume, and Avro.
  • Developed various MapReduce applications to perform ETL workloads on terabytes of data.
  • Expertise in working with the Hive data warehouse infrastructure: creating tables, distributing data by implementing partitioning and bucketing, and writing and optimizing HQL queries.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
  • Good understanding of RDD operations in Apache Spark, including transformations and actions, persistence/caching, accumulators, broadcast variables, and optimizing broadcasts (see the sketch after this list).
  • Hands on experience in performing aggregations on data using Hive Query Language (HQL).
  • Developed MapReduce programs in Java.
  • Good knowledge of Amazon Web Services (AWS) concepts like EMR and EC2, which provide fast and efficient processing of Teradata big data analytics.
  • Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin & Airflow.
  • Strong knowledge of creating and monitoring Hadoop clusters on Amazon EC2, VMs, Hortonworks Data Platform 2.1 & 2.2, CDH3, CDH4, Cloudera Manager, and Azure HDInsight distributions on Linux, Ubuntu OS, etc.
  • Good experience in extending the core functionality of Hive and Pig by developing user-defined functions to provide custom capabilities to these languages.
  • Expertise in the Hadoop ecosystem components HDFS, MapReduce, Hive, Pig, Sqoop, HBase, and Flume for data analytics.
  • Hands-on experience fetching live streaming data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
  • Capable of processing large sets of structured, semi-structured, and unstructured data.
  • Experience in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
  • Expertise in writing MapReduce jobs in Java for processing large sets of structured, semi-structured, and unstructured data and storing the results in HDFS.
  • Experience in developing Custom UDFs for datasets in Pig and Hive.
  • Analyze the latest big data analytics technologies and their innovative applications in both business intelligence analysis and new service offerings.
  • Designed and Developed Shell Scripts and Sqoop Scripts to migrate data in and out of HDFS
  • Designed and Developed Oozie workflows to execute MapReduce jobs, Hive scripts, shell scripts and sending email notifications
  • Worked on pipeline and partitioning parallelism techniques and ensured load balancing of data
  • Deployed different partitioning methods like Hash by field, Round Robin, Entire, Modulus, and Range for bulk data loading
  • Hands-on experience working with input file formats like Parquet, JSON, and Avro.
  • Worked on Extraction, Transformation, and Loading (ETL) of data from multiple sources like Flat files, XML files and Databases.
  • Hands-on experience in J2EE technologies such as Servlets, JSP, EJB, JDBC and developing Web Services providers and consumers using SOAP, REST.
  • Used Agile Development Methodology and Scrum for the development process
  • Good Knowledge in HTML, CSS, JavaScript and web-based applications.
  • Worked extensively on the design and development of business processes using Sqoop, Pig, Hive, and HBase.
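
The RDD concepts mentioned above (lazy transformations vs. actions, persistence/caching, and broadcast variables) can be illustrated with a minimal spark-shell sketch, assuming a session where sc is predefined; the file path, lookup map, and field layout are illustrative assumptions only.

```scala
import org.apache.spark.storage.StorageLevel

// Illustrative lookup table shipped to executors once as a broadcast variable.
val stateNames = sc.broadcast(Map("NJ" -> "New Jersey", "TX" -> "Texas"))

// Transformations are lazy: nothing executes until an action is called.
val counts = sc.textFile("hdfs:///data/events/")             // hypothetical path
  .map(_.split(","))
  .filter(_.length >= 2)
  .map(fields => (stateNames.value.getOrElse(fields(0), "OTHER"), 1L))
  .reduceByKey(_ + _)
  .persist(StorageLevel.MEMORY_AND_DISK)                     // cache for reuse

counts.count()                                               // first action materializes and caches the RDD
counts.take(10).foreach(println)                             // second action reuses the cached data
```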

TECHNICAL SKILLS

Big Data Frameworks: Hadoop, Spark, Scala, Hive, Kafka, AWS, HBase, Flume, Pig, Sqoop, MapReduce, Cloudera.

Big Data Distributions: Cloudera (CDH), Hortonworks (HDP), Amazon EMR

Programming languages: Core Java, Scala, SQL, Shell Scripting

Operating Systems: Windows, Linux (Ubuntu)

Databases: Oracle, SQL Server; NoSQL data stores: DynamoDB, Cassandra

Designing Tools: Eclipse

Java Technologies: JSP, Servlets, Junit, Spring, Hibernate

Web Technologies: XML, HTML, JavaScript, JVM, JQuery, JSON

Linux Experience: System Administration Tools, Puppet, Apache

Web Services: RESTful and SOAP

Frameworks: Jakarta Struts 1.x, Spring 2.x

Development methodologies: Agile, Waterfall

Logging Tools: Log4j

Application / Web Servers: Apache Tomcat, WebSphere, Weblogic

Messaging Services: Kafka

Version Tools: Git, SVN and CVS

Analytics: Tableau

PROFESSIONAL EXPERIENCE

Confidential, New Jersey

Big Data/Hadoop developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including the HBase database and Sqoop.
  • Design, plan, and develop programs to perform automated extract, transform, and load (ETL) operations between data sources when working with large (multi-terabyte) data sets.
  • Create and execute unit tests and perform basic application testing
  • Strong knowledge of Hadoop and Hive, including Hive's analytical functions.
  • Implemented proofs of concept on the Hadoop stack and different big data analytic tools, and migrations from different databases (i.e., Teradata) to Hadoop.
  • Work with the application team to design and develop an effective Hadoop solution. Be actively engaged and responsible in the development process
  • The data volume is around 750 GB, an apt case for Hadoop-style computation. Using loading utilities like Sqoop, data is loaded onto the cluster and cleaned.
  • Wrote Hive UDFs to extract data from staging tables (see the sketch after this list).
  • Wrote Linux shell scripts to automate Sqoop commands and Oozie workflows to import multiple tables at once into Hive.
  • Wrote efficient Oozie workflows, sub-workflows, and coordinators for data import and export.
  • Experience with the tools in Hadoop Ecosystem including Hive, HDFS, MapReduce, Sqoop, Storm, Spark, Kafka, Yarn, Oozie, and Zookeeper
  • Involved in loading data from the Linux file system, servers, and Java web services using Kafka producers and partitions.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Developed MapReduce programs to parse the raw data and store the pre-aggregated data in partitioned tables.
  • Provided production support: monitored jobs on a daily basis, fixed failures, and verified that jobs completed successfully.
  • Experience in managing and reviewing Hadoop log files.
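
A minimal sketch of a Hive UDF of the kind described above; since Hive's legacy UDF base class resolves evaluate methods by reflection, a Scala class packaged into a jar works the same way as a Java one. The class name and extraction logic are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDF

// Hypothetical UDF: pulls the host/domain out of a raw URL column in a staging table.
class ExtractDomain extends UDF {
  def evaluate(url: String): String =
    if (url == null) null
    else url.replaceFirst("^[A-Za-z]+://", "").takeWhile(c => c != '/' && c != ':')
}
```

Once packaged into a jar, such a function would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in HQL against the staging tables.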

Environment: MapR, Cloudera, Hadoop, HDFS, Hive, Spark SQL, MapReduce, Sqoop, Oozie, Zeppelin, Kafka, Spark, Scala, HBase, ZooKeeper, shell scripting, crontab.

Confidential, New Jersey

Data Engineer/Spark/Scala developer

Responsibilities:

  • Used Sqoop to get data from various databases, including Oracle, SQL Server, and DB2, and ingested it into HDFS.
  • Experience in providing support to data analysts in running Hive queries.
  • Played a significant role in the development of the Confidential Data Lake and in building the Confidential Data Cube on a Microsoft Azure HDInsight cluster.
  • Wrote Azure PowerShell scripts to copy or move data from the local file system to HDFS/Blob storage.
  • Developed multiple MapReduce programs using Java and Pig to process large volumes of semi-structured and unstructured data files using different MapReduce design patterns.
  • Worked on creating Hive tables and writing Hive queries for data analysis to meet business requirements, and have experience using Autosys jobs to move data between RDBMS and HDFS.
  • Experience with writing and running Unix shell scripts.
  • Worked on cloud computing infrastructure (e.g. Amazon Web Services EC2) and considerations for scalable, distributed systems
  • Worked on GoCD (a CI/CD tool) to deploy applications, and have experience with the Munin framework for big data testing.
  • Expertise in designing and deployment of Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase, Oozie, Sqoop, Flume, Spark, Impala with Cloudera distribution
  • Designed Cluster co-ordination services through Zookeeper.
  • Resolving the technical issues and perform comprehensive reviews at various stages of the project
  • Fetched data to/from mainframe DB2, VSAM files, MS SQL Server, and Azure Data Lake & Blob storage using Sqoop, created the files, and stored them in HDFS.
  • Designed Hive tables according to source metadata, pushed data from files into external Hive tables, and then connected to Azure for further PySpark processing.
  • Scheduled daily Hive table migration jobs through Airflow to load data from PROD to DR servers.
  • Loaded and retrieved data files of any type (text, CSV, XLS, ORC, Parquet) into and out of Hive to summarize the data for further analysis (see the sketch after this list).
  • Wrote DistCp shell scripts to manage, load, and migrate data across PROD and DR servers.
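
As a rough illustration of the multi-format loads mentioned above, the following Spark/Scala sketch reads text, CSV, ORC, and Parquet files and lands one of them in a Hive table; the application name, paths, database, and table names are made-up assumptions for the example.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("multi-format-load")      // hypothetical job name
  .enableHiveSupport()               // use the shared Hive metastore
  .getOrCreate()

// Readers for several of the file types called out above; all paths are illustrative.
val textDf    = spark.read.text("hdfs:///landing/raw_logs/")
val csvDf     = spark.read.option("header", "true").csv("hdfs:///landing/trades_csv/")
val orcDf     = spark.read.orc("hdfs:///landing/trades_orc/")
val parquetDf = spark.read.parquet("hdfs:///landing/trades_parquet/")

// Land the parsed data in Hive for downstream analysis and summarization.
csvDf.write.mode("append").saveAsTable("staging.trades")
```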

Confidential, Dallas, TX

Spark/Scala developer

Responsibilities:

  • Developed various MapReduce applications to perform ETL workloads on terabytes of data.
  • Expertise in working with the Hive data warehouse infrastructure: creating tables, distributing data by implementing partitioning and bucketing, and writing and optimizing HQL queries.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
  • Explored Spark for improving the performance and optimization of the existing algorithms in Hadoop.
  • Experience in working with Flume to load log data from multiple sources directly into HDFS.
  • Responsible for managing data coming from different sources and involved in HDFS maintenance and the loading of structured and unstructured data.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Developed Spark core and Spark-SQL scripts using Scala for faster data processing.
  • Involved in code review and bug fixing for improving the performance.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Developing design documents considering all possible approaches and identifying best of them.
  • Implemented partitioning and bucketing in Hive for better organization of the data.
  • Optimized Hive queries for performance tuning.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which gets data from Kafka in near real time and persists it into Cassandra (see the sketch after this list).
  • Worked on importing data from various sources and performed transformations using MapReduce and Hive to load data into HDFS.
  • Worked on compression mechanisms to optimize MapReduce Jobs.
  • Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
  • Created scripts to automate the process of Data Ingestion.
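
A condensed sketch of the Kafka-to-Cassandra streaming path referenced above, assuming the spark-streaming-kafka-0-10 and spark-cassandra-connector dependencies are on the classpath; the hosts, topic, keyspace, table, and record layout are illustrative only.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.kafka.common.serialization.StringDeserializer
import com.datastax.spark.connector._

val conf = new SparkConf().setAppName("learner-stream")        // hypothetical app name
  .set("spark.cassandra.connection.host", "cassandra-host")
val ssc = new StreamingContext(conf, Seconds(10))              // micro-batch interval

val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "kafka-broker:9092",
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "learner-model")

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("learner-events"), kafkaParams))

// Parse each record and persist each micro-batch into a Cassandra table.
stream.map(_.value.split(","))
  .filter(_.length >= 3)
  .map(f => (f(0), f(1), f(2).toDouble))
  .foreachRDD(_.saveToCassandra("learner_ks", "learner_events",
    SomeColumns("learner_id", "event_type", "score")))

ssc.start()
ssc.awaitTermination()
```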

Confidential, Charlotte, NC

Hadoop developer/ Spark developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Good understanding of and related experience with the Hadoop stack: internals, Hive, Pig, and MapReduce.
  • The system was initially developed using Java. The Java filtering program was restructured to put the business rule engine in a jar that can be called from both Java and Hadoop.
  • Wrote MapReduce jobs to discover trends in data usage by users.
  • Implemented an AWS data lake leveraging S3, Terraform, EC2, and Lambda for data processing and storage, while writing complex SQL queries with analytical and aggregate functions on views in the Snowflake data warehouse to develop near-real-time visualizations using Tableau Desktop and Alteryx.
  • Worked on setting up and configuring AWS EMR clusters, and used Amazon IAM to grant fine-grained access to AWS resources to users.
  • Responsible for developing a data pipeline with Amazon AWS to extract data from weblogs and store it in HDFS, and imported data from different sources like AWS S3 and the local file system into Spark RDDs.
  • Involved in managing and reviewing Hadoop log files
  • Load and transform large sets of structured, semi-structured, and unstructured data.
  • Import the data from different sources like HDFS/HBase into Spark RDD.
  • Importing and exporting data into HDFS and HIVE using Sqoop.
  • Involved in gathering the requirements, designing, development and testing.
  • Developed PIG scripts for source data validation and transformation.
  • Designed and developed tables in HBase and stored aggregated data from Hive.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Developed Spark core and Spark-SQL scripts using Scala for faster data processing.
  • Involved in code review and bug fixing for improving the performance.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Implemented partitioning and bucketing in Hive for better organization of the data.
  • Worked on big data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm, and webMethods technologies.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Streamed data in real time using Spark with Kafka into data stores such as DynamoDB and Cassandra.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala (see the sketch after this list).
  • Generate final reporting data using Tableau for testing by connecting to the corresponding Hive tables using Hive ODBC connector.
  • Strongly recommended bringing in Elasticsearch and was responsible for its installation, configuration, and administration.
  • Developed and maintained efficient Talend ETL jobs for data ingestion.
  • Worked on the Talend RTX ETL tool, developing jobs and scheduling them in the Talend Integration Suite.
  • Modified reports and Talend ETL jobs based on feedback from QA testers and users in the development and staging environments.
  • Involved in migrating Hadoop jobs into higher environments like SIT, UAT, and Prod.
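
A small sketch of the MapReduce-to-Spark migration pattern mentioned above: a log-aggregation job expressed as RDD transformations, reading from HDFS or S3. The paths and field positions are illustrative assumptions, and the snippet assumes a spark-shell session where sc is predefined.

```scala
// Read raw weblogs from HDFS (an s3a:// path would work the same way).
val weblogs = sc.textFile("hdfs:///data/weblogs/")           // or "s3a://my-bucket/weblogs/"

// Equivalent of a map + reduce phase: count requests per page.
val hitsPerPage = weblogs
  .map(_.split("\t"))
  .filter(_.length > 6)
  .map(fields => (fields(6), 1L))                             // assume column 6 holds the request path
  .reduceByKey(_ + _)

hitsPerPage.saveAsTextFile("hdfs:///output/hits_per_page")
```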

Environment: Cloudera, Hortonworks, HDFS, Hive, Scala, MapReduce, Storm, Java, HBase, Pig, Sqoop, shell scripts, Oozie, Oozie Coordinator, MySQL, Tableau, Elasticsearch, Talend, SFTP, Spark RDD, Kafka, Python, IntelliJ, Azkaban, Ambari/Hue, Jenkins, Apache NiFi.

Confidential, Bentonville, Arkansas

Hadoop/Spark Developer

Responsibilities:

  • Maintained system integrity of all sub-components (primarily HDFS, MapReduce, HBase, and Hive).
  • Integrated the Hive warehouse with HBase.
  • Migrating the needed data from MySQL into HDFS using Sqoop and importing various formats of flat files into HDFS.
  • Load the data into HBase tables for UI web application.
  • Wrote customized Hive UDFs in Java where the required functionality was too complex.
  • Maintained system integrity of all sub-components related to Hadoop.
  • Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets (see the sketch after this list).
  • Wrote HiveQL scripts to create, load, and query tables in Hive.
  • Supported MapReduce programs running on the cluster.
  • Monitored system health and logs and responded accordingly to any warning or failure conditions.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
  • Developed Apache Spark jobs using Scala in the test environment for faster data processing, and used Spark SQL for querying.
  • Developed Scala code using the monad pattern for different calculations based on the requirements.
  • Developed and executed shell scripts to automate the jobs
  • Wrote complex Hive queries and automated them using Azkaban for hourly calculations and analysis.
  • Analyzed large data sets using Pig and Hive scripts.
  • Worked with Hue for developing Hive queries and checking data in both development and production environments.
  • Developed Pig Latin scripts for extracting data.
  • Used Pig for data loading, filtering and storing the data.
  • Worked on Data Integration from different source systems.
  • Used MongoDB, accessed through the Robo Mongo client, for storing the data.
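
A sketch of the external-table design described above, expressed through Spark SQL against the shared Hive metastore; the database, columns, bucket count, and location are illustrative assumptions, and the load statement is hedged below.

```scala
// Assumes a SparkSession named spark with Hive support enabled.
spark.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS retail.orders (
    order_id    BIGINT,
    customer_id BIGINT,
    amount      DOUBLE
  )
  PARTITIONED BY (order_date STRING)
  CLUSTERED BY (customer_id) INTO 32 BUCKETS
  STORED AS ORC
  LOCATION '/warehouse/external/retail/orders'
""")

// Dynamic partitioning: partition values come from the data itself.
// Note: loading bucketed Hive tables from Spark varies by version, so this insert
// was often run from the Hive CLI/Beeline instead; shown here for illustration.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
  INSERT OVERWRITE TABLE retail.orders PARTITION (order_date)
  SELECT order_id, customer_id, amount, order_date
  FROM staging.orders_raw
""")
```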

Environment: Robo Mongo, Pig Latin, Hive, Azkaban, shell scripting, Scala, Spark SQL, Apache Spark, MapReduce, Hadoop, UDFs, HBase, HDFS, MySQL, Sqoop.

Confidential

Java Developer

Responsibilities:

  • Full life-cycle experience including requirements analysis, high-level design, detailed design, UML, data model design, coding, testing, and creation of functional and technical design documentation.
  • Used the Spring Framework for MVC architecture with Hibernate to implement DAO code, and used web services to interact with other modules and for integration testing.
  • Developed and implemented GUI functionality using JSP, JSTL, Tiles and AJAX.
  • Designed database and involved in developing SQL Scripts.
  • Used SQL Navigator as a tool to interact with the database and was involved in testing the application.
  • Implemented design patterns such as MVC-2, Front Controller, and Composite View, along with Struts framework design patterns, to improve performance.
  • Designed use case diagrams, class models, and sequence diagrams for the SDLC process of the application.
  • Implemented GUI pages by using JavaScript, HTML, JSP, CSS, and AJAX.
  • Designed and developed UI components using JSP, JMS, JSTL.
  • Deployed the project on a WebSphere application server in a Linux environment.
  • Implemented the online application by using Web Services (SOAP), JSP, Servlets, JDBC, and Core Java.
  • Implemented the Singleton, DAO, and Factory design patterns based on the application requirements.
  • Used DOM and SAX parsers to parse the raw XML documents (see the sketch after this list).
  • Tested the web services with SOAP UI tool.
  • Developed back-end interfaces using PL/SQL packages, stored procedures, functions, anonymous PL/SQL blocks, cursor management, and exception handling.
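
A minimal SAX-parsing sketch for the kind of raw-XML processing mentioned above; it is shown in Scala for consistency with the other sketches in this document, but the underlying JAXP API is the same one used from Java. The element name and file path are hypothetical.

```scala
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.Attributes
import org.xml.sax.helpers.DefaultHandler

// Hypothetical handler: counts <order> elements in a raw XML feed.
class OrderCountHandler extends DefaultHandler {
  var orderCount = 0
  override def startElement(uri: String, localName: String,
                            qName: String, attributes: Attributes): Unit =
    if (qName == "order") orderCount += 1
}

object ParseOrders {
  def main(args: Array[String]): Unit = {
    val handler = new OrderCountHandler
    SAXParserFactory.newInstance().newSAXParser()
      .parse(new java.io.File("orders.xml"), handler)        // file path is illustrative
    println(s"orders found: ${handler.orderCount}")
  }
}
```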

Environment: PL/SQL, SOAP UI, DOM and SAX parsers, XML, Core Java, JDBC, JSP, Servlets, Linux, JSTL, JMS, HTML, CSS, AJAX, SDLC, MVC-2, Struts, SQL, Tiles, MVC, DAO, UML.

Confidential

Java Developer

Responsibilities:

  • Good knowledge of database structuring and management using MySQL.
  • Strong knowledge of e-commerce modules such as Business-to-Customer, Business-to-Business, and Customer-to-Customer transactions.
  • Good knowledge of database queries, transactions, and joins.
  • Good working knowledge of SAP Advanced Business Application Programming (ABAP).
  • Implemented the user login logic using the Spring MVC framework, which encourages application architectures based on the Model-View-Controller design paradigm.
  • Used various Java, J2EE APIs including JDBC, XML, Servlets, and JSP.
  • Generated Hibernate Mapping files and created the data model using mapping files
  • Developed UI using JavaScript, JSP, HTML and CSS for interactive cross browser functionality and complex user interface
  • Developed action classes and form beans and configured the struts-config.xml
  • Provided client-side validations using Struts Validator framework and JavaScript
  • Created business logic using servlets and session beans and deployed them on Apache Tomcat server
  • Created complex SQL Queries, PL/SQL Stored procedures and functions for back end
  • Prepared the functional, design and test case specifications
  • Performed unit testing, system testing and integration testing
  • Developed and implemented GUI functionality using JSP, JSTL, Tiles and AJAX.
  • Designed the database and was involved in developing SQL scripts.
  • Used SQL navigator as a tool to interact with DB Oracle 10g.
  • Developed portal screens using JSP, Servlets, and Struts framework
  • Involved in writing Test plans and conducted Unit Tests using JUnit.

Environment: JSP, Servlets, Struts, Oracle, SQL, GUI, AJAX, PL/SQL, JavaScript, Beans, HTML, CSS, Hibernate mapping files, Java, J2EE, APIs, JDBC, XML, MVC, SAP.
