
Big Data Consultant Resume


Bentonville, Arkansas

SUMMARY:

  • Hands-on experience in developing and deploying enterprise applications using major components of the Hadoop ecosystem, including Hadoop MapReduce, YARN, Hive, Pig, HBase, Sqoop, Spark (Streaming, Spark SQL), Kafka, Oozie, and ZooKeeper.
  • Working experience building and supporting small-scale Hadoop environments, including designing, configuring, installing, tuning, and monitoring Hortonworks and Cloudera clusters.
  • Hands-on experience importing and exporting data between relational databases and HDFS, Hive, and HBase using Sqoop and Flume.
  • Capable of processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture.
  • Strong knowledge of Spark for large-scale streaming data processing, along with Scala/Python.
  • Hands-on experience loading data into Spark RDDs and performing in-memory computation to generate the output response.
  • Extracted data from Teradata into HDFS/databases/dashboards using Spark Streaming.
  • Involved in migrating MapReduce jobs to Spark jobs, and used Spark SQL and the DataFrames API to load structured and semi-structured data into Spark clusters (see the Spark SQL sketch after this summary).
  • Expertise in Core Java with an understanding of OOP concepts such as Collections, Multithreading, Polymorphism, Inheritance, Exception Handling, Streams, Data Structures, and File I/O.
  • Knowledge of implementing Service Oriented Architecture (SOA) using SOAP and REST web services.
  • Experienced in deploying applications on Apache Tomcat, WebSphere, and WebLogic.
  • Hands-on experience using GitHub repositories.
  • Followed the Sun Microsystems Java coding standards as the coding-standard reference.
  • Good knowledge of Apache Maven.
  • Worked on different web application servers such as WebSphere and Apache Tomcat.
  • Worked closely with Production Support on troubleshooting and mitigating issues.
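
A minimal sketch of the Spark SQL / DataFrame loading referred to above, assuming a Spark 2.x runtime and the Java API; the input path, JSON layout, and event_date field are hypothetical:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class LoadSemiStructured {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("load-semi-structured")
                .getOrCreate();

        // Semi-structured JSON is inferred into a DataFrame (Dataset<Row>).
        Dataset<Row> events = spark.read().json("hdfs:///data/raw/events/*.json");

        // Register a temporary view so the data can be queried with Spark SQL.
        events.createOrReplaceTempView("events");
        Dataset<Row> daily = spark.sql(
                "SELECT event_date, COUNT(*) AS cnt FROM events GROUP BY event_date");

        daily.show();
        spark.stop();
    }
}
```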

TECHNICAL SKILLS:

Programming Languages:  Java, Python, Scala

Big Data Ecosystem:  HDFS, MapReduce, Sqoop, Spark, Zookeeper, Oozie, Hive, Kafka

Web Technologies:  HTML, JavaScript, CSS, JSON, AJAX, XML, Maven

J2EE Technologies:  Spring, Hibernate, JSP, EJB, REST and SOAP services

App/Web servers:  Apache Tomcat 7.0, Oracle WebLogic

Databases:  MySQL, Cassandra, Elasticsearch, HBase, SQL Server 2014

IDE:  Eclipse, NetBeans, IntelliJ

Operating systems:  Windows, Linux, Mac.

PROFESSIONAL EXPERIENCE:

Confidential, Bentonville, Arkansas

Big Data Consultant

Responsibilities:
  • Involved in the analysis, specification, design, implementation, and testing phases of the Software Development Life Cycle (SDLC) and participated in stand-up, iteration review, kick-off, and retrospective meetings as part of Agile.

  • Developed data pipelines using ecosystem tools such as Flume, Sqoop, Hive, Pig, and MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as MapReduce, Hive, and Sqoop.
  • Created Hive external tables, loaded the data into the tables, and queried the data using HQL.
  • Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in CSV format.
  • Created multiple Hive tables and implemented partitioning, dynamic partitioning, and buckets in Hive for efficient data access (see the Hive sketch after this list).
  • Performed various analyses, such as path optimization, on structured and unstructured data stored in HDFS using Hive and Datameer to drive analytic solutions for business problems such as product recommendations and search engine optimization.
  • Created aggregated views of data by integrating Apache Spark and Cassandra, focusing on near real-time analytics and aggregation queries.
  • Implemented a platform for data collection, analysis, and visualization, using different Python libraries to compare the results.
  • Used PySpark to generate visualizations showing data trends over time in Hive tables.
  • Extensive experience with Spark Streaming through the core Spark API, running Scala, Java, and Python scripts to transform raw data from several data sources into baseline data.
  • Improved the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Created a Kafka-based messaging system to generate events and alerts for different systems.
  • Used the Spark API over Hadoop YARN to perform analytics on data.
  • Developed Kafka and Spark Streaming jobs to filter streamed data and push the filtered data to Spark core for analysis (see the streaming sketch after this list).
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
  • Implemented various checkpoints on RDDs to disk to handle job failures and debugging.
  • Developed Spark SQL jobs to load tables into HDFS and run select queries on top of them.
  • Worked on file formats such as SequenceFile, RCFile, and Avro.
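
The Hive sketch referred to in the list above: a minimal example of creating an external table over CSV files already landed in HDFS (for example by Sqoop) and loading it into a dynamically partitioned table through Spark SQL with Hive support. Table names, columns, and HDFS paths are hypothetical.

```java
import org.apache.spark.sql.SparkSession;

public class HivePartitionedLoad {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("hive-partitioned-load")
                .enableHiveSupport()            // reuse the cluster's Hive metastore
                .getOrCreate();

        // External table over the raw CSV files in HDFS.
        spark.sql("CREATE EXTERNAL TABLE IF NOT EXISTS staging_orders ("
                + " order_id BIGINT, customer_id BIGINT, amount DOUBLE, order_date STRING)"
                + " ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"
                + " LOCATION 'hdfs:///data/staging/orders'");

        // Managed table partitioned by date for efficient access.
        spark.sql("CREATE TABLE IF NOT EXISTS orders ("
                + " order_id BIGINT, customer_id BIGINT, amount DOUBLE)"
                + " PARTITIONED BY (order_date STRING)");

        // Dynamic partitioning routes each row into its date partition.
        spark.sql("SET hive.exec.dynamic.partition=true");
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict");
        spark.sql("INSERT OVERWRITE TABLE orders PARTITION (order_date)"
                + " SELECT order_id, customer_id, amount, order_date FROM staging_orders");

        spark.stop();
    }
}
```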
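
The streaming sketch referred to above: a minimal Java example, assuming the Spark Streaming Kafka 0.10 integration, that reads a Kafka topic, filters alert events, and checkpoints to HDFS. The broker address, topic name, alert filter, and checkpoint path are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaFilterStream {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("kafka-filter-stream");
        // 10-second micro-batches; the interval is an assumption.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));
        // Checkpointing to HDFS so the job can recover from failures.
        jssc.checkpoint("hdfs:///checkpoints/kafka-filter-stream");

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");   // hypothetical broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "alerts-consumer");

        Collection<String> topics = Arrays.asList("events");    // hypothetical topic

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        // Keep only alert events and hand them to downstream Spark processing.
        JavaDStream<String> alerts = stream
                .map(ConsumerRecord::value)
                .filter(v -> v.contains("\"type\":\"alert\""));
        alerts.print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```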

Environment: Cloudera CDH 5, Spark, Cassandra, ZooKeeper, HDFS, Jenkins, Hive, Sqoop, Eclipse, Oracle, MySQL.

Confidential

Big Data Developer

Responsibilities:

  • Worked in an AWS environment on the development and deployment of custom Hadoop applications to analyze collected data, including setting up Hadoop clusters on Amazon EC2 with Hortonworks Ambari across 40 nodes.

  • Tested and debugged in a Cloudera pseudo-distributed Hadoop environment and deployed the application to the real distributed clusters.
  • Worked on designing and deploying a Hadoop cluster using different Big Data analytic tools, including Hive, HBase, Oozie, Zookeeper, Sqoop, Flume, Spark, and Cassandra, with the Hortonworks distribution.
  • Collected data from websites, transactions, and the customer reward program to create customized marketing messages and a shopping experience for each customer based on their preferences.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Implemented Hadoop security using Kerberos with Active Directory.
  • Wrote and used complex data types in Hive for storing and retrieving data with HQL, developed Hive queries to analyze reducer output data, and analyzed these Hive tables and HQL queries for generating reports.
  • Used Sqoop to transfer data from the edge node to the HDFS / S3 data lake by configuring S3.
  • Used Cassandra to store the analyzed and processed data for scalability.
  • Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into HDFS and to run multiple Hive and Pig jobs.
  • Imported and processed structured, semi-structured, and unstructured data using MapReduce, Hive, and Pig.
  • Developed Hive UDFs to implement business logic (see the UDF sketch after this list).
  • Used Zookeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
  • Streamed data from different data sources to S3 using Flume agents.
  • Configured the source, sink, and channel in the Flume configuration file to collect streaming data.
  • Created partitioned tables and loaded data using both static and dynamic partition methods.
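
The UDF sketch referred to above: a minimal Hive UDF in Java. The masking rule is a made-up example of the kind of business logic such a UDF would carry; it would be registered with ADD JAR and CREATE TEMPORARY FUNCTION before being used in HQL.

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

@Description(name = "mask_id", value = "_FUNC_(id) - masks all but the last 4 characters")
public class MaskIdUDF extends UDF {

    // Called once per row by Hive with the column value.
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        String s = input.toString();
        if (s.length() <= 4) {
            return new Text(s);
        }
        // Replace every character except the last four with '*'.
        String masked = s.substring(0, s.length() - 4).replaceAll(".", "*")
                + s.substring(s.length() - 4);
        return new Text(masked);
    }
}
```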

Environment: Hortonworks HDP 2.3, Spark, Elasticsearch, ZooKeeper, HDFS, Jenkins, Hive, Sqoop, Eclipse, Oracle, MySQL.

Confidential

Application Developer

Responsibilities:

  • Involved in the design and development phases of the Software Development Life Cycle (SDLC) using the Scrum methodology.

  • Responsible for the development of new features and maintenance activities in the web application.
  • Prepared analysis and design documents for the project covering coding, unit testing, integration testing, and system testing of the batch and online modules.
  • Created a keyword-based testing API that simplified the process of writing automated test cases that anybody, technical or non-technical, can use.
  • Developed a Java API to collect metrics and configuration data from the underlying host server.
  • Developed data access beans and EJBs used to access data from the database.
  • Configured Jenkins to integrate with SVN and Maven to build and deploy test, stage, and production builds.
  • Wrote DAO classes and related CRUD operations using the Spring JdbcTemplate (see the DAO sketch after this list).
  • Implemented a series of Java interfaces and abstract classes using dependency injection / inversion of control (IoC).
  • Used annotation-based configuration in Spring to inject all required dependencies.
  • Implemented logging and transaction aspects using Spring AOP (see the aspect sketch after this list).
  • Created a RESTful web services interface to the Java-based runtime engine.
  • Used the Spring IoC container to manage all beans effectively.
  • Used EJBs to handle requests from the JSPs.
  • Used Log4j to capture logs, including runtime exceptions, to resolve issues more quickly.
  • Involved in creating Hibernate configuration files for the session factory and transaction manager.
  • Configured Hibernate mapping and configuration files for the different POJOs to be persisted in the database.
  • Created Hibernate mapping files for Java classes using the table-per-class-hierarchy strategy (see the mapping sketch after this list).
  • Monitored error logs using Log4j, fixed the problems, and implemented email alerts for serious errors such as database connection failures or missing mapping objects.
  • Used a custom support system to calculate monthly, quarterly, and yearly comparisons of metrics related to catalog, inventory, ratings, and revenue.
  • Used Maven for project management and to compile, run, test, and deploy the application and to manage external dependencies.
  • Used Eclipse as the IDE for creating Servlets, JSPs, and XML.
  • Developed the application with an iterative methodology, reworking existing code and writing new code for most of the modules.
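
The DAO sketch referred to above: a minimal Spring JdbcTemplate DAO with CRUD operations. The customer table, its columns, and the Customer POJO are hypothetical.

```java
import java.util.List;

import javax.sql.DataSource;

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Repository;

@Repository
public class CustomerDao {

    private final JdbcTemplate jdbcTemplate;

    public CustomerDao(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    public void save(Customer c) {
        jdbcTemplate.update(
                "INSERT INTO customer (id, name, email) VALUES (?, ?, ?)",
                c.getId(), c.getName(), c.getEmail());
    }

    public Customer findById(long id) {
        return jdbcTemplate.queryForObject(
                "SELECT id, name, email FROM customer WHERE id = ?",
                (rs, rowNum) -> new Customer(
                        rs.getLong("id"), rs.getString("name"), rs.getString("email")),
                id);
    }

    public List<Customer> findAll() {
        return jdbcTemplate.query(
                "SELECT id, name, email FROM customer",
                (rs, rowNum) -> new Customer(
                        rs.getLong("id"), rs.getString("name"), rs.getString("email")));
    }

    public void delete(long id) {
        jdbcTemplate.update("DELETE FROM customer WHERE id = ?", id);
    }
}

// Minimal POJO assumed by the DAO above.
class Customer {
    private final long id;
    private final String name;
    private final String email;

    Customer(long id, String name, String email) {
        this.id = id; this.name = name; this.email = email;
    }
    long getId() { return id; }
    String getName() { return name; }
    String getEmail() { return email; }
}
```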
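
The aspect sketch referred to above: a minimal Spring AOP logging aspect using Log4j. The pointcut package com.example.service is hypothetical, and the aspect assumes @EnableAspectJAutoProxy (or <aop:aspectj-autoproxy/>) is configured.

```java
import org.apache.log4j.Logger;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.stereotype.Component;

@Aspect
@Component
public class LoggingAspect {

    private static final Logger LOG = Logger.getLogger(LoggingAspect.class);

    // Wraps every public service-layer method and logs entry, exit, and timing.
    @Around("execution(public * com.example.service..*.*(..))")
    public Object logAround(ProceedingJoinPoint joinPoint) throws Throwable {
        long start = System.currentTimeMillis();
        LOG.info("Entering " + joinPoint.getSignature().toShortString());
        try {
            return joinPoint.proceed();
        } finally {
            LOG.info("Exiting " + joinPoint.getSignature().toShortString()
                    + " after " + (System.currentTimeMillis() - start) + " ms");
        }
    }
}
```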
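
The mapping sketch referred to above: the table-per-class-hierarchy strategy, shown here with JPA annotations rather than hbm.xml mapping files for brevity. The Payment hierarchy, column names, and discriminator values are hypothetical.

```java
import javax.persistence.DiscriminatorColumn;
import javax.persistence.DiscriminatorValue;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Inheritance;
import javax.persistence.InheritanceType;

// All subclasses share one table, distinguished by the payment_type column.
@Entity
@Inheritance(strategy = InheritanceType.SINGLE_TABLE)
@DiscriminatorColumn(name = "payment_type")
public class Payment {

    @Id
    @GeneratedValue
    private Long id;

    private double amount;

    // getters and setters omitted; Hibernate can use field access
}

@Entity
@DiscriminatorValue("CARD")
class CardPayment extends Payment {
    private String cardNumber;
}

@Entity
@DiscriminatorValue("CHEQUE")
class ChequePayment extends Payment {
    private String chequeNumber;
}
```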

Environment: Java EE, Servlets, JSP, CSS, JS, Eclipse, REST services, Spring, Hibernate, AngularJS, HTML 5, Jenkins.
