
Hadoop Consultant Resume


San Jose, CA

SUMMARY:

  • 7+ years of professional IT experience with Big Data technologies including Hadoop/YARN, Pig, Hive, HBase, Cassandra, and Spark.
  • Hands on experience with Apache Spark, Spark SQL and Spark Streaming.
  • Worked with different distributions of Hadoop and Big Data technologies including Hortonworks and Cloudera.
  • Expertise in the Hadoop ecosystem, including Flume, Hive, Cassandra, Sqoop, Oozie, Zookeeper, and Kafka.
  • Well versed in developing and implementing MapReduce programs using Java and Python.
  • Familiarity with NoSQL databases such as HBase, MongoDB, and Cassandra.
  • Experience leveraging Hadoop ecosystem components, including Pig and Hive for data analysis, Sqoop for data migration, Oozie for scheduling, and HBase as a NoSQL data store.
  • Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications, and HDFS.
  • Familiarity with real-time stream processing using Spark and Kafka.
  • Detailed knowledge of and experience in designing, developing, and testing software solutions using Java and J2EE technologies.
  • Strong understanding of data warehouse concepts and ETL, with data modeling experience spanning normalization, business process analysis, reengineering, dimensional data modeling, and physical and logical data modeling.
  • Experience with middleware architectures built on Sun Java technologies such as J2EE, JSP, and Servlets, and application servers such as WebSphere and WebLogic.
  • Familiarity with popular frameworks like Struts, Hibernate, Spring MVC, and AJAX.
  • Experience in object-oriented programming with Java and Core Java.
  • Experience in creating web-based applications using JSP and Servlets.
  • Experience in database design, entity relationships, database analysis, and programming SQL, PL/SQL, packages, and triggers in Oracle and SQL Server on Windows and Linux.
  • Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
  • Research-oriented, motivated, proactive, self-starter with strong technical, analytical and interpersonal skills.
  • Experience in production and application support, including bug fixes.
  • Major strengths include familiarity with multiple software systems, the ability to quickly learn new technologies and adapt to new environments, and a self-motivated, focused, team-player mindset, with excellent interpersonal, technical, and communication skills.

TECHNICAL SKILLS:

Big Data Technologies: Spark, Hadoop, HDFS, Hive, MapReduce, Pig, Sqoop, Flume, Zookeeper, and Cloudera.

Scripting Languages: Python, Shell

Programming Languages: Java, Scala, C, C++

Web Technologies: HTML, J2EE, CSS, JavaScript, Servlets, JSP, XML

Frameworks: Struts, Spring, Hibernate

Application Servers: IBM WebSphere Application Server, Apache Tomcat

DB Languages: SQL, PL/SQL

Databases/ETL: Oracle 9i/10g/11g

NoSQL Databases: HBase, Cassandra, ElasticSearch, MongoDB

Operating Systems: Linux, UNIX

PROFESSIONAL EXPERIENCE:

Confidential, San Jose, CA

Hadoop Consultant

Responsibilities:

  • Developed Spark SQL jobs that read data from the data lake using Hive, transform it, and save it to HBase.
  • Built a Java client responsible for receiving XML files via REST calls and publishing them to Kafka.
  • Built a Kafka + Spark Streaming job that reads the XML messages from Kafka and transforms them into POJOs using JAXB (see the sketch after this list).
  • Built a Spark + Drools integration that lets us develop Drools rules as part of a Spark Streaming job.
  • Built HBase DAOs responsible for querying the data Drools needs from HBase.
  • Built logic to publish the output of the Drools rules to Kafka for further processing.
  • Wrote MapReduce jobs to generate reports on the number of activities created on a particular day; the data was dumped from multiple sources and the output was written back to HDFS.
  • Worked on Oozie workflows and cron jobs.
  • Provided cluster coordination services through Zookeeper.
  • Worked with Sqoop for importing and exporting data between HDFS and RDBMS systems.
  • Designed a data warehouse using Hive. Created partitioned tables in Hive.
  • Developed Hive UDFs to pre-process the data for analysis.
  • Analyzed the data by running Hive queries and Pig scripts to understand artist behavior.
  • Installed and configured Hadoop and HDFS and ran MapReduce framework jobs in Java for data processing.
  • Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Exported data from DB2 to HDFS using Sqoop and NFS mount approach.
  • Involved in scheduling Oozie workflow engine to run multiple Hive and pig jobs.
  • Moved data from Hadoop to Cassandra using the BulkOutputFormat class.
  • Involved in regular Hadoop cluster maintenance such as patching security holes and updating system packages.
  • Automated the workflow using shell scripts.
  • Performance-tuned Hive queries written by other developers.
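
The Kafka + Spark Streaming + JAXB step referenced above could look roughly like the following Java sketch, assuming Spark 2.x with the spark-streaming-kafka-0-10 integration; the Activity POJO, topic name, and broker address are hypothetical placeholders, not the project's actual names:

    import java.io.StringReader;
    import java.util.*;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.Unmarshaller;
    import javax.xml.bind.annotation.XmlRootElement;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.*;
    import org.apache.spark.streaming.kafka010.*;

    public class XmlStreamJob {

      // Hypothetical JAXB-annotated POJO for the incoming XML messages.
      @XmlRootElement
      public static class Activity {
        public String id;
        public String type;
      }

      public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("xml-kafka-stream");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker:9092"); // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "xml-stream");

        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                    Collections.singletonList("activity-xml"), kafkaParams));

        // Unmarshal each XML message into an Activity POJO. The JAXB
        // Unmarshaller is not serializable, so it is created per partition.
        JavaDStream<Activity> activities = stream.mapPartitions(records -> {
          Unmarshaller u = JAXBContext.newInstance(Activity.class).createUnmarshaller();
          List<Activity> out = new ArrayList<>();
          while (records.hasNext()) {
            out.add((Activity) u.unmarshal(new StringReader(records.next().value())));
          }
          return out.iterator();
        });

        activities.print(); // downstream: Drools rules, HBase lookups, Kafka output
        jssc.start();
        jssc.awaitTermination();
      }
    }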

Environment: Hadoop, HDFS, Hive, Spark, Spark SQL, Spark Streaming, Kafka, HBase, MapReduce, Pig, Oozie, Sqoop, REST, OpenShift, Zookeeper, Cassandra, Drools.

Confidential, Walnut Creek, CA

Hadoop Consultant

Responsibilities:

  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Developed Sqoop jobs for extracting data from different databases, for both initial and incremental data loads.
  • Developed MapReduce jobs for cleaning up the ingested data, as well as calculating computed fields.
  • Designed Hive external tables for storing data extracted using Sqoop.
  • Developed Hive jobs to move data from Avro to ORC format; ORC was used to speed up the queries (see the sketch after this list).
  • Created Hive external tables for derived data, loaded the data into the tables, and queried it using HQL to calculate the claim-fraud flags.
  • Created Python scripts for data cleaning of the output from Hive and HBase queries.
  • Designed Hive external tables with ElasticSearch as the storage format for storing the results of the claim-flag calculation.
  • Implemented the workflows using Apache Oozie framework to orchestrate end to end execution.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the users' MapReduce jobs.
  • Exported analyzed data using Sqoop for generating reports.
  • Extensively used Pig for data cleansing. Developed Hive scripts to extract the data from the web server output files.
  • Worked on data lake concepts; converted all ETL jobs into Pig/Hive scripts.
  • Participated in the Oracle GoldenGate POC for bringing CDC changes into Hadoop using Flume.
  • Loaded log data into HDFS using Flume. Worked extensively on creating MapReduce jobs to power data for search and aggregation.
  • Developed PIG Latin scripts to extract the data from the web server output files to load into HDFS.
  • Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Involved in the installation of CDH3 and the upgrade from CDH3 to CDH4.
  • Responsible for developing a data pipeline using HDInsight, Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
  • Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
  • Wrote shell scripts to automate rolling day-to-day processes.
  • Deployed Hadoop Cluster in Fully Distributed and Pseudo-distributed modes.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
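
As a rough illustration of the Avro-to-ORC conversion referenced above, the rewrite can be expressed as a Hive CTAS statement run through Spark; this is a minimal Java sketch, and the claims_avro/claims_orc table names are hypothetical stand-ins:

    import org.apache.spark.sql.SparkSession;

    public class AvroToOrcJob {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("avro-to-orc")
            .enableHiveSupport() // reuse the Hive metastore and tables
            .getOrCreate();

        // Rewrite the Sqoop-landed Avro table as ORC so downstream
        // queries run faster; plain HiveQL would use the same statement.
        spark.sql("CREATE TABLE IF NOT EXISTS claims_orc STORED AS ORC "
            + "AS SELECT * FROM claims_avro");

        spark.stop();
      }
    }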

Environment: Hadoop, Spark, MapReduce, HDFS, Flume, Sqoop, Hive, Zookeeper, Pig, Hortonworks, Oozie, ElasticSearch, NoSQL, UNIX/Linux.

Confidential, Houston, TX

Hadoop Consultant

Responsibilities:

  • Obtained the requirement specifications from the SMEs and Business Analysts in the BR and SR meetings for the corporate workplace project. Interacted with the business users to build the sample report layouts.
  • Involved in writing the HLDs, along with the RTMs tracing back to the corresponding BRs and SRs, and reviewed them with the business.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Wrote MapReduce programs in Java to achieve the required output.
  • Created Hive Tables and Hive scripts to automate data management.
  • Worked on debugging, performance tuning of Hive & Pig Jobs
  • Performed cluster coordination through Zookeeper.
  • Loaded log data into HDFS using Flume. Worked extensively on creating MapReduce jobs to power data for search and aggregation.
  • Created a POC to store server log data in MongoDB to identify system alert metrics.
  • Installed and configured Apache Hadoop and Hive/Pig Ecosystems.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Involved in regular Hadoop cluster maintenance such as patching security holes and updating system packages.
  • Involved in loading data from Linux and UNIX file systems to HDFS.
  • Installed and configured Hive and wrote Hive UDFs for transforming and loading data (see the UDF sketch after this list).
  • Created HBase tables to store various data formats of PII data coming from different portfolios.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
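
A Hive UDF of the kind referenced above can be a single evaluate() method; this is a minimal sketch, and the PII-masking rule is a hypothetical example of such a transform (registered in Hive via ADD JAR and CREATE TEMPORARY FUNCTION):

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public final class MaskPii extends UDF {
      // Masks all but the last four characters of a PII field so the
      // value can still be joined on without being exposed in full.
      public Text evaluate(Text input) {
        if (input == null) {
          return null;
        }
        String s = input.toString();
        if (s.length() <= 4) {
          return new Text(s);
        }
        StringBuilder masked = new StringBuilder();
        for (int i = 0; i < s.length() - 4; i++) {
          masked.append('*');
        }
        return new Text(masked.append(s.substring(s.length() - 4)).toString());
      }
    }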

Environment: Hadoop, Oracle, HiveQL, Pig, Flume, MapReduce, Zookeeper, HDFS, HBase, MongoDB, PL/SQL, Windows, Linux.

Confidential, Dallas, TX

J2EE Developer

Responsibilities:

  • Involved in Documentation and Use case design using UML modeling including development of Class diagrams, Sequence diagrams, and Use case Transaction diagrams.
  • Implemented an agile client delivery process, including automated testing, pair programming, and rapid prototyping.
  • Involved in developing EJB (Stateless Session Beans) for implementing business logic.
  • Involved in working with JMS Queues.
  • Accessed and manipulated XML documents using the XML DOM parser.
  • Deployed the EJBs on JBoss Application Server.
  • Involved in developing Status and Error Message handling.
  • Used the web services SOAP protocol to transfer XML messages from one environment to another.
  • Used Hibernate to persist data to an Oracle 10g database.
  • Implemented various HQL queries to access the database throughout the application workflow (see the sketch after this list).
  • Wrote unit test cases using the JUnit testing framework.
  • Used Log4j for external configuration files and debugging.
  • Involved in Unit, Integration and Performance Testing for the new enhancements.
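
An HQL query of the kind mentioned above typically lives in a DAO; this is a minimal Hibernate 3-era sketch, where the Order entity, its mapping file, and the status field are hypothetical stand-ins:

    import java.util.List;
    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.hibernate.cfg.Configuration;

    public class OrderDao {
      // Built from hibernate.cfg.xml, which points at the Oracle 10g schema.
      private static final SessionFactory FACTORY =
          new Configuration().configure().buildSessionFactory();

      // HQL query against a hypothetical Order entity (mapped in Order.hbm.xml).
      @SuppressWarnings("unchecked")
      public List<Order> findByStatus(String status) {
        Session session = FACTORY.openSession();
        try {
          return session.createQuery("from Order o where o.status = :status")
                        .setParameter("status", status)
                        .list();
        } finally {
          session.close();
        }
      }
    }

    // Hypothetical mapped entity backing the query above.
    class Order {
      private Long id;
      private String status;
      public Long getId() { return id; }
      public void setId(Long id) { this.id = id; }
      public String getStatus() { return status; }
      public void setStatus(String status) { this.status = status; }
    }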

Environment: Java, JDK, WSAD, Hibernate, JUnit, EJB, JSP, Spring MVC, JMS, XML, XSLT, XML Parsers (DOM), JBoss, Web Services, HTML, JavaScript, Oracle, and Windows XP.

Confidential, Columbus, OH

Java Developer

Responsibilities:

  • Involved in requirement gathering, functional and technical specifications.
  • Monitored and fine-tuned IDM performance and made enhancements to the self-registration process.
  • Developed OMSA GUI using MVC architecture, Core Java, Java Collections, JSP, JDBC, Servlets, ANT and XML within a Windows and UNIX environment.
  • Used Java collection classes such as ArrayList, Vector, HashMap, and Hashtable.
  • Used design patterns including MVC, Singleton, Factory, and Abstract Factory.
  • Wrote requirements and detailed design documents, designed architecture for data collection.
  • Developed algorithms and coded programs in Java.
  • Involved in design and implementation using Core Java, Struts, and JMS.
  • Performed all types of testing, including unit testing and integration testing.
  • Modified an existing JMS messaging framework for increased loads and performance optimizations (see the sketch after this list).
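
Sending through a JMS queue, as in the messaging work above, follows a standard pattern; this is a minimal javax.jms sketch, with hypothetical JNDI names standing in for the ones configured on the application server:

    import javax.jms.*;
    import javax.naming.InitialContext;

    public class StatusPublisher {
      // Publishes a text message to a queue looked up from JNDI.
      public void send(String body) throws Exception {
        InitialContext ctx = new InitialContext();
        QueueConnectionFactory factory =
            (QueueConnectionFactory) ctx.lookup("jms/ConnectionFactory"); // placeholder
        Queue queue = (Queue) ctx.lookup("jms/StatusQueue");              // placeholder

        QueueConnection conn = factory.createQueueConnection();
        try {
          QueueSession session = conn.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
          QueueSender sender = session.createSender(queue);
          sender.send(session.createTextMessage(body));
        } finally {
          conn.close();
        }
      }
    }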

Environment: Java, Design Patterns, Oracle, SQL, PL/SQL, JMS.
