Hadoop Consultant Resume
San Jose, CA
SUMMARY:
- Over 7 years of professional IT experience with Big Data technologies including Hadoop/YARN, Pig, Hive, HBase, Cassandra, and Spark.
- Hands-on experience with Apache Spark, Spark SQL, and Spark Streaming.
- Worked with different distributions of Hadoop and Big Data technologies including Hortonworks and Cloudera.
- Expertise in Big Data Hadoop ecosystem components such as Flume, Hive, Cassandra, Sqoop, Oozie, Zookeeper, and Kafka.
- Well versed in developing and implementing MapReduce programs using Java and Python.
- Familiarity with NoSQL databases like HBase and Cassandra.
- Experience with leveraging Hadoop ecosystem components including Pig and Hive for data analysis, Sqoop for data migration, Oozie for scheduling and HBase as a NoSQL data store.
- Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications, and HDFS.
- Experience with the NoSQL databases MongoDB and Cassandra.
- Familiarity with real-time data streaming using Spark and Kafka.
- Detailed knowledge of and experience in designing, developing, and testing software solutions using Java and J2EE technologies.
- Strong understanding of data warehouse concepts and ETL, with data modeling experience covering normalization, business process analysis and reengineering, dimensional data modeling, and physical and logical data modeling.
- Experience with middleware architectures using Sun Java technologies such as J2EE, JSP, and Servlets, and application servers such as WebSphere and WebLogic.
- Familiarity with popular frameworks such as Struts, Hibernate, Spring MVC, and AJAX.
- Experience in object-oriented programming with Java and Core Java.
- Experience in creating web-based applications using JSP and Servlets.
- Experience in database design, entity relationships, database analysis, and programming SQL, PL/SQL, packages, and triggers in Oracle and SQL Server on Windows and Linux.
- Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
- Research-oriented, motivated, proactive, self-starter with strong technical, analytical and interpersonal skills.
- Experience in production support and application support, including bug fixes.
- Major strengths include familiarity with multiple software systems, the ability to quickly learn new technologies and adapt to new environments, and excellent interpersonal, technical, and communication skills; a self-motivated, focused, and adaptive team player.
TECHNICAL SKILLS:
Big Data Technologies: Spark, Hadoop, HDFS, Hive, MapReduce, Pig, Sqoop, Flume, Zookeeper, and Cloudera.
Scripting Languages: Python, Shell
Programming Languages: Java, Scala, C, C++
Web Technologies: HTML, J2EE, CSS, JavaScript, Servlets, JSP, XML
Frameworks: Struts, Spring, Hibernate
Application Server: IBM WebSphere Server, Apache Tomcat.
DB Languages: SQL, PL/SQL
Databases /ETL: Oracle 9i/10g/11g
NoSQL Databases: HBase, Cassandra, Elasticsearch, MongoDB.
Operating Systems: Linux, UNIX
PROFESSIONAL EXPERIENCE:
Confidential, San Jose, CA
Hadoop Consultant
Responsibilities:
- Developed Spark SQL jobs that read data from the data lake through Hive, transform it, and save it in HBase (see the sketch after this list).
- Built a Java client responsible for receiving XML files via REST calls and publishing them to Kafka.
- Built a Kafka + Spark Streaming job responsible for reading XML messages from Kafka and transforming them into POJOs using JAXB.
- Built a Spark + Drools integration that allows Drools rules to be developed as part of the Spark Streaming job.
- Built HBase DAOs responsible for querying the data that Drools needs from HBase.
- Built logic to publish the output of Drools rules to Kafka for further processing.
- Wrote MapReduce jobs to generate reports on the number of activities created on a particular day from data dumped from multiple sources; the output was written back to HDFS.
- Worked on Oozie workflows and cron jobs.
- Provided cluster coordination services through Zookeeper.
- Worked with Sqoop for importing and exporting data between HDFS and RDBMS systems.
- Designed a data warehouse using Hive. Created partitioned tables in Hive.
- Developed Hive UDFs to pre-process the data for analysis.
- Analyzed the data by running Hive queries and Pig scripts to understand artist behavior.
- Developed MapReduce jobs in Java for data processing after installing and configuring Hadoop and HDFS.
- Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Exported data from DB2 to HDFS using Sqoop and NFS mount approach.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Moved data from Hadoop to Cassandra using the BulkOutputFormat class.
- Involved in the regular Hadoop Cluster maintenance such as patching security holes and updating system packages.
- Automated the work flow using shell scripts.
- Performance-tuned Hive queries written by other developers.
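Below is a minimal sketch of the kind of Spark SQL-to-HBase job described above, written against the Spark and HBase Java APIs; the database, table, row key, column family, and column names are illustrative assumptions rather than the project's actual schema.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.api.java.function.ForeachPartitionFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveToHBaseJob {
    public static void main(String[] args) {
        // Hive-enabled Spark session so spark.sql() can read data-lake tables.
        SparkSession spark = SparkSession.builder()
                .appName("hive-to-hbase")
                .enableHiveSupport()
                .getOrCreate();

        // Hypothetical Hive table holding activity records in the data lake.
        Dataset<Row> activities = spark.sql(
                "SELECT activity_id, activity_type, created_date FROM datalake.activities");

        // Write each partition to HBase, opening one connection per executor partition.
        activities.foreachPartition((ForeachPartitionFunction<Row>) rows -> {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("activities"))) {
                while (rows.hasNext()) {
                    Row r = rows.next();
                    Put put = new Put(Bytes.toBytes(r.getString(0)));   // row key: activity_id
                    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("type"),
                            Bytes.toBytes(r.getString(1)));
                    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("created"),
                            Bytes.toBytes(r.getString(2)));
                    table.put(put);
                }
            }
        });

        spark.stop();
    }
}
```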
Environment: Hadoop, HDFS, Hive, Spark, Spark SQL, Spark Streaming, Kafka, HBase, MapReduce, Pig, Oozie, Sqoop, REST, OpenShift, Zookeeper, Cassandra, Drools.
Confidential, Walnut Creek, CA
Hadoop Consultant
Responsibilities:
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Developed Sqoop jobs for extracting data from different databases, for both initial and incremental data loads.
- Developed MapReduce jobs for cleaning up the ingested data as well as calculating computed fields (a sketch follows this list).
- Designed Hive external tables for storing data extracted using Sqoop.
- Developed Hive jobs for moving data from Avro to ORC format; ORC was used to speed up the queries.
- Created Hive external tables for derived data, loaded the data into those tables, and queried it using HQL to calculate the claim fraud flags.
- Created Python scripts for cleaning the output of Hive and HBase queries.
- Designed Hive external tables with Elasticsearch as the storage backend for storing the results of the claim flag calculation.
- Implemented workflows using the Apache Oozie framework to orchestrate end-to-end execution.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
- Exported analyzed data using Sqoop for generating reports.
- Extensively used Pig for data cleansing. Developed Hive scripts to extract the data from the web server output files.
- Worked on data lake concepts and converted all ETL jobs into Pig/Hive scripts.
- Participated in the Oracle GoldenGate POC for bringing CDC changes into Hadoop using Flume.
- Loaded log data into HDFS using Flume. Worked extensively in creating MapReduce jobs to power data for search and aggregation.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Used Pig as an ETL tool to do transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Involved in the installation of CDH3 and the upgrade from CDH3 to CDH4.
- Responsible for developing a data pipeline using HDInsight, Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
- Wrote shell scripts to automate rolling day-to-day processes.
- Deployed Hadoop Cluster in Fully Distributed and Pseudo-distributed modes.
- Supported setting up the QA environment and updating configurations for implementing Pig and Sqoop scripts.
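As a rough illustration of the data-cleanup MapReduce jobs mentioned above, here is a minimal map-only job in Java; the delimiter, expected field count, and normalization rules are assumptions for the sketch, not the actual business logic.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RecordCleanupJob {

    /** Map-only job: drops malformed rows and trims/normalizes the surviving fields. */
    public static class CleanupMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        private static final int EXPECTED_FIELDS = 5;   // illustrative column count
        private final Text cleaned = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", -1);
            if (fields.length != EXPECTED_FIELDS) {
                context.getCounter("cleanup", "malformed").increment(1);
                return;                                  // skip malformed record
            }
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) out.append(',');
                out.append(fields[i].trim().toLowerCase()); // simple normalization
            }
            cleaned.set(out.toString());
            context.write(NullWritable.get(), cleaned);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "record-cleanup");
        job.setJarByClass(RecordCleanupJob.class);
        job.setMapperClass(CleanupMapper.class);
        job.setNumReduceTasks(0);                        // map-only cleanup pass
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Running it with zero reducers keeps the cleanup a single straight pass over the ingested files before downstream processing.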
Environment: Hadoop, Spark, MapReduce, HDFS, Flume, Sqoop, Hive, Zookeeper, Pig, Hortonworks, Oozie, Elasticsearch, NoSQL, UNIX/LINUX.
Confidential, Houston, TX
Hadoop Consultant
Responsibilities:
- Obtained requirement specifications from SMEs and Business Analysts in BR and SR meetings for the corporate workplace project. Interacted with the business users to build sample report layouts.
- Involved in writing the HLDs along with RTMs tracing back to the corresponding BRs and SRs, and reviewed them with the business.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Wrote MapReduce programs in Java to achieve the required Output.
- Created Hive Tables and Hive scripts to automate data management.
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Performed cluster coordination through Zookeeper.
- Loaded log data into HDFS using Flume. Worked extensively in creating MapReduce jobs to power data for search and aggregation.
- Created POC to store Server Log data in MongoDB to identify System Alert Metrics.
- Installed and configured Apache Hadoop and Hive/Pig Ecosystems.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Worked on debugging, performance tuning of Hive Jobs.
- Involved in the regular Hadoop Cluster maintenance such as patching security holes and updating system packages.
- Involved in loading data from LINUX and UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs for transforming and loading data (see the sketch after this list).
- Created HBase tables to store various data formats of PII data coming from different portfolios.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
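A minimal sketch of the sort of Hive UDF mentioned above, using the classic org.apache.hadoop.hive.ql.exec.UDF API; the function name and the trim/upper-case transformation are illustrative assumptions, not the project's actual rules.

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/** Illustrative Hive UDF that trims and upper-cases a string column during transformation. */
@Description(name = "clean_upper", value = "_FUNC_(str) - trims and upper-cases str")
public class CleanUpperUDF extends UDF {
    private final Text result = new Text();

    public Text evaluate(Text input) {
        if (input == null) {
            return null;                                  // pass NULLs through unchanged
        }
        result.set(input.toString().trim().toUpperCase());
        return result;
    }
}
```

Once the jar is added to the Hive session, the function would be registered with CREATE TEMPORARY FUNCTION and used like any built-in function in a query.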
Environment: Hadoop, Oracle, HiveQL, Pig, Flume, MapReduce, Zookeeper, HDFS, HBase, MongoDB, PL/SQL, Windows, Linux.
Confidential, Dallas, TX
J2EE Developer
Responsibilities:
- Involved in Documentation and Use case design using UML modeling including development of Class diagrams, Sequence diagrams, and Use case Transaction diagrams.
- Implemented an agile client delivery process, including automated testing, pair programming, and rapid prototyping.
- Involved in developing EJB (Stateless Session Beans) for implementing business logic.
- Involved in working with JMS Queues.
- Accessed and Manipulated XML documents using XML DOM Parser.
- Deployed the EJBs on JBoss Application Server.
- Involved in developing status and error message handling.
- Used SOAP web services to transfer XML messages from one environment to another.
- Used Hibernate to persist data to an Oracle 10g database.
- Implemented various HQL queries to access the database through the application workflow (see the sketch after this list).
- Involved in writing JUnit test cases using the JUnit testing framework.
- Used Log4j for External Configuration Files and debugging.
- Involved in Unit, Integration and Performance Testing for the new enhancements.
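A minimal sketch of the kind of Hibernate/HQL data access described above; the entity name, its fields, and the DAO method are illustrative assumptions, with the mapping assumed to be supplied separately (for example, an hbm.xml file).

```java
import java.util.List;
import org.hibernate.Session;

public class CustomerDao {

    /** Looks up customers in a given status via HQL; the Session is managed by the caller. */
    @SuppressWarnings("unchecked")
    public List<Customer> findByStatus(Session session, String status) {
        return session.createQuery("from Customer c where c.status = :status")
                      .setParameter("status", status)
                      .list();
    }
}

/** Minimal persistent class referenced above (mapping assumed via Customer.hbm.xml). */
class Customer {
    private Long id;
    private String status;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getStatus() { return status; }
    public void setStatus(String status) { this.status = status; }
}
```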
Environment: Java, JDK, WSAD, Hibernate, JUnit, EJB, JSP, Spring MVC, JMS, XML, XSLT, XML Parsers (DOM), JBoss, Web Services, HTML, JavaScript, Oracle and Windows XP.
Confidential, Columbus, OH
Java Developer
Responsibilities:
- Involved in requirement gathering, functional and technical specifications.
- Monitored and fine-tuned IDM performance and enhancements in the self-registration process.
- Developed OMSA GUI using MVC architecture, Core Java, Java Collections, JSP, JDBC, Servlets, ANT and XML within a Windows and UNIX environment.
- Used Java collection classes such as ArrayList, Vector, HashMap, and Hashtable.
- Used design patterns including MVC, Singleton, Factory, and Abstract Factory.
- Wrote requirements and detailed design documents, designed architecture for data collection.
- Developed algorithms and coded programs in Java.
- Involved in design and implementation using Core Java, Struts, and JMS.
- Performed all types of testing, including unit testing and integration testing across environments.
- Worked on modifying an existing JMS messaging framework for increased loads and performance optimization, as shown in the sketch below.
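A minimal sketch of sending a message through the kind of JMS framework described above, using the standard javax.jms API; the JNDI names and the queue are illustrative placeholders rather than the project's actual resources.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

public class OrderMessageSender {

    /** Sends a text message to a queue looked up via JNDI. */
    public void send(String payload) throws Exception {
        InitialContext ctx = new InitialContext();   // JNDI settings from jndi.properties
        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/OrderQueue");

        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            TextMessage message = session.createTextMessage(payload);
            producer.send(message);
        } finally {
            connection.close();                      // also closes the session and producer
        }
    }
}
```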
Environment: Java, Design Patterns, Oracle, SQL, PL/SQL, JMS.