Hadoop Consultant Resume
San Jose, CA
SUMMARY:
- Over 7 years of professional IT experience with Big Data technologies including Hadoop/YARN, Pig, Hive, HBase, Cassandra, and Spark.
- Hands-on experience with Apache Spark, Spark SQL, and Spark Streaming.
- Worked with different distributions of Hadoop and Big Data technologies including Hortonworks and Cloudera.
- Expertise in Big Data Hadoop ecosystem components such as Flume, Hive, Cassandra, Sqoop, Oozie, Zookeeper, and Kafka.
- Well versed in developing and implementing MapReduce programs using Java and Python.
- Familiarity with NoSQL databases like HBase and Cassandra.
- Experience with leveraging Hadoop ecosystem components including Pig and Hive for data analysis, Sqoop for data migration, Oozie for scheduling and HBase as a NoSQL data store.
- Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications, and HDFS.
- Experience with the NoSQL databases MongoDB and Cassandra.
- Familiarity with real-time data streaming using Spark and Kafka.
- Detailed knowledge of and experience in designing, developing, and testing software solutions using Java and J2EE technologies.
- Strong understanding of data warehouse concepts and ETL, with data modeling experience covering normalization, business process analysis and reengineering, dimensional data modeling, and physical and logical data modeling.
- Experience with middleware architectures using Sun Java technologies such as J2EE, JSP, and Servlets, and application servers such as WebSphere and WebLogic.
- Familiarity with popular frameworks such as Struts, Hibernate, Spring MVC, and AJAX.
- Experience in object-oriented programming with Java and Core Java.
- Experience in creating web-based applications using JSP and Servlets.
- Experience in database design, entity relationships, database analysis, and programming SQL, PL/SQL, packages, and triggers in Oracle and SQL Server on Windows and Linux.
- Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
- Research-oriented, motivated, proactive, self-starter with strong technical, analytical and interpersonal skills.
- Experience in production support and application support, including bug fixes.
- Major strengths include familiarity with multiple software systems, the ability to quickly learn new technologies and adapt to new environments, and excellent interpersonal, technical, and communication skills; a self-motivated, focused, and adaptive team player.
TECHNICAL SKILLS:
Big Data Technologies: Spark, Hadoop, HDFS, Hive, MapReduce, Pig, Sqoop, Flume, Zookeeper, and Cloudera.
Scripting Languages: Python, Shell
Programming Languages: Java, Scala, C, C++
Web Technologies: HTML, J2EE, CSS, JavaScript, Servlets, JSP, XML
Frameworks: Struts, Spring, Hibernate
Application Server: IBM WebSphere Server, Apache Tomcat.
DB Languages: SQL, PL/SQL
Databases /ETL: Oracle 9i/10g/11g
NoSQL Databases: HBase, Cassandra, Elasticsearch, MongoDB.
Operating Systems: Linux, UNIX
PROFESSIONAL EXPERIENCE:
Confidential, San Jose, CA
Hadoop Consultant
Responsibilities:
- Developed Spark SQL jobs that read data from the data lake through Hive, transform it, and save it in HBase (see the sketch after this list).
- Built a Java client responsible for receiving XML files via REST calls and publishing them to Kafka.
- Built a Kafka + Spark Streaming job responsible for reading XML messages from Kafka and transforming them into POJOs using JAXB.
- Built a Spark + Drools integration that allows Drools rules to be developed as part of the Spark Streaming job.
- Built HBase DAOs responsible for querying the data that Drools needs from HBase.
- Built logic to publish the output of Drools rules to Kafka for further processing.
- Wrote MapReduce jobs to generate reports on the number of activities created on a particular day from data dumped from multiple sources; the output was written back to HDFS.
- Worked on Oozie workflows and cron jobs.
- Provided cluster coordination services through Zookeeper.
- Worked with Sqoop for importing and exporting data between HDFS and RDBMS systems.
- Designed a data warehouse using Hive. Created partitioned tables in Hive.
- Developed Hive UDFs to pre-process the data for analysis.
- Analyzed the data by running Hive queries and Pig scripts to understand artist behavior.
- Developed MapReduce jobs in Java for data processing after installing and configuring Hadoop and HDFS.
- Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Exported data from DB2 to HDFS using Sqoop and NFS mount approach.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Moved data from Hadoop to Cassandra using the BulkOutputFormat class.
- Involved in the regular Hadoop Cluster maintenance such as patching security holes and updating system packages.
- Automated the work flow using shell scripts.
- Performance-tuned Hive queries written by other developers.
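Below is a minimal sketch of the kind of Spark SQL-to-HBase job described above, written against the Spark and HBase Java APIs; the database, table, row key, column family, and column names are illustrative assumptions rather than the project's actual schema.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.api.java.function.ForeachPartitionFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveToHBaseJob {
    public static void main(String[] args) {
        // Hive-enabled Spark session so spark.sql() can read data-lake tables.
        SparkSession spark = SparkSession.builder()
                .appName("hive-to-hbase")
                .enableHiveSupport()
                .getOrCreate();

        // Hypothetical Hive table holding activity records in the data lake.
        Dataset<Row> activities = spark.sql(
                "SELECT activity_id, activity_type, created_date FROM datalake.activities");

        // Write each partition to HBase, opening one connection per executor partition.
        activities.foreachPartition((ForeachPartitionFunction<Row>) rows -> {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("activities"))) {
                while (rows.hasNext()) {
                    Row r = rows.next();
                    Put put = new Put(Bytes.toBytes(r.getString(0)));   // row key: activity_id
                    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("type"),
                            Bytes.toBytes(r.getString(1)));
                    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("created"),
                            Bytes.toBytes(r.getString(2)));
                    table.put(put);
                }
            }
        });

        spark.stop();
    }
}
```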
Environment: Hadoop, HDFS, Hive, Spark, Spark SQL, Spark Streaming, Kafka, HBase, MapReduce, Pig, Oozie, Sqoop, REST, OpenShift, Zookeeper, Cassandra, Drools.
Confidential, Walnut Creek, CA
Hadoop Consultant
Responsibilities:
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Developed Sqoop jobs for extracting data from different databases, for both initial and incremental data loads.
- Developed MapReduce jobs for cleaning up the ingested data as well as calculating computed fields (a sketch follows this list).
- Designed Hive external tables for storing data extracted using Sqoop.
- Developed Hive jobs for moving data from Avro to ORC format; ORC was used to speed up the queries.
- Created Hive external tables for derived data, loaded the data into those tables, and queried it using HQL to calculate the claim fraud flags.
- Created Python scripts for cleaning the output of Hive and HBase queries.
- Designed Hive external tables with Elasticsearch as the storage backend for storing the results of the claim flag calculation.
- Implemented workflows using the Apache Oozie framework to orchestrate end-to-end execution.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
- Exported analyzed data using Sqoop for generating reports.
- Extensively used Pig for data cleansing. Developed Hive scripts to extract the data from the web server output files.
- Worked on data lake concepts and converted all ETL jobs into Pig/Hive scripts.
- Participated in the Oracle GoldenGate POC for bringing CDC changes into Hadoop using Flume.
- Loaded log data into HDFS using Flume. Worked extensively in creating MapReduce jobs to power data for search and aggregation.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Used Pig as an ETL tool to do transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Involved in the installation of CDH3 and the upgrade from CDH3 to CDH4.
- Responsible for developing a data pipeline using HDInsight, Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
- Wrote shell scripts to automate rolling day-to-day processes.
- Deployed Hadoop Cluster in Fully Distributed and Pseudo-distributed modes.
- Supported setting up the QA environment and updating configurations for implementing Pig and Sqoop scripts.
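As a rough illustration of the data-cleanup MapReduce jobs mentioned above, here is a minimal map-only job in Java; the delimiter, expected field count, and normalization rules are assumptions for the sketch, not the actual business logic.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RecordCleanupJob {

    /** Map-only job: drops malformed rows and trims/normalizes the surviving fields. */
    public static class CleanupMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        private static final int EXPECTED_FIELDS = 5;   // illustrative column count
        private final Text cleaned = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", -1);
            if (fields.length != EXPECTED_FIELDS) {
                context.getCounter("cleanup", "malformed").increment(1);
                return;                                  // skip malformed record
            }
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) out.append(',');
                out.append(fields[i].trim().toLowerCase()); // simple normalization
            }
            cleaned.set(out.toString());
            context.write(NullWritable.get(), cleaned);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "record-cleanup");
        job.setJarByClass(RecordCleanupJob.class);
        job.setMapperClass(CleanupMapper.class);
        job.setNumReduceTasks(0);                        // map-only cleanup pass
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Running it with zero reducers keeps the cleanup a single straight pass over the ingested files before downstream processing.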
Environment: Hadoop, Spark, MapReduce, HDFS, Flume, Sqoop, Hive, Zookeeper, Pig, Hortonworks, Oozie, Elasticsearch, NoSQL, UNIX/LINUX.
Confidential, Houston, TX
Hadoop Consultant
Responsibilities:
- Obtained requirement specifications from SMEs and Business Analysts in BR and SR meetings for the corporate workplace project. Interacted with the business users to build sample report layouts.
- Involved in writing the HLDs along with RTMs tracing back to the corresponding BRs and SRs, and reviewed them with the business.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Wrote MapReduce programs in Java to achieve the required Output.
- Created Hive Tables and Hive scripts to automate data management.
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Performed cluster coordination through Zookeeper.
- Loaded log data into HDFS using Flume. Worked extensively in creating MapReduce jobs to power data for search and aggregation.
- Created POC to store Server Log data in MongoDB to identify System Alert Metrics.
- Installed and configured Apache Hadoop and Hive/Pig Ecosystems.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Worked on debugging, performance tuning of Hive Jobs.
- Involved in the regular Hadoop Cluster maintenance such as patching security holes and updating system packages.
- Involved in loading data from LINUX and UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs for transforming and loading data (see the sketch after this list).
- Created HBase tables to store various data formats of PII data coming from different portfolios.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
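A minimal sketch of the sort of Hive UDF mentioned above, using the classic org.apache.hadoop.hive.ql.exec.UDF API; the function name and the trim/upper-case transformation are illustrative assumptions, not the project's actual rules.

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/** Illustrative Hive UDF that trims and upper-cases a string column during transformation. */
@Description(name = "clean_upper", value = "_FUNC_(str) - trims and upper-cases str")
public class CleanUpperUDF extends UDF {
    private final Text result = new Text();

    public Text evaluate(Text input) {
        if (input == null) {
            return null;                                  // pass NULLs through unchanged
        }
        result.set(input.toString().trim().toUpperCase());
        return result;
    }
}
```

Once the jar is added to the Hive session, the function would be registered with CREATE TEMPORARY FUNCTION and used like any built-in function in a query.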
Environment: Hadoop, Oracle, HiveQL, Pig, Flume, MapReduce, Zookeeper, HDFS, HBase, MongoDB, PL/SQL, Windows, Linux.
Confidential, Dallas, TX
J2EE Developer
Responsibilities:
- Involved in Documentation and Use case design using UML modeling including development of Class diagrams, Sequence diagrams, and Use case Transaction diagrams.
- Implemented an agile client delivery process, including automated testing, pair programming, and rapid prototyping.
- Involved in developing EJB (Stateless Session Beans) for implementing business logic.
- Involved in working with JMS Queues.
- Accessed and Manipulated XML documents using XML DOM Parser.
- Deployed the EJBs on JBoss Application Server.
- Involved in developing status and error message handling.
- Used SOAP web services to transfer XML messages from one environment to another.
- Used Hibernate to persist data to an Oracle 10g database.
- Implemented various HQL queries to access the database through the application workflow (see the sketch after this list).
- Involved in writing JUnit test cases using the JUnit testing framework.
- Used Log4j for External Configuration Files and debugging.
- Involved in Unit, Integration and Performance Testing for the new enhancements.
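A minimal sketch of the kind of Hibernate/HQL data access described above; the entity name, its fields, and the DAO method are illustrative assumptions, with the mapping assumed to be supplied separately (for example, an hbm.xml file).

```java
import java.util.List;
import org.hibernate.Session;

public class CustomerDao {

    /** Looks up customers in a given status via HQL; the Session is managed by the caller. */
    @SuppressWarnings("unchecked")
    public List<Customer> findByStatus(Session session, String status) {
        return session.createQuery("from Customer c where c.status = :status")
                      .setParameter("status", status)
                      .list();
    }
}

/** Minimal persistent class referenced above (mapping assumed via Customer.hbm.xml). */
class Customer {
    private Long id;
    private String status;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getStatus() { return status; }
    public void setStatus(String status) { this.status = status; }
}
```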
Environment: Java, JDK, WSAD, Hibernate, JUnit, EJB, JSP, Spring MVC, JMS, XML, XSLT, XML Parsers (DOM), JBoss, Web Services, HTML, JavaScript, Oracle and Windows XP.
Confidential, Columbus, OH
Java Developer
Responsibilities:
- Involved in requirement gathering, functional and technical specifications.
- Monitored and fine-tuned IDM performance and enhancements in the self-registration process.
- Developed OMSA GUI using MVC architecture, Core Java, Java Collections, JSP, JDBC, Servlets, ANT and XML within a Windows and UNIX environment.
- Used Java collection classes such as ArrayList, Vector, HashMap, and Hashtable.
- Used design patterns including MVC, Singleton, Factory, and Abstract Factory.
- Wrote requirements and detailed design documents, designed architecture for data collection.
- Developed algorithms and coded programs in Java.
- Involved in design and implementation using Core Java, Struts, and JMS.
- Performed all types of testing, including unit testing and integration testing across environments.
- Worked on modifying an existing JMS messaging framework for increased loads and performance optimization, as shown in the sketch below.
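A minimal sketch of sending a message through the kind of JMS framework described above, using the standard javax.jms API; the JNDI names and the queue are illustrative placeholders rather than the project's actual resources.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

public class OrderMessageSender {

    /** Sends a text message to a queue looked up via JNDI. */
    public void send(String payload) throws Exception {
        InitialContext ctx = new InitialContext();   // JNDI settings from jndi.properties
        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/OrderQueue");

        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            TextMessage message = session.createTextMessage(payload);
            producer.send(message);
        } finally {
            connection.close();                      // also closes the session and producer
        }
    }
}
```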
Environment: Java, Design Patterns, Oracle, SQL, PL/SQL, JMS.