
Big Data Analyst Resume


Pasadena, CA

SUMMARY:

  • Overall 8+ years of experience in architecture, analysis, design, development, testing, implementation, maintenance and enhancement of various IT projects, including 3+ years of Big Data experience implementing end-to-end Hadoop solutions.
  • Hands-on experience in installing, configuring and using Apache Hadoop ecosystem components such as Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, Impala, HBase, ZooKeeper, Sqoop, Flume and Hue, along with JSON data formats.
  • Expertise in writing Hadoop Jobs to analyze data using MapReduce, Hive, Pig.
  • Experience in importing and exporting data between HDFS and relational database systems (RDBMS) using Sqoop.
  • Good knowledge of Hadoop cluster architecture and cluster monitoring.
  • Experience with Kafka and Storm for real-time data ingestion and processing.
  • In-depth understanding of Hadoop architecture and its components, including HDFS, NameNode, DataNode, JobTracker, TaskTracker, and both MRv1 and MRv2 (YARN).
  • Experience in analyzing data using HiveQL, Impala, Pig Latin, and custom MapReduce programs in Java.
  • Experience with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
  • Wrote scripts to deploy monitoring checks and to automate critical system administration tasks.
  • Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations.
  • Experience in optimizing MapReduce jobs using combiners and partitioners to deliver the best results (see the sketch after this list).
  • Good understanding of NoSQL databases such as MongoDB and Redis.
  • Strong experience across the complete project life cycle: design, development, testing and implementation of client-server and web applications.
  • Experience using Apache Spark as a platform for both Big Data batch processing and "fast data" streaming.
  • Expertise in core Java, J2EE, multithreading, JDBC, Hibernate, Spring and shell scripting, and proficient in using Java APIs for application development.
  • Proficient in working with various IDE tools, including Eclipse Galileo, IBM Rational Application Developer (RAD) and IntelliJ IDEA.
  • Hands on experience in application development using Linux Shell scripting.
  • Experience in database design, entity relationships, database analysis, SQL programming and PL/SQL stored procedures.
  • Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
  • Worked on various operating systems like UNIX/Linux, MAC OS and Windows.
  • Excellent interpersonal, analytical, verbal and written communications skills.
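
The combiner/partitioner optimization mentioned in the list above follows a standard pattern. Below is a minimal word-count-style sketch of a job wired with a combiner and a custom partitioner; all class names (EventCountJob, EventMapper, SumReducer, KeyPartitioner) are illustrative and not taken from any project listed here.

// Word-count-style sketch showing a combiner and a custom partitioner;
// all class names are illustrative.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EventCountJob {

    public static class EventMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);   // one record per token
            }
        }
    }

    // Used both as the combiner (map-side pre-aggregation) and as the reducer.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    // Deterministic key routing so all records for a key reach the same reducer.
    public static class KeyPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "event-count");
        job.setJarByClass(EventCountJob.class);
        job.setMapperClass(EventMapper.class);
        job.setCombinerClass(SumReducer.class);      // combiner cuts shuffle volume
        job.setReducerClass(SumReducer.class);
        job.setPartitionerClass(KeyPartitioner.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The combiner runs the same aggregation map-side, which reduces the volume of data shuffled to reducers, while the partitioner guarantees that all records for a given key land on the same reducer.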

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, MapReduce, Pig, Hive, Spark, HBase, Sqoop, Oozie, ZooKeeper, Scala, Hue, Kafka, Storm, JSON.

Java & J2EE technologies: Core Java, JSP, JDBC

IDE Tools: Eclipse.

Programming languages: Java, Linux shell scripts.

Web Frameworks: Struts 1.x, Struts 2.x, Spring 3.x, Hibernate.

Database: Oracle 11g/10g/9i, DB2, PL/SQL, SQL Developer, Cassandra.

Web Technologies: HTML, XML, JavaScript.

Operating Systems: Windows 95/98/2000/XP, Mac OS, UNIX, Linux.

PROFESSIONAL EXPERIENCE:

Confidential, Pasadena, CA

Big Data Analyst

Responsibilities:

  • Worked on importing and exporting data from Oracle and DB2 into HDFS using Sqoop.
  • Developed data pipeline using Flume to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Strong understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables for optimized performance.
  • Resolved performance issues in Hive and Pig scripts by understanding how joins, grouping and aggregation translate into MapReduce jobs.
  • Used Kafka and Storm for real time data ingestion and processing.
  • Used Spark SQL, Spark Streaming for data streaming and analysis.
  • Documented best practices for writing MapReduce programs and Hive scripts.
  • Scheduled an Oozie workflow to import the revenue department's weekly transactions from an RDBMS.
  • Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
  • Configured Spark Streaming in Scala to receive real-time data from Kafka and store the stream data in HDFS.
  • Built wrapper shell scripts to launch these Oozie workflows.
  • Developed PIG Latin scripts to transform the log data files and load into HDFS.
  • Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Developed Hive UDFs for functions not available out of the box in Hive, such as rank (see the sketch after this list).
  • Created external Hive tables and was involved in data loading and writing Hive UDFs.
  • Enabled concurrent access to Hive tables with shared and exclusive locking, supported by the ZooKeeper implementation in the cluster.
  • Developed unit test cases for MapReduce code using MRUnit.
  • Involved in creating Hadoop streaming jobs.
  • Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution.
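
The rank UDF mentioned above is typically a small stateful UDF. Below is a minimal sketch, assuming the calling query distributes and sorts rows by the grouping key (DISTRIBUTE BY ... SORT BY ...) so each group reaches a reducer contiguously; the class name Rank is illustrative.

// Minimal sketch of a stateful rank UDF; assumes the query distributes and
// sorts rows by the grouping key so each group arrives contiguously.
// The class name "Rank" is illustrative.
import org.apache.hadoop.hive.ql.exec.UDF;

public final class Rank extends UDF {
    private int counter;
    private String lastKey;

    public int evaluate(final String key) {
        if (lastKey == null || !lastKey.equals(key)) {
            counter = 0;      // new group: restart the rank
            lastKey = key;
        }
        return ++counter;     // 1-based rank within the group
    }
}

The packaged jar would be registered with ADD JAR and exposed with CREATE TEMPORARY FUNCTION before being called from HiveQL.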

Environment: Hadoop, HDFS, MapReduce, Hive, JSON, Oozie, Hue, Spring, Hibernate, Shell, REST Web Services and Java.

Confidential, Austin, TX

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Involved in loading data from the UNIX file system into HDFS using Flume, Kettle and the HDFS API.
  • Wrote MapReduce jobs to discover trends in data usage by users.
  • Involved in managing and reviewing Hadoop log files.
  • Involved in running Hadoop streaming jobs to process terabytes of text data.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data with MapReduce, Impala and Pig.
  • Wrote Pig UDFs.
  • Developed HIVE queries for the analysts.
  • Configured and maintained multiple topologies in the Storm cluster and deployed them regularly for real-time data processing.
  • Documented routine tasks like ETL process flow diagrams, mapping specs, source-target matrix and unit test documentation.
  • Implemented Partitioning, Dynamic Partitions, Buckets in HIVE.
  • Created databases, tables and views in HiveQL, Impala and Pig Latin.
  • Wrote functional and technical specifications for Solr, HBase, Hive and other components.
  • Exported the result set from HIVE to MySQL using Kettle (Pentaho data-integration tool).
  • Used Zookeeper for various types of centralized configurations.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among MapReduce jobs.
  • Automated end-to-end jobs, from pulling data from sources such as MySQL and pushing result datasets to HDFS, to running MapReduce and Pig/Hive jobs, using Kettle and Oozie for workflow management.
  • Wrote a Kafka producer to collect events from the front end (see the sketch after this list).
  • Maintained system integrity of all sub-components (primarily HDFS, MapReduce and Flume).
  • Wrote unit test cases using MRUnit.
  • Monitored system health and logs, and responded to any warning or failure conditions.
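
The front-end event collection mentioned above relies on a Kafka producer. Below is a minimal sketch against the org.apache.kafka.clients.producer API; the broker address, topic name and sample payload are illustrative, and the client API version may differ from the one used on the project.

// Minimal sketch of a Kafka producer for front-end events; broker address,
// topic name and sample payload are illustrative.
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class FrontEndEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");  // illustrative broker list
        props.put("acks", "all");                        // wait for full acknowledgement
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Each front-end event is sent as a JSON string keyed by user id.
            String eventJson = "{\"userId\":\"u123\",\"action\":\"click\"}";
            producer.send(new ProducerRecord<>("frontend-events", "u123", eventJson));
        }
    }
}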

Environment: Hadoop (Cloudera), HDFS, MapReduce, Hive, Pig, Kafka, Storm, Sqoop, Solr, WebSphere, Struts, Hibernate, Spring, Oozie, REST Web Services, Solaris, DB2, UNIX Shell Scripting.

Confidential, Mountain View, CA

Hadoop Developer

Responsibilities:

  • Extracted files from DB2 using Kettle, placed them in HDFS and processed them.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Involved in creating Hive tables, and loading and analyzing data using hive queries.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Built Hadoop-based enterprise big data platforms, coding in Python.
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Involved in unit testing MapReduce jobs using MRUnit (see the sketch after this list).
  • Involved in loading data from LINUX file system to HDFS.
  • Responsible for managing data from multiple sources.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Loaded and transformed large sets of structured and semi-structured data.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
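
MRUnit tests like those mentioned above exercise a mapper or reducer in isolation. Below is a minimal sketch using the new-API MapDriver; it assumes a hypothetical EventMapper with LongWritable/Text input and Text/IntWritable output (for example, the one sketched in the summary section), and the test data is illustrative.

// Minimal sketch of an MRUnit test for a mapper; EventMapper is a hypothetical
// mapper with LongWritable/Text input and Text/IntWritable output.
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class EventMapperTest {
    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        mapDriver = MapDriver.newMapDriver(new EventMapper());  // mapper under test
    }

    @Test
    public void emitsOneCountPerToken() throws IOException {
        // Feed one input record and assert the expected key/value pair comes out.
        mapDriver.withInput(new LongWritable(1L), new Text("click"))
                 .withOutput(new Text("click"), new IntWritable(1))
                 .runTest();
    }
}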

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, LINUX, MRUnit and Big Data

Confidential

Java Developer

Responsibilities:

  • Developed new functionality for the application based on its requirements and design.
  • Used the Spring framework to develop lightweight business components (see the sketch after this list).
  • Integrated data between the back end and the front end.
  • Involved in Unit testing and Bug Fixing.
  • Created custom Hibernate User Type classes for enumerated constants and configured in entity classes.
  • Created interactive client-side scripting in JavaScript, AJAX and JSON using the jQuery framework.
  • Interacted with clients/users to clarify application requirements and design.
  • Performed regression testing using JUnit test cases.
  • Deployed the components to Tomcat 7.0 Server.
  • Developed JSF Tags and Components.
  • Built a single-page module using the AngularJS framework.
  • Fixed style and JavaScript issues pertaining to different browsers
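
The lightweight Spring business components mentioned above are typically plain beans with injected collaborators. Below is a minimal sketch using constructor injection; the Order, OrderDao and OrderService names are illustrative, and the DAO would in practice be Hibernate-backed.

// Minimal sketch of a lightweight Spring business component with constructor
// injection; Order, OrderDao and OrderService are illustrative names.
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

// Hypothetical domain type and DAO contract, kept minimal for the sketch.
class Order {
    String id;
}

interface OrderDao {
    void save(Order order);
}

@Service
public class OrderService {

    private final OrderDao orderDao;  // implementation would be Hibernate-backed

    @Autowired
    public OrderService(OrderDao orderDao) {
        this.orderDao = orderDao;
    }

    public void placeOrder(Order order) {
        orderDao.save(order);         // delegate persistence to the DAO layer
    }
}
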
Confidential

Java Developer

Responsibilities:

  • Involved in analysis, design and development of POS (Point of Sale) system and developed specs that include Use Cases, Class Diagrams, Sequence Diagrams and Activity Diagrams.
  • Involved in designing the user interfaces using JSP’s.
  • Developed custom tags, JSTL to support custom User Interfaces.
  • Developed the application using Struts Framework that leverages classical Model View Controller (MVC) architecture.
  • Implemented business processes such as user authentication (see the sketch after this list).
  • Implemented code for JSP, Servlets, and Struts.
  • Used the Spring framework to integrate with Hibernate and Struts.
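
The Struts-based MVC flow and authentication mentioned above revolve around Action classes. Below is a minimal Struts 1 sketch; the LoginAction class, forward names and placeholder credential check are illustrative.

// Minimal sketch of a Struts 1 Action in the classic MVC flow; the class name,
// forward names and placeholder credential check are illustrative.
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.struts.action.Action;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;

public class LoginAction extends Action {

    @Override
    public ActionForward execute(ActionMapping mapping, ActionForm form,
                                 HttpServletRequest request,
                                 HttpServletResponse response) {
        // Controller step: read request parameters, call the business layer,
        // then route to the view configured in struts-config.xml.
        String user = request.getParameter("username");
        boolean authenticated = user != null && !user.trim().isEmpty();  // placeholder check
        return mapping.findForward(authenticated ? "success" : "failure");
    }
}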
