
Associate Big Data Analyst Resume

Shrewsbury, MA


A Big Data Architect with 3 years of experience in Big Data ecosystem technologies, with knowledge of Big Data infrastructure, distributed file systems (HDFS), parallel processing (the MapReduce framework), and the wider Hadoop ecosystem: Hive, Pig, Sqoop, HBase, Flume, and Oozie. Well versed in Java, Python, and R, and experienced with the Elastic Stack and front-end technologies such as HTML, CSS, and JavaScript. Brings a zeal to learn, a strong work ethic, and commitment to the task; works equally well on a team or independently, consistently following direction and adding value to the project at hand.
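As an illustration of the MapReduce paradigm named above, here is a minimal word-count sketch in pure Python (the canonical Hadoop example; the function names are illustrative, not from any project in this resume):

```python
from collections import defaultdict

def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in the input,
    as a Hadoop Mapper would."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Shuffle + reduce step: group the emitted pairs by key and
    sum the counts, as a Hadoop Reducer does per key."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

if __name__ == "__main__":
    lines = ["big data big pipelines", "data everywhere"]
    print(reduce_phase(map_phase(lines)))
```

In a real Hadoop job the shuffle/sort between the two phases is distributed across the cluster; here it is collapsed into a single in-memory grouping.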


Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Oozie, Kafka, Spark

NoSQL Databases: HBase, Cassandra

Programming Languages: Java, C, C++, R, Python

Web Technologies: HTML, J2EE, CSS, JavaScript

Databases: MySQL, SQL, Oracle, SQL Server, HBase

Operating Systems: Linux, Windows XP, Windows Vista, Windows 7, Windows 8

Work Environments: AWS, Docker, Eclipse, Visual Studio .NET, JUnit, Log4j, PuTTY, WP



Confidential, SHREWSBURY, MA


  • Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, and Sqoop.
  • Built Spark scripts and was responsible for performance tuning of Spark applications; used Spark's in-memory computing capabilities to perform advanced procedures such as text analytics and processing.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
  • Implemented NameNode backup using NFS for high availability.
  • Used Pig as an ETL tool to perform transformations, event joins, and pre-aggregations before storing the data in HDFS with a high level of optimization.
  • Responsible for developing a data pipeline using HDInsight, Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
  • Configured, ran, and used high-performance cloud platforms (AWS, Docker, etc.).
  • Ran time-sensitive jobs on the Oozie workflow engine to orchestrate multiple Hive and Pig jobs.
  • Made rigorous use of Sqoop to import and export data between HDFS and RDBMSs.
  • Created Hive tables and was involved in data loading and in writing Hive UDFs.
  • Exported the analyzed data to relational databases using Sqoop for analysis, visualization, and report generation.
  • Implemented Hive partitioning and bucketing to organize and clean data at large scale.
  • Migrated data from different databases (SQL and NoSQL) to HDFS using different pipelines.
  • Designed highly usable HBase rowkeys, preventing hotspotting on high-performance clusters.
  • Performed high-level HQL queries on external Hive tables for data wrangling.
  • Made extensive use of shell scripting for data pulling and data cleaning before migrating processed data.
  • Adeptly used fully distributed and pseudo-distributed modes for building POCs as well as fully built projects.
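The rowkey-design work above can be sketched as follows: a minimal Python example of salting a sequential rowkey so writes spread across HBase regions instead of hotspotting a single region (the bucket count and key format are illustrative assumptions, not the actual production scheme):

```python
import hashlib

SALT_BUCKETS = 8  # illustrative; real bucket counts track the region count

def salted_rowkey(natural_key: str) -> str:
    """Prefix the natural key (e.g. a timestamp) with a stable,
    hash-derived salt. Sequential keys then land in different
    regions instead of hammering one 'hot' region on writes."""
    digest = hashlib.md5(natural_key.encode("utf-8")).hexdigest()
    salt = int(digest, 16) % SALT_BUCKETS
    return f"{salt}-{natural_key}"

if __name__ == "__main__":
    for ts in ["20160301120000", "20160301120001", "20160301120002"]:
        print(salted_rowkey(ts))
```

The trade-off is that range scans over the natural key must now fan out across all salt buckets, which is why the bucket count is kept small.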

Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Spark (Scala), Kafka, Flume, HBase, Oracle, SQL, NoSQL, and Unix/Linux


Confidential, SHREWSBURY, MA


  • Worked on Big Data Hadoop cluster implementation and data integration while developing large-scale system software.
  • Developed MapReduce programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables.
  • Captured data from existing databases that provide SQL interfaces using Sqoop.
  • Worked extensively with Sqoop to import and export data between HDFS and relational database systems/mainframes, and loaded data into HDFS.
  • Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
  • Provided design recommendations and thought leadership to sponsors/stakeholders, improving review processes and resolving technical problems.
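The parse/stage/partition flow described above can be sketched in plain Python (the CSV layout and the dt=<date> partition key are illustrative assumptions, mirroring Hive-style partition directories):

```python
import csv
import io
from collections import defaultdict

RAW_LOG = """2016-03-01,US,200
2016-03-01,DE,404
2016-03-02,US,200"""

def stage_and_partition(raw_text):
    """Parse raw CSV records into cleaned dicts (the 'staging' step),
    then bucket them under Hive-style dt=<date> partition keys (the
    'partitioned tables' step)."""
    partitions = defaultdict(list)
    for dt, country, status in csv.reader(io.StringIO(raw_text)):
        partitions[f"dt={dt}"].append({"country": country, "status": int(status)})
    return dict(partitions)

if __name__ == "__main__":
    for partition, rows in stage_and_partition(RAW_LOG).items():
        print(partition, rows)
```

In the real pipeline each partition bucket corresponds to a directory under the Hive table's HDFS location, so queries filtered on dt only scan the matching partitions.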

Environment: Hadoop, MapReduce, HDFS, Hive, Spark (Scala), Kafka, Java (JDK 1.6), Hadoop distributions (Hortonworks, Cloudera, MapR, DataStax), IBM DataStage 8.1 (Designer, Director, Administrator), PL/SQL, SQL*Plus, Toad 9.6, Windows NT, UNIX shell scripting
