Associate Big Data Analyst Resume
Shrewsbury, MA
SUMMARY:
A Big Data Architect with 3 years of experience in Big Data ecosystem technologies, with knowledge of Big Data infrastructure, distributed file systems (HDFS), parallel processing (the MapReduce framework), and the complete Hadoop ecosystem: Hive, Pig, Sqoop, HBase, Flume, and Oozie. Well versed in Java, Python, and R, and experienced with the Elastic Stack and front-end technologies such as HTML, CSS, and JavaScript. With a zeal to learn, the ability to work hard, commitment to the task, and a willingness to work both as a team member and independently, I am consistent at following direction and adding value to the project at hand.
TECHNICAL SKILLS:
Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Oozie, Kafka, Spark
NoSQL Databases: HBase, Cassandra
Programming Languages: Java, C, C++, R, Python
Web Technologies: HTML, J2EE, CSS, JavaScript
Databases: MySQL, Oracle, SQL Server, HBase
Operating Systems: Linux, Windows XP, Windows Vista, Windows 7, Windows 8
Work Environments: AWS, Docker, Eclipse, Visual Studio .NET, JUnit, Log4j, Putty, WP
PROFESSIONAL EXPERIENCE:
ASSOCIATE BIG DATA ANALYST
Confidential, SHREWSBURY, MA
Responsibilities:
- Successfully installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop ecosystem tools such as Hive, Pig, HBase, and Sqoop.
- Built Spark scripts and was responsible for performance tuning of Spark applications; used Spark's in-memory computing capabilities to perform advanced procedures such as text analytics and processing.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Implemented NameNode backup using NFS for high availability.
- Used Pig as an ETL tool to perform transformations, event joins, and pre-aggregations before storing highly optimized data in HDFS.
- Developed a data pipeline using HDInsight, Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Configured, ran, and used high-performance cloud platforms such as AWS and Docker.
- Scheduled time-sensitive jobs on the Oozie workflow engine to run multiple Hive and Pig jobs.
- Made rigorous use of Sqoop to import and export data between HDFS and relational databases.
- Created Hive tables, loaded data into them, and wrote Hive UDFs.
- Exported the analyzed data to relational databases using Sqoop for analysis, visualization, and report generation.
- Implemented Hive partitioning and bucketing to organize and clean data at large scale.
- Migrated data from different databases (SQL and NoSQL) to HDFS using various pipelines.
- Designed highly usable HBase row keys that prevent hotspotting on high-performance clusters.
- Performed high-level HQL queries on Hive external tables for data wrangling.
- Made extensive use of shell scripting to pull and clean processed data for migration.
- Adeptly used fully distributed and pseudo-distributed modes to build both POCs and fully built projects.
Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Spark (Scala), Kafka, Flume, HBase, Oracle, SQL, NoSQL, Unix/Linux
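The HBase row-key design work noted above can be illustrated with a minimal sketch. The salting scheme, bucket count, and key layout below are illustrative assumptions, not the production design: a deterministic salt prefix spreads monotonically increasing keys across regions so a single RegionServer does not become a hotspot.

```python
import hashlib

# Hypothetical salt count; in practice this would match the number of
# pre-split regions in the HBase table.
NUM_SALT_BUCKETS = 16

def salted_rowkey(user_id: str, timestamp: int) -> str:
    """Prefix the natural key with a deterministic salt bucket so that
    sequential writes spread evenly across regions instead of
    hotspotting one RegionServer."""
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % NUM_SALT_BUCKETS
    return f"{bucket:02d}|{user_id}|{timestamp}"
```

Because the salt is derived from the natural key rather than chosen randomly, all rows for the same entity stay contiguous within one bucket and can still be scanned with a prefix scan.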
ASSOCIATE DATA ANALYST
Confidential, SHREWSBURY, MA
Responsibilities:
- Worked on Big Data Hadoop cluster implementation and data integration while developing large-scale system software.
- Developed MapReduce programs to parse raw data, populate staging tables, and store the refined data in partitioned tables.
- Captured data from existing databases that provide SQL interfaces using Sqoop.
- Worked extensively with Sqoop to import and export data between HDFS and relational database systems/mainframes, and loaded data into HDFS.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Provided design recommendations and thought leadership to sponsors/stakeholders, improving review processes and resolving technical problems.
Environment: Hadoop, MapReduce, HDFS, Hive, Spark (Scala), Kafka, Java (JDK 1.6), Hadoop distributions (Hortonworks, Cloudera, MapR, DataStax), IBM DataStage 8.1 (Designer, Director, Administrator), PL/SQL, SQL*Plus, Toad 9.6, Windows NT, UNIX Shell Scripting
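The MapReduce parsing and partitioning work described above can be sketched in miniature in pure Python; the weblog format and field names are hypothetical stand-ins for the actual raw data. The map phase emits records keyed by the partition column, and the reduce phase groups them, mirroring how refined data lands in date-partitioned tables.

```python
from collections import defaultdict

# Hypothetical raw log format: "<date> <path> <http_status>"
def map_phase(line):
    """Parse one raw log line and emit (partition_key, record),
    keyed by date to mirror a date-partitioned target table."""
    date, path, status = line.split()
    yield date, (path, int(status))

def reduce_phase(mapped):
    """Group records by partition key, as the shuffle/reduce step would."""
    partitions = defaultdict(list)
    for key, record in mapped:
        partitions[key].append(record)
    return dict(partitions)

raw = [
    "2017-03-01 /index.html 200",
    "2017-03-01 /login 404",
    "2017-03-02 /index.html 200",
]
refined = reduce_phase(kv for line in raw for kv in map_phase(line))
```

In a real job the map and reduce functions would run on the cluster (e.g. via Hadoop Streaming), with each partition key written out as a separate partition directory in HDFS.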