Hadoop Developer Resume
Cambridge, MA
SUMMARY
- A dynamic professional with 8+ years of experience in the IT industry, including 4+ years of solid software development experience on mission-critical, data-intensive Big Data and Hadoop applications.
- Experience in developing applications that perform large-scale distributed data processing using Hadoop, MapReduce, Pig, Hive, Sqoop, Oozie, Java, Spark, Storm, HBase, Cassandra, Kafka, ZooKeeper and Flume.
- Excellent understanding and knowledge of Big Data and Hadoop architecture.
- In-Depth knowledge and experience in design, development and deployments of Big Data projects using Hadoop / Data Analytics / NoSQL / Distributed Machine Learning frameworks.
- Good understanding of Spark architecture and its various components, such as Spark Streaming, Spark SQL and the SparkR programming paradigm.
- Solid experience using YARN and tools like Pig and Hive for data analysis, Sqoop for data ingestion, Oozie for scheduling and Zookeeper for coordinating cluster resources.
- Excellent understanding on CDH, HDP, Pivotal, MapR and Apache Hadoop distributions.
- Good understanding of HDFS Designs, Daemons, HDFS high availability (HA), HDFS Federation.
- Experience in analyzing data using Pig Latin, HiveQL, HBase and custom MapReduce programs in Java.
- Excellent understanding of Hadoop architecture and the different daemons of a Hadoop cluster, including the Resource Manager, Node Manager, NameNode and DataNode.
- Experience in developing cloud computing applications using Amazon EC2, S3 and EMR.
- Excellent experience in working with high velocity Real-time data processing frameworks using tools like Kafka, Spark and Storm.
- Expert in working with the Hive data warehouse: creating tables and managing data distribution by implementing partitioning and bucketing.
- Expertise in implementing ad-hoc queries using HiveQL and writing Pig Latin scripts to sort, group, join and filter data as part of data transformation per the business requirements.
- Experienced with different compression techniques (LZO, Snappy, Gzip, Bzip2) to save storage and optimize data transfer over the network.
- Experience with NoSQL databases like HBase, Cassandra, MongoDB and Couchbase, as well as other ecosystem components like ZooKeeper, Oozie, Impala, Storm, Spark Streaming, Spark SQL, Kafka and Flume.
- Extended Hive and Pig core functionality with custom UDFs written in Java (a minimal UDF sketch follows this summary).
- Experience in importing and exporting data with Sqoop between HDFS (Hive & HBase) and relational database systems (Oracle, MySQL, DB2, Informix, Teradata).
- Hands-on experience setting up workflows with the Apache Oozie workflow engine, managing and scheduling Hadoop jobs via the Oozie Coordinator.
- Good understanding of SQL database concepts and ETL/data warehousing technologies like Talend.
- Extensive experience in Requirements gathering, Analysis, Design, Reviews, Coding and Code Reviews, Unit and Integration Testing.
- Extensive experience using Java/JEE design patterns such as Singleton, Factory, MVC and Front Controller to reuse the most effective and efficient strategies.
- Expertise in using IDEs like WebSphere Studio (WSAD), Eclipse, NetBeans, MyEclipse and WebLogic Workshop.
- Experience in developing service components using JDBC.
- Experience in designing and developing web services (SOAP and RESTful).
- Solid experience developing applications using the Scrum methodology.
- Good understanding and experience with Software Development methodologies like Agile and Waterfall.
- Excellent communication and analytical skills; flexible and eager to learn new technologies that contribute to the company's success.
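A minimal sketch of the kind of Java Hive UDF referenced above; the class name, function registration and normalization logic are hypothetical and shown only for illustration.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF: normalizes a free-text code column before aggregation in Hive.
// Registered with: ADD JAR normalize-udf.jar; CREATE TEMPORARY FUNCTION normalize_code AS '...';
public class NormalizeCodeUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                       // pass NULLs through untouched
        }
        String value = input.toString().trim().toUpperCase();
        return new Text(value.isEmpty() ? "UNKNOWN" : value);
    }
}
```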
TECHNICAL SKILLS
Languages/Tools: Java, XML, XSLT, HTML/XHTML, HDML, DHTML, Python, Scala, R, Git.
Big Data Technologies: Apache Hadoop, HDFS, Spark, Hive, Pig, Talend, HBase, Sqoop, Oozie, ZooKeeper, Mahout, Kafka, Storm, Impala.
Java Technologies: JSE (Java architecture, OOP concepts); JEE (JDBC, JNDI, JSF (JavaServer Faces), Spring, Hibernate, SOAP/REST web services).
Web Technologies: HTML, XML, JavaScript, WSDL, SOAP, JSON, AngularJS.
Databases/NoSQL: MS SQL Server, MySQL, Oracle, MS Access, Teradata, Netezza, Greenplum, HBase, Cassandra, MongoDB.
PROFESSIONAL EXPERIENCE
Confidential, Cambridge, MA
Hadoop Developer
Responsibilities:
- Maintained system integrity of all sub-components (primarily HDFS, MapReduce, HBase and Hive).
- Migrated the required data from MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.
- Integrated Apache Storm with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
- Involved in creating mappings, sessions and workflows using a wide range of Informatica transformations.
- Developed multiple Kafka producers and consumers from scratch using the low-level and high-level APIs (a minimal producer sketch follows this section).
- Involved in moving data from Hive tables into Cassandra for real-time analytics.
- Wrote create, alter, insert and delete queries involving lists, sets and maps in DataStax Cassandra.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning and buckets.
- Wrote HiveQL scripts to create, load and query tables in Hive.
- Analyzed, developed and implemented the ETL architecture using Erwin and Informatica.
- Worked with HiveQL on big data of logs to perform a trend analysis of user behavior on various online modules.
- Used a wide variety of Informatica transformations to load all kinds of data and to automate the process.
- Supported MapReduce programs running on the cluster.
- Monitored system health and logs and responded accordingly to any warning or failure conditions.
- Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark with Scala.
- Worked on Big Data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm and webMethods technologies.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Streamed data in real time using Spark with Kafka.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
- Generated final reporting data using Tableau for testing by connecting to the corresponding Hive tables through the Hive ODBC connector.
- Strongly recommended bringing in Elasticsearch and was responsible for its installation, configuration and administration.
- Developed and maintained efficient Talend ETL jobs for data ingestion.
- Worked on the Talend RTX ETL tool; developed and scheduled jobs in the Talend Integration Suite.
- Modified reports and Talend ETL jobs based on feedback from QA testers and users in the development and staging environments.
- Involved in migrating Hadoop jobs to higher environments such as SIT, UAT and Prod.
Environment: Hortonworks, HDFS, Hive, HQL scripts, Scala, MapReduce, Storm, Spark, Java, HBase, Cassandra, Pig, Sqoop, Shell Scripts, Oozie Coordinator, MySQL, Tableau, Elasticsearch, Talend, Informatica and SFTP.
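A minimal sketch of a Kafka producer for clickstream events like those described above, using the newer Java producer client; the broker address, topic name and payload are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical clickstream producer; broker, topic and message contents are placeholders.
public class ClickstreamProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Each record carries one click event as a JSON string, keyed by user id so that
        // events for the same user land in the same partition.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("clickstream", "user-123", "{\"page\":\"/home\"}"));
        }
    }
}
```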
Confidential, Indianapolis, IN
Hadoop Developer
Responsibilities:
- Coordinated with business customers to gather business requirements and interacted with other technical peers to derive technical requirements.
- Developed Sqoop jobs to import and store massive volumes of data in HDFS and Hive.
- Developed MapReduce, Pig and Hive scripts to cleanse, validate and transform data.
- Worked on performance issues, tuning the Pig and Hive scripts.
- Used the Oozie workflow engine to orchestrate multiple Hive and Pig jobs.
- Designed and developed PIG data transformation scripts to work against unstructured data from various data points and created a base line.
- Involved in implementing Spark RDD transformations and actions for business analysis (a minimal RDD sketch follows this section).
- Worked on creating and optimizing Hive scripts for data analysts based on the requirements.
- Created Hive UDFs to encapsulate complex and reusable logic for the end users.
- Experienced in working with Sequence files and compressed file formats.
- Converted raw files (CSV, TSV) to different file formats such as Parquet and Avro, with data type conversion, using Cascading.
- Developed the code for removing or replacing error fields in the data using Cascading.
- Implemented partitioning, dynamic partitions and bucketing in Hive for efficient data access.
- Monitored the Cascading flows using Driven to ensure the desired results were obtained.
- Developed a data pipeline using Kafka and Storm to store data in HDFS.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Developed utility helper classes to get data from HBase tables.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
Environment: Cloudera, Linux (CentOS, RedHat), UNIX Shell, Pig, Hive, MapReduce, YARN, Apache Spark, Eclipse, Core Java, JDK 1.7, Oozie Workflows, AWS, EMR, HBase, Cassandra, Sqoop, Scala, Kafka.
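A minimal sketch of the Spark RDD transformation/action pattern mentioned above, written against the Java API with Java 8 lambdas for brevity; the input path, delimiter and column layout are assumptions.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

// Hypothetical job: counts events per customer from a comma-delimited file in HDFS.
public class EventCounts {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("EventCounts");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile("hdfs:///data/events/");

            // Transformations: keep well-formed rows and key them by customer id (first column).
            JavaPairRDD<String, Integer> counts = lines
                    .filter(line -> line.split(",").length > 1)
                    .mapToPair(line -> new Tuple2<>(line.split(",")[0], 1))
                    .reduceByKey(Integer::sum);

            // Action: write the aggregated counts back to HDFS.
            counts.saveAsTextFile("hdfs:///data/event_counts");
        }
    }
}
```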
Confidential, Tampa, FL
Hadoop Developer
Responsibilities:
- Developed solutions to process data into HDFS, analyzed the data using MapReduce, Pig and Hive, and produced summary results from Hadoop for downstream systems.
- Used Sqoop extensively to import data from various systems/sources (such as MySQL) into HDFS.
- Applied Hive queries to perform data analysis on HBase using the storage handler in order to meet the business requirements.
- Created components such as Hive UDFs to supply functionality missing from Hive for analytics.
- Hands-on experience with NoSQL databases like HBase.
- Developed scripts and batch jobs to schedule a bundle (a group of coordinators) consisting of various Hadoop programs using Oozie.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Involved in ETL, Data Integration and Migration.
- Developed custom aggregate functions using Spark SQL and performed interactive querying.
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Installed and configured Hadoop, MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (a minimal mapper sketch follows this section).
- Worked on Informatica ETL to parse the data, which was then loaded into HDFS.
- Responsible for building scalable distributed data solutions using Hadoop.
- Assisted in setting up the Hortonworks cluster and installing all the ecosystem components through Ambari and manually from the command line.
- Developed scripts for tracking the changes in file permissions of the files and directories through audit logs in HDFS.
- Implemented test scripts to support test driven development and continuous integration.
- Involved in loading data from UNIX file system to HDFS.
- Load and transform large sets of structured, semi structured and unstructured data.
- Balanced HDFS manually to decrease network utilization and increase job performance.
- Set up automated processes to archive/clean the unwanted data on the cluster, in particular on HDFS and Local file system.
Environment: Cloudera, HDFS, Hive, Pig, Sqoop, Linux, HBase, Tableau, Informatica, MicroStrategy, Shell Scripting, Ubuntu, RedHat Linux.
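A minimal sketch of a map-only cleaning step like the Java MapReduce jobs mentioned above; the delimiter and expected field count are hypothetical.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: drops malformed rows and trims whitespace before loading into Hive.
public class CleanRecordsMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

    private static final int EXPECTED_FIELDS = 5;   // assumed schema width

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\\|", -1);
        if (fields.length != EXPECTED_FIELDS) {
            return;                                 // skip malformed record
        }
        StringBuilder cleaned = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                cleaned.append('|');
            }
            cleaned.append(fields[i].trim());
        }
        context.write(NullWritable.get(), new Text(cleaned.toString()));
    }
}
```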
Confidential, Tampa, FL
Hadoop Developer
Responsibilities:
- Developed a data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis.
- Used Pig to perform transformations, event joins, bot-traffic filtering and some pre-aggregations before storing the data in HDFS.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard
- Loaded the aggregated data onto DB2 for reporting on the dashboard.
- Monitoring and Debugging Hadoop jobs/Applications running in production.
- Building, packaging and deploying the code to the Hadoop servers.
- Moved data between Oracle and HDFS using Sqoop.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Worked with different file formats and compression techniques to determine standards.
- Developed Hive scripts for implementing control tables logic in HDFS.
- Designed and implemented static and dynamic partitioning and bucketing in Hive.
- Created HBase tables to store data in various formats coming from different portfolios and processed the data using Spark (a minimal HBase client sketch follows this section).
- Provided cluster coordination services through ZooKeeper.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Hands-on experience writing MapReduce code to turn unstructured data into structured data and to insert data into HBase from HDFS.
- Extracted data from MongoDB through Sqoop, placed it in HDFS and processed it.
- Analyzed the data by running Hive queries and Pig scripts to understand user behavior.
- Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
- Wrote shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
Environment: JDK, Ubuntu Linux, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, MongoDB, ZooKeeper, HBase, Java, Shell Scripting, Informatica, Cognos, SQL, Teradata.
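A minimal sketch of writing a processed record into HBase with the Java client, as referenced above; the table name, row key design, column family and qualifiers are placeholders.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical writer: stores one parsed portfolio record in an HBase table.
public class PortfolioWriter {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("portfolio_events"))) {
            // Row key combines portfolio id and event date; "d" is the data column family.
            Put put = new Put(Bytes.toBytes("portfolio1#2016-01-01"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("amount"), Bytes.toBytes("1250.00"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("currency"), Bytes.toBytes("USD"));
            table.put(put);
        }
    }
}
```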
Confidential
Java/J2EE Developer
Responsibilities:
- Analysis of system requirements and development of design documents.
- Development of Spring Services.
- Development of persistence classes using the Hibernate framework (a minimal entity sketch follows this section).
- Development of SOA services using the Apache Axis web service framework.
- Development of the user interface using Apache Struts 2.0, JSPs, Servlets, jQuery, HTML and JavaScript.
- Developed client functionality using ExtJS.
- Development of JUnit test cases to test business components.
- Extensively used Java Collection API to improve application quality and performance.
- Extensively used Java 5 features such as generics, the enhanced for loop and type safety.
- Provided production support and designed enhancements for the existing product.
Environment: Java 1.5, SOA, Spring, ExtJS, Struts 2.0, Servlets, JSP, GWT, jQuery, JavaScript, CSS, Web Services, XML, Oracle, WebLogic Application Server, Eclipse, UML, Microsoft Visio.
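A minimal sketch of a Hibernate-mapped persistence class like those mentioned above, using JPA annotations; the entity, table and column names are hypothetical.

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;

// Hypothetical entity mapped to a CUSTOMER table; Hibernate persists instances of this class.
@Entity
@Table(name = "CUSTOMER")
public class Customer {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;

    @Column(name = "NAME", nullable = false)
    private String name;

    public Long getId() { return id; }

    public String getName() { return name; }

    public void setName(String name) { this.name = name; }
}
```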
Confidential
Java Developer
Responsibilities:
- Involved in the analysis, design, implementation, and testing of the project.
- Implemented the presentation layer with HTML, XHTML and JavaScript.
- Involved in creating the object-relational mapping using Hibernate.
- Developed web components using JSP, Servlets and JDBC (a minimal JDBC sketch follows this section).
- Implemented database using SQL Server.
- Consumed Web Services for transferring data between different applications.
- Wrote complex SQL and stored procedures.
- Involved in fixing bugs and unit testing with test cases using JUnit.
- Developed user and technical documentation.
Environment: Java, SQL, Servlets, HTML, XML, JavaScript, Spring, Hibernate.
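A minimal sketch of the JDBC access pattern used in web components like those above; the connection URL, credentials, table and column names are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical DAO method: looks up an order total with a parameterized query.
public class OrderDao {

    private static final String URL = "jdbc:sqlserver://localhost:1433;databaseName=orders";

    public double findOrderTotal(long orderId) throws SQLException {
        String sql = "SELECT total FROM orders WHERE order_id = ?";
        try (Connection conn = DriverManager.getConnection(URL, "user", "password");
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setLong(1, orderId);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? rs.getDouble("total") : 0.0;
            }
        }
    }
}
```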