
Hadoop/Spark Developer Resume

Cumming, GA

SUMMARY:

  • Around 8 years of experience across several areas of IT, including 4 years of experience with Big Data ecosystem technologies.
  • Expertise with tools in the Hadoop ecosystem, including HDFS, MapReduce, Hive, Sqoop, Pig, Spark, Kafka, YARN, Oozie, and ZooKeeper.
  • Worked on major Hadoop distributions such as Cloudera, MapR, and Hortonworks.
  • Experience in processing structured, semi-structured, and unstructured data using different tools and frameworks in the Hadoop ecosystem.
  • Good knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Extensive experience importing and exporting data using stream-processing platforms such as Flume and Kafka.
  • Analyzed the integration of Kibana with Elasticsearch.
  • Experience using Spark components such as Spark Streaming to process both real-time and historical data.
  • Knowledge of cloud technologies such as AWS and Amazon Elastic MapReduce (EMR).
  • Proficient in OOP concepts and the Java features, such as generics, multithreading, and collections, needed for writing MapReduce jobs.
  • Experience in database development using SQL and PL/SQL, and in working with databases such as Oracle 9i/10g, SQL Server, and MySQL.
  • Experience with Enterprise Service Bus (ESB) products such as WebSphere Message Broker.
  • Experience with version control and code hosting platforms such as GitHub and Bitbucket.
  • Expertise with application and web servers such as WebLogic, IBM WebSphere, and Apache Tomcat, as well as with VMware.
  • Deployed and monitored scalable infrastructure on Amazon Web Services (AWS).
  • Comprehensive knowledge of the Software Development Life Cycle (SDLC), with a thorough understanding of phases such as requirements, analysis, design, development, and testing.
  • Worked within software development methodologies such as Agile and Waterfall, including estimating project timelines.
  • Ability to quickly master new concepts and applications.

TECHNICAL SKILLS:

Big Data Technologies: Hadoop (HDFS & MapReduce), Pig, Hive, HBase, ZooKeeper, Sqoop, Flume, Kafka, Spark, Spark Streaming, Spark SQL and DataFrames, GraphX, Scala, Elasticsearch and AWS

Programming & Scripting Languages: Java, C, SQL, Python, Impala, Scala, C++, ESQL, PHP

J2EE Technologies: JSP, Servlets, EJB, AngularJS

Web Technologies: HTML, JavaScript

Frameworks: Spring 3.5 - Spring MVC, Spring ORM, Spring Security, Spring Roo, Hibernate, Struts

Application Servers: IBM WebSphere, JBoss, WebLogic

Web Servers: Apache Tomcat

Databases: MS SQL Server & SQL Server Integration Services (SSIS), MySQL, MongoDB, Cassandra, Oracle DB, Teradata

IDEs: Eclipse, NetBeans

Operating Systems: Unix, Windows, Ubuntu, CentOS

Others: PuTTY, WinSCP, Data Lake, Talend, Tableau, GitHub.

PROFESSIONAL EXPERIENCE:

Hadoop/Spark Developer

Confidential, Cumming, GA

Responsibilities:

  • Importing and exporting large volumes of data between HDFS and relational databases (RDBMS) using Sqoop.
  • Performing Extract, Transform and Load (ETL) processes using Hive.
  • Importing data stored in Amazon Web Services (AWS) into HDFS.
  • Providing solutions to ad hoc client requests for data and creating ad hoc reports.
  • Creating Hive tables, loading them with data, and writing Hive queries.
  • Implementing dynamic partitioning and bucketing in Hive for efficient data access.
  • Implementing workflows with the Apache Oozie framework to automate day-to-day Sqoop tasks.
  • Writing Hive jobs to parse and structure processed log data, and managing and querying it with HiveQL.
  • Using ZooKeeper to coordinate and run cluster services.
  • Using Apache Impala in place of Hive where possible to speed up data analysis.
  • Converting Hive/SQL queries into Spark transformations using Spark RDDs in Scala (a sketch follows this list).
  • Working with SparkContext, Spark SQL, DataFrames, RDDs, and YARN.
  • Developing and configuring Kafka brokers to pipe server log data into Spark Streaming for real-time processing (see the streaming sketch below).
  • Analyzing large structured and unstructured datasets to find patterns and insights for business intelligence, with the help of Tableau.
  • Coordinating with teams to resolve technical and functional issues as they arise.
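A minimal sketch of the Hive-to-Spark conversion described in this role; the table, column, and output path names are hypothetical, and the real jobs ran against project-specific schemas:

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.sum

  object HiveToSparkSketch {
    def main(args: Array[String]): Unit = {
      // Hive support lets Spark read the same metastore tables the original Hive jobs used.
      val spark = SparkSession.builder()
        .appName("HiveToSparkSketch")
        .enableHiveSupport()
        .getOrCreate()
      import spark.implicits._

      // Original HiveQL aggregation (hypothetical table and columns), run unchanged through Spark SQL.
      val viaHiveQl = spark.sql(
        """SELECT customer_id, SUM(amount) AS total
          |FROM sales.transactions
          |WHERE dt = '2017-01-01'
          |GROUP BY customer_id""".stripMargin)
      viaHiveQl.show(10)

      // The same logic rewritten as Spark transformations in Scala.
      val viaSpark = spark.table("sales.transactions")
        .filter($"dt" === "2017-01-01")
        .groupBy($"customer_id")
        .agg(sum($"amount").as("total"))

      // The result can also be handled as an RDD when lower-level transformations are needed.
      val totalsRdd = viaSpark.rdd

      viaSpark.write.mode("overwrite").parquet("/tmp/daily_totals") // hypothetical output path
      spark.stop()
    }
  }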

Environment: Hadoop, Cloudera, Pig, Hive, Sqoop, Flume, Kafka, Spark, Storm, Tableau, HBase, Scala, Kerberos, Agile, Zookeeper, Maven, AWS, MySQL.
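Similarly, a minimal sketch of the Kafka-to-Spark-Streaming pipeline mentioned in this role, using the spark-streaming-kafka-0-10 integration; the broker, topic, and group names are hypothetical, and the real job applied richer parsing than the error count shown here:

  import org.apache.kafka.common.serialization.StringDeserializer
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka010.KafkaUtils
  import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
  import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

  object ServerLogStreamSketch {
    def main(args: Array[String]): Unit = {
      val ssc = new StreamingContext(new SparkConf().setAppName("ServerLogStreamSketch"), Seconds(10))

      val kafkaParams = Map[String, Object](
        "bootstrap.servers"  -> "broker1:9092,broker2:9092", // hypothetical broker list
        "key.deserializer"   -> classOf[StringDeserializer],
        "value.deserializer" -> classOf[StringDeserializer],
        "group.id"           -> "server-log-stream",         // hypothetical consumer group
        "auto.offset.reset"  -> "latest"
      )

      // Subscribe to the (hypothetical) server-logs topic and count error lines per 10-second batch.
      val stream = KafkaUtils.createDirectStream[String, String](
        ssc, PreferConsistent, Subscribe[String, String](Seq("server-logs"), kafkaParams))

      stream.map(_.value)
        .filter(_.contains("ERROR"))
        .count()
        .print()

      ssc.start()
      ssc.awaitTermination()
    }
  }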

Hadoop Developer

Confidential, Frisco, TX

Responsibilities:

  • Designed, implemented, and deployed the Hadoop cluster.
  • Provided solutions to business problems using big data analytics.
  • Part of the team that built scalable distributed data solutions on a Hadoop cluster using the Hortonworks distribution.
  • Loaded data into the Hadoop Distributed File System (HDFS) using Kafka and REST APIs.
  • Worked on Sqoop to load data into HDFS from Relational Database Management Systems.
  • Implemented Talend jobs to load and integrate data from Excel sheets via Kafka.
  • Developed custom MapReduce programs and Hive User Defined Functions (UDFs) to transform large volumes of data as required (a UDF sketch follows this list).
  • Worked with ORC and Avro file formats and compression codecs such as LZO.
  • Used Hive over structured data to implement dynamic partitioning and bucketing of Hive tables.
  • Transformed and analyzed large volumes of structured, semi-structured, and unstructured data using Hive queries and Pig scripts.
  • Used Hadoop YARN as the execution engine for data analytics with Hive.
  • Worked with MongoDB while developing and implementing programs in the Hadoop environment.
  • Used the Apache Oozie job scheduler to execute workflows as needed.
  • Used Ambari to track node status and job progress and to run analytical jobs on Hadoop clusters.
  • Built customized graphical reports, charts, and worksheets in Tableau.
  • Filtered datasets with Pig UDFs and Pig scripts over HDFS, and with bolts in Apache Storm.
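A minimal sketch of a Hive UDF of the kind described in this role; the class name, field semantics, and registration statements are hypothetical, and the sketch is shown in Scala although the original UDFs may equally have been written in Java:

  import org.apache.hadoop.hive.ql.exec.UDF
  import org.apache.hadoop.io.Text

  // Hypothetical UDF: normalizes a free-text code field before it is loaded
  // into partitioned and bucketed Hive tables.
  class NormalizeCode extends UDF {
    def evaluate(input: Text): Text = {
      if (input == null) null
      else new Text(input.toString.trim.toUpperCase)
    }
  }

  // After packaging the class into a JAR, it would be registered in Hive with
  // statements along these lines (paths and table names are hypothetical):
  //   ADD JAR /path/to/udfs.jar;
  //   CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode';
  //   SELECT normalize_code(raw_code) FROM staging.events;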

Environment: Hadoop, Pig, Hive, HBase, Sqoop, Python, Oozie, Zookeeper, RHEL, Java, Eclipse, SQL, NoSQL, Talend, Tableau, MongoDB.

Hadoop Developer

Confidential, Portland, ME

Responsibilities:

  • Used the Cloudera distribution of Hadoop. Involved in all phases of the big data implementation, including requirement analysis, design, and development of the Hadoop cluster.
  • Collected and aggregated large sets of data using Apache Flume, staging the data in HDFS for further analysis.
  • Designed, built, and supported pipelines for data ingestion, transformation, conversion, and validation.
  • Responded quickly to ad hoc internal and external client requests for data and created ad hoc reports.
  • Performed different types of joins on Hive tables and used partitioning, bucketing, and collection concepts in Hive for efficient data access.
  • Managed data movement between databases, for example ingesting data into Cassandra and consuming the ingested data into Hadoop.
  • Created Hive external tables to perform Extract, Transform and Load (ETL) operations on data generated daily.
  • Created HBase tables to serve random-access queries from the business intelligence and other teams.
  • Created final tables in Parquet format and used Impala to create and manage the Parquet tables.
  • Implemented data ingestion and managed clusters for real-time processing using Apache Kafka (a consumer sketch follows this list).
  • Worked on NoSQL databases including HBase and Cassandra.
  • Participated in the development and implementation of the Cloudera Impala Hadoop environment.
  • Developed a NoSQL database using CRUD operations, indexing, replication, and sharding in MongoDB.
  • Developed the data model to manage the summarized data.
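A minimal sketch of the Kafka-based real-time ingestion mentioned in this role, using the standard Kafka consumer API from Scala; broker, topic, and group names are hypothetical, and the actual pipeline wrote batches on to HDFS rather than printing records:

  import java.time.Duration
  import java.util.{Collections, Properties}
  import org.apache.kafka.clients.consumer.KafkaConsumer
  import scala.collection.JavaConverters._

  object LogIngestConsumerSketch {
    def main(args: Array[String]): Unit = {
      val props = new Properties()
      props.put("bootstrap.servers", "broker1:9092")   // hypothetical broker
      props.put("group.id", "hdfs-ingest")             // hypothetical consumer group
      props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
      props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

      val consumer = new KafkaConsumer[String, String](props)
      consumer.subscribe(Collections.singletonList("server-logs")) // hypothetical topic

      while (true) {
        // Poll for new records; the real pipeline buffered these and flushed
        // them to HDFS on a schedule instead of printing them.
        val records = consumer.poll(Duration.ofMillis(500))
        for (record <- records.asScala) {
          println(s"${record.partition}/${record.offset}: ${record.value}")
        }
      }
    }
  }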

Environment: Hadoop, Cloudera, Hive, Java, Python, Parquet, Oozie, Cassandra, Zookeeper, HiveQL/SQL, MongoDB, Tableau, Impala.

Network Engineer

Confidential

Responsibilities:

  • Establishing networking environment by designing system configuration and directing system installation.
  • Enforcing system standards and defining protocols.
  • Maximizing network performance by monitoring and troubleshooting network problems and outages.
  • Setting up policies for data security and network optimization.
  • Maintaining the clusters needed for data processing, especially for big data workloads where many servers are set up on a complex yet efficient network.
  • Reporting network operational status by gathering, filtering and prioritizing information necessary for the optimal network upkeep.
  • Keeping budgets low by making efficient use of available resources and by tracking data transfer and processing speeds.
  • Skills applied: budget expense tracking, project management, problem solving, LAN and networking knowledge, proxy servers, network design and implementation, network troubleshooting, network hardware configuration, network performance tuning, and people management.

Environment: Wireshark, GNS3, Hadoop, IP addressing, VPN, VLAN, Network Protocols.

Jr. Java Developer

Confidential

Responsibilities:

  • Analyzed requirements and specifications in an Agile-based environment.
  • Developed web interfaces for the User and Admin modules using JSP, HTML, XML, CSS, JavaScript, AJAX, and Action Servlets with the Struts framework.
  • Worked extensively with core Java (collections, generics, and interfaces for passing data from the GUI layer to the business layer).
  • Analyzed and designed the system based on OOAD principles.
  • Used WebSphere Application Server to deploy the build.
  • Developed, tested, and debugged the application in Eclipse.
  • Used a DOM parser to parse the XML files.
  • Used the Log4j framework to log debug, info, and error messages.
  • Used the Oracle 10g database for data persistence.
  • Transferred files between the local system and other systems using WinSCP.
  • Performed Test Driven Development (TDD) using JUnit.

Environment: J2EE, HTML, JSON, JavaScript, CSS, Struts, Spring, Hibernate, Eclipse, Oracle 10g, SQL, XML, CVS.
