SUMMARY:
- Around 8 years of experience across several IT fields, including 4 years with Big Data ecosystem technologies.
- Expertise with tools in the Hadoop ecosystem, including HDFS, MapReduce, Hive, Sqoop, Pig, Spark, Kafka, YARN, Oozie, and ZooKeeper.
- Worked on major Hadoop distributions such as Cloudera, MapR, and Hortonworks.
- Experience processing structured, semi-structured, and unstructured data using different tools and frameworks in the Hadoop ecosystem.
- Good knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
- Extensive experience importing and exporting data using data ingestion and streaming platforms such as Flume and Kafka.
- Analyzed the integration of Kibana with Elasticsearch.
- Experience using Spark components such as Spark Streaming to process both real-time and historical data.
- Knowledge of cloud technologies such as AWS and Amazon Elastic MapReduce (EMR).
- Proficient in OOP concepts and Java features such as generics, multithreading, and collections needed for writing MapReduce jobs.
- Experience in database development using SQL and PL/SQL, and in working with databases such as Oracle 9i/10g, SQL Server, and MySQL.
- Experience with Enterprise Service Bus (ESB) products such as WebSphere Message Broker.
- Experience with source control platforms such as GitHub and Bitbucket.
- Expertise with application and web servers such as WebLogic, IBM WebSphere, and Apache Tomcat, as well as VMware virtualization.
- Deployed and monitored scalable infrastructure on Amazon Web Services (AWS).
- Comprehensive knowledge of the Software Development Life Cycle (SDLC), with a thorough understanding of phases such as requirements, analysis, design, development, and testing.
- Worked within software development methodologies such as Agile and Waterfall, including estimating project timelines.
- Ability to quickly master new concepts and applications.
TECHNICAL SKILLS:
Big Data Technologies: Hadoop (HDFS & MapReduce), Pig, Hive, HBase, ZooKeeper, Sqoop, Flume, Kafka, Spark, Spark Streaming, Spark SQL and DataFrames, GraphX, Scala, Elasticsearch and AWS
Programming & Scripting Languages: Java, C, SQL, Python, Impala, Scala, C++, ESQL, PHP
J2EE Technologies: JSP, Servlets, EJB, AngularJS
Web Technologies: HTML, JavaScript
Frameworks: Spring 3.5 - Spring MVC, Spring ORM, Spring Security, Spring Roo, Hibernate, Struts
Application Servers: IBM WebSphere, JBoss, WebLogic
Web Servers: Apache Tomcat
Databases: MS SQL Server & SQL Server Integration Services (SSIS), MySQL, MongoDB, Cassandra, Oracle DB, Teradata
IDEs: Eclipse, NetBeans
Operating Systems: Unix, Windows, Ubuntu, CentOS
Others: PuTTY, WinSCP, Data Lake, Talend, Tableau, GitHub.
PROFESSIONAL EXPERIENCE:
Hadoop/Spark Developer
Confidential, Cumming, GA
Responsibilities:
- Imported and exported large volumes of data between HDFS and RDBMS using Sqoop.
- Performed Extract, Transform, and Load (ETL) processes using Hive.
- Imported data stored in Amazon Web Services into HDFS.
- Provided solutions to ad hoc client requests for data and created ad hoc reports.
- Created Hive tables, loaded them with data, and wrote Hive queries.
- Implemented dynamic partitioning and bucketing in Hive for efficient data access.
- Implemented several workflows using the Apache Oozie framework to automate day-to-day Sqoop tasks.
- Wrote Hive jobs to parse and structure log data, managing and querying it with HiveQL to facilitate effective querying.
- Used ZooKeeper to coordinate and run the different cluster services.
- Used Apache Impala in place of Hive wherever possible to achieve faster results when analyzing data.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs in Scala (see the sketch following this list).
- Experienced with SparkContext, Spark SQL, DataFrames, RDDs, and YARN.
- Developed and configured Kafka brokers to pipeline server log data into Spark Streaming for real-time processing.
- Analyzed large structured and unstructured datasets to find patterns and insights, supporting business intelligence through Tableau.
- Coordinated with teams to resolve technical and functional issues.
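A minimal sketch of the Hive-to-Spark conversion described above, assuming Spark 2.x with Hive support enabled; the web_logs table and its status and bytes columns are illustrative assumptions, not details from the project.

    // A rough sketch only: expressing a HiveQL aggregation as Spark RDD transformations in Scala.
    // Assumes Spark 2.x with Hive support; table and column names are made up for illustration.
    import org.apache.spark.sql.SparkSession

    object HiveToSparkSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-to-spark-sketch")
          .enableHiveSupport()              // read table metadata from the Hive metastore
          .getOrCreate()

        // Original HiveQL, for reference:
        //   SELECT status, COUNT(*) AS hits, SUM(bytes) AS total_bytes
        //   FROM web_logs GROUP BY status;

        // The same aggregation as RDD transformations:
        val byStatus = spark.table("web_logs").rdd
          .map(row => (row.getAs[Int]("status"), (1L, row.getAs[Long]("bytes"))))
          .reduceByKey { case ((hits1, bytes1), (hits2, bytes2)) => (hits1 + hits2, bytes1 + bytes2) }

        byStatus.collect().foreach { case (status, (hits, totalBytes)) =>
          println(s"status=$status hits=$hits bytes=$totalBytes")
        }

        spark.stop()
      }
    }

The same query could also run unchanged through spark.sql or be written with the DataFrame groupBy/agg API; the RDD form is shown only because the work above specifically involved RDD transformations.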
Environment: Hadoop, Cloudera, Pig, Hive, Sqoop, Flume, Kafka, Spark, Storm, Tableau, HBase, Scala, Kerberos, Agile, ZooKeeper, Maven, AWS, MySQL.
Hadoop Developer
Confidential, Frisco, TX
Responsibilities:
- Designed, implemented, and deployed the Hadoop cluster.
- Provided solutions to issues using big data analytics.
- Part of the team that built scalable distributed data solutions in a Hadoop cluster environment on the Hortonworks distribution.
- Loaded data into the Hadoop Distributed File System (HDFS) using Kafka and REST APIs.
- Used Sqoop to load data into HDFS from relational database management systems.
- Implemented Talend jobs to load and integrate data from Excel sheets using Kafka.
- Developed custom MapReduce programs and user-defined functions (UDFs) in Hive to transform large volumes of data as required (see the UDF sketch following this list).
- Worked with ORC and Avro file formats and compression techniques such as LZO.
- Used Hive on top of structured data, implementing dynamic partitioning and bucketing of Hive tables.
- Transformed large volumes of structured, semi-structured, and unstructured data and analyzed it using Hive queries and Pig scripts.
- Used Hadoop YARN as the execution environment for data analytics with Hive.
- Worked with MongoDB to develop and implement programs in the Hadoop environment.
- Used the Apache Oozie job scheduler to execute workflows as needed.
- Used Ambari to track node status and job progress and to run analytical jobs on Hadoop clusters.
- Built customized graphical reports, charts, and worksheets in Tableau.
- Filtered datasets with Pig UDFs and Pig scripts in HDFS and with bolts in Apache Storm topologies.
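A minimal sketch of a Hive UDF of the kind mentioned above. Such UDFs are typically written in Java; the JVM class below is written in Scala only to keep all examples in one language, and the class name, function name, and jar path are illustrative assumptions.

    // A rough sketch only: a JVM-based Hive UDF that masks the local part of an email address.
    // Uses the classic org.apache.hadoop.hive.ql.exec.UDF API; names below are illustrative.
    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    class MaskEmail extends UDF {
      // Hive resolves evaluate() by reflection and invokes it once per row.
      def evaluate(input: Text): Text = {
        if (input == null) {
          null                            // preserve NULLs, as Hive built-ins do
        } else {
          val parts = input.toString.split("@", 2)
          if (parts.length == 2) new Text("***@" + parts(1)) else input
        }
      }
    }

    // Registered and used from Hive roughly like this (paths and names are assumptions):
    //   ADD JAR /path/to/custom-udfs.jar;
    //   CREATE TEMPORARY FUNCTION mask_email AS 'MaskEmail';
    //   SELECT mask_email(email) FROM customers LIMIT 10;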
Environment: Hadoop, Pig, Hive, HBase, Sqoop, Python, Oozie, ZooKeeper, RHEL, Java, Eclipse, SQL, NoSQL, Talend, Tableau, MongoDB.
Hadoop Developer
Confidential, Portland, ME
Responsibilities:
- Used the Cloudera Hadoop distribution. Involved in all phases of the big data implementation, including requirements analysis, design, and development of the Hadoop cluster.
- Collected and aggregated large sets of data using Apache Flume, staging the data in HDFS for further analysis.
- Designed, built, and supported pipelines for data ingestion, transformation, conversion, and validation.
- Responded quickly to ad hoc internal and external client requests for data and created ad hoc reports.
- Performed different types of joins on Hive tables and applied partitioning, bucketing, and collection concepts in Hive for efficient data access.
- Managed data between different databases, for example ingesting data into Cassandra and consuming the ingested data into Hadoop.
- Created Hive external tables to perform Extract, Transform, and Load (ETL) operations on data generated daily.
- Created HBase tables for random-access queries requested by the business intelligence and other teams.
- Created final tables in Parquet format and used Impala to create and manage the Parquet tables.
- Implemented real-time data ingestion and cluster handling using Apache Kafka.
- Worked on NoSQL databases including HBase and Cassandra.
- Participated in the development and implementation of a Cloudera Impala Hadoop environment.
- Developed a NoSQL database using CRUD operations, indexing, replication, and sharding in MongoDB (see the sketch following this list).
- Developed the data model to manage the summarized data.
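A minimal sketch of MongoDB CRUD and indexing of the kind described above, using the MongoDB Java sync driver (3.7+) from Scala; the connection string, database, collection, and field names are illustrative assumptions, not details from the project.

    // A rough sketch only: MongoDB CRUD plus an index, via the Java sync driver from Scala.
    // The URI, database, collection, and field names are illustrative assumptions.
    import com.mongodb.client.MongoClients
    import com.mongodb.client.model.{Filters, Indexes, Updates}
    import org.bson.Document

    object MongoCrudSketch {
      def main(args: Array[String]): Unit = {
        val client = MongoClients.create("mongodb://localhost:27017")
        val summaries = client.getDatabase("analytics").getCollection("daily_summary")

        // Index the lookup field so reads by day stay fast as the collection grows.
        summaries.createIndex(Indexes.ascending("day"))

        // Create (explicit boxing because the driver's append() takes java.lang.Object)
        summaries.insertOne(new Document("day", "2016-03-01").append("events", java.lang.Long.valueOf(1200L)))

        // Read (backticks because `eq` clashes with Scala's AnyRef.eq)
        val byDay = Filters.`eq`("day", "2016-03-01")
        println(summaries.find(byDay).first())

        // Update and delete
        summaries.updateOne(byDay, Updates.inc("events", java.lang.Long.valueOf(50L)))
        summaries.deleteOne(byDay)

        client.close()
      }
    }

Replication and sharding are cluster-level concerns configured on the MongoDB deployment itself rather than in application code, so they are not shown here.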
Environment: Hadoop, Cloudera, Hive, Java, Python, Parquet, Oozie, Cassandra, ZooKeeper, HiveQL/SQL, MongoDB, Tableau, Impala.
Network Engineer
Confidential
Responsibilities:
- Established the networking environment by designing system configurations and directing system installation.
- Enforced system standards and defined protocols.
- Maximized network performance by monitoring and troubleshooting network problems and outages.
- Set up policies for data security and network optimization.
- Maintained the clusters needed for data processing, especially for big data workloads where many servers are set up on a complex yet efficient network.
- Reported network operational status by gathering, filtering, and prioritizing the information necessary for optimal network upkeep.
- Kept the budget low by using available resources efficiently and tracking data transfer and processing speeds.
- Additional areas: budget expense tracking, project management, problem solving, LAN knowledge, proxy servers, networking knowledge, network design and implementation, network troubleshooting, network hardware configuration, network performance tuning, and people management.
Environment: Wireshark, GNS3, Hadoop, IP addressing, VPN, VLAN, Network Protocols.
Jr. Java Developer
Confidential
Responsibilities:
- Analyzed requirements and specifications in an Agile-based environment.
- Developed web interfaces for the User and Admin modules using JSP, HTML, XML, CSS, JavaScript, AJAX, and Action Servlets with the Struts framework.
- Worked extensively with core Java (collections, generics, and interfaces for passing data from the GUI layer to the business layer).
- Analyzed and designed the system based on OOAD principles.
- Used WebSphere Application Server to deploy the build.
- Developed, tested, and debugged the application in Eclipse.
- Used DOM Parser to parse the XML files.
- Used the Log4j framework for logging debug, info, and error data.
- Used Oracle 10g Database for data persistence.
- Transferred files from the local system to other systems using WinSCP.
- Performed Test Driven Development (TDD) using JUnit.
Environment: J2EE, HTML, JSON, JavaScript, CSS, Struts, Spring, Hibernate, Eclipse, Oracle 10g, SQL, XML, CVS.