Hadoop Developer Resume Durham-NC - Hire IT People

SUMMARY

5 years of IT experience in software development, big data technologies, and analytical solutions, including 3 years of hands-on experience designing and developing in Java and Scala.
4 years of experience as a Hadoop Developer with good knowledge of the Hadoop framework, the Hadoop Distributed File System, and parallel processing, and of Hadoop ecosystem tools: HDFS, MapReduce, Hive, Pig, Python, HBase, Sqoop, Hue, Oozie, Impala, and Spark.
Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, and of the MapReduce programming paradigm.
Experienced in handling file formats such as text, Avro, SequenceFile, XML, JSON, and Parquet.
Worked extensively with Spark Core, numeric and pair RDDs, and DataFrames to develop Spark applications.
Expertise in deploying Hadoop, YARN, and Spark, including Spark integration with HBase.
Good knowledge of using Apache NiFi to automate data movement between Hadoop systems.
Experience in importing and exporting data between HDFS and relational databases using Sqoop.
Experience with Apache Flume for collecting, aggregating, and moving large volumes of data from sources such as web servers and telnet.
Experience with the Oozie workflow engine, running workflow jobs with actions that launch Hadoop MapReduce, Hive, and Spark jobs.
Experienced in implementing the Kerberos authentication protocol in Hadoop for data security.
Experienced in code versioning and dependency management systems such as Git and Maven.
Experienced in testing MapReduce programs using Maven.
Working knowledge of and experience with Agile and Waterfall methodologies.
Great team player and quick learner with effective communication, motivation, and organizational skills, combined with attention to detail and business improvements.

TECHNICAL SKILLS

Hadoop Ecosystem: Hadoop, Spark, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, ZooKeeper, Flume, Impala, Hue, Oozie

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: Scala, Java, C/C++, SQL, Teradata SQL, PL/SQL

Operating Systems: Windows XP/Vista, Mac OS, UNIX, Linux

IDEs & Utilities: IntelliJ, Eclipse, NetBeans

SQL Server Tools: SQL Server Management Studio, SSIS ETL

Web Technologies: JavaScript, HTML, CSS, XML

Cloud Technologies: AWS S3, EC2, EMR

Business Intelligence Tools: Tableau, Pentaho

ETL Tools: Informatica

Methodologies: Agile, UML, Design Patterns

PROFESSIONAL EXPERIENCE

Confidential, Durham-NC

Hadoop Developer

Responsibilities:

Working on Spark/Java programming to build an application from scratch
Loading and accessing data from AWS S3 to run Spark jobs on AWS EMR
Using Maven to build .jar files for running Spark jobs
Creating and maintaining a cluster on AWS EMR
Creating topics on the Kafka server and consuming them in Spark jobs
Using Sqoop for structured data transfer from RDBMS to HDFS
Working with CSV, JSON, and Parquet file formats; wrote a Spark/Java application to convert other file formats to Parquet
Responsible for using cluster resources efficiently by calculating and allocating data across the cluster
Working on delta detection to update customer information in the master database

Environment: AWS EMR, HDFS 2.7.2, AWS S3, Spark SQL 2.1.1, Spark 2.1.1, Sqoop 1.4.6, Scala 2.12, Shell Scripting, Java, GitHub, JSON, CSV, Parquet
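
For illustration only, the delta detection mentioned in the responsibilities above could be sketched as follows. This is plain Python rather than the Spark/Java used on the project, and the `customer_id` key, the record fields, and the function names `detect_delta` and `apply_delta` are all hypothetical, chosen just to show the idea of comparing an incoming batch against the master records.

```python
from typing import Dict, List, Tuple

# A record is a flat mapping of field name to value (assumed shape).
Record = Dict[str, str]


def detect_delta(
    master: List[Record], incoming: List[Record], key: str = "customer_id"
) -> Tuple[List[Record], List[Record]]:
    """Split incoming records into new rows and changed rows."""
    master_by_key = {row[key]: row for row in master}
    inserts: List[Record] = []
    updates: List[Record] = []
    for row in incoming:
        existing = master_by_key.get(row[key])
        if existing is None:
            inserts.append(row)   # key not present in master: new customer
        elif existing != row:
            updates.append(row)   # key present but some field differs
        # identical rows are skipped: nothing to write
    return inserts, updates


def apply_delta(
    master: List[Record],
    inserts: List[Record],
    updates: List[Record],
    key: str = "customer_id",
) -> List[Record]:
    """Return a new master list with the inserts and updates applied."""
    merged = {row[key]: row for row in master}
    for row in inserts + updates:
        merged[row[key]] = row
    return sorted(merged.values(), key=lambda r: r[key])
```

Only the rows returned by `detect_delta` need to be written, so unchanged customers are not touched in the master database.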

Confidential, McLean-VA

Hadoop Developer

Responsibilities:

Worked on Spark/Scala programming to create UDFs
Created and accessed AWS S3 buckets
Connected to AWS EC2 using SSH and ran spark-submit jobs
Worked in a Cloudera environment
Analyzed existing code and fixed bugs where required
Ran many test cases in Scala
Used Java to remove an attribute from a JSON file where Scala did not support creating the objects, then converted the result back to Scala
Worked on master clean-up of data
Worked with the Java Collections Framework
Used the IntelliJ IDE for development and debugging
Wrote a full set of programs in Scala for one of the LOBs and unit tested them
Created many SQL schemas and used them throughout the program where required
Made enhancements to one of the LOBs using Scala programming
Ran spark-submit jobs and analyzed the log files
Used Maven to build .jar files
Used Sqoop to transfer data between relational databases and Hadoop
Used HDFS to store and access large datasets within Hadoop
Good hands-on experience with Git and GitHub
Created a feature branch on GitHub
Pushed the data to GitHub and made a pull request
Experience with JSON and CFF

Environment: Cloudera 5.8, Hadoop 2.7.2, HDFS 2.7.2, AWS S3, AWS EC2, Spark SQL 1.6.1, Sqoop 1.4.6, Spark 1.6.3, Scala 2.12, MySQL, Shell Scripting, Java, GitHub, JSON, CFF

Confidential, NC

Hadoop Developer

Responsibilities:

Transferred purchase transaction details from legacy systems to HDFS.
Developed Java MapReduce programs that transform log data into a structured form to find user location, age group, and time spent.
Developed Pig UDFs to manipulate data per business requirements and developed custom Pig loaders.
Collected and aggregated large amounts of weblog data from sources such as web servers, mobile devices, and network devices using Apache Flume, and stored the data in HDFS for analysis.
Worked on the ingestion of files into HDFS from remote systems using MFT (Managed File Transfer).
Monitored and managed a Cassandra cluster.
Analyzed the weblog data using HiveQL and integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
Installed and configured Flume, Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
Wrote MapReduce jobs to parse the weblogs stored in HDFS.
Developed services to run the MapReduce jobs as required.
Imported and exported data between an Oracle 10.2 database and HDFS using Sqoop.

Environment: Hadoop, HDFS, Pig, Hive, Tez, Accumulo, Flume, Sqoop, Oozie, Cassandra.

Confidential, Cuyahoga Falls-OH

Hadoop Developer

Responsibilities:

Responsible for building scalable distributed data solutions using Hadoop.
Worked on analyzing the Hadoop cluster and various big data analytic tools, including MapReduce, Hive, and Spark.
Loaded data from the Linux file system, servers, and Java web services using Kafka producers and partitions.
Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions.
Implemented Storm topologies to pre-process data before moving it into HDFS.
Implemented Kafka high-level consumers to read data from Kafka partitions and move it into HDFS.
Implemented a POC to migrate MapReduce programs into Spark transformations using Spark and Scala.
Migrated complex MapReduce programs into Spark RDD transformations and actions.
Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
Converted Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
Expertise in implementing Spark/Scala applications using higher-order functions for both batch and interactive analysis requirements.
Implemented complex Hive UDFs to execute business logic within Hive queries.
Responsible for bulk loading data into HBase using MapReduce by directly creating HFiles and loading them.
Indexed documents using Apache Solr.
Worked on Solr configuration and customizations based on requirements.
Implemented Spark applications in Scala, using DataFrames and the Spark SQL API for faster data processing.
Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
Responsible for developing a data pipeline by implementing Kafka producers and consumers and configuring brokers.

Environment: Cloudera 5.8, Hadoop 2.7.2, HDFS 2.7.2, AWS, Hive 2.0, Impala, Spark SQL 1.6.1, MapReduce 1.x, Flume 1.7.0, Sqoop 1.4.6, Oozie 4.1, Kafka 0.10, Spark 1.6.3, Scala 2.12, HBase 0.98.19, ZooKeeper 3.4.9, MySQL, Shell Scripting, Java.

Confidential

Java Developer/Hadoop Developer

Responsibilities:

Developed custom data ingestion adapters to extract log data and clickstream data from external systems and load it into HDFS.
Used Spark as an ETL tool for complex transformations, denormalization, enrichment, and some pre-aggregations.
Created Hive tables, loaded data, and wrote Hive queries to build analytical datasets.
Developed a working prototype for real-time data ingestion and processing using Kafka, Spark Streaming, and HBase.
Developed a Kafka producer and a Spark Streaming consumer to read the stream of events as per business rules.
Designed and developed job flows using Oozie.
Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
Used Avro and Parquet file formats and Snappy compression throughout the project.
Collected data from distributed sources into Avro models, applied transformations and standardizations, and loaded it into HBase for further processing.

Environment: Cloudera CDH 5.x, Pentaho, HDFS, Hadoop 2.2.0 (YARN), Eclipse, Hive, Pig Latin, Sqoop, ZooKeeper, Apache Kafka, Apache Storm, MySQL
