Hadoop Developer Resume
Durham-NC
SUMMARY
- 5 years of IT experience in software development, Big Data technologies, and analytical solutions, including 3 years of hands-on experience designing and developing applications in Java and Scala.
- 4 years of experience as a Hadoop Developer with good knowledge of the Hadoop framework, the Hadoop Distributed File System, parallel processing, and the Hadoop ecosystem: HDFS, MapReduce, Hive, Pig, Python, HBase, Sqoop, Hue, Oozie, Impala, Spark.
- Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, and of the MapReduce programming paradigm.
- Experienced in handling different file formats such as text files, Avro data files, sequence files, XML, JSON, and Parquet files.
- Worked extensively with Spark Core, numeric and pair RDDs, and DataFrames to develop Spark applications.
- Expertise in deploying Hadoop and YARN and in integrating Spark with HBase.
- Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
- Experience importing and exporting data between HDFS and relational databases using Sqoop.
- Experience with Apache Flume for collecting, aggregating, and moving large volumes of data from various sources such as web servers and telnet sources.
- Experience with the Oozie workflow engine, running workflow jobs with actions that launch Hadoop MapReduce, Hive, and Spark jobs.
- Experienced in implementing the Kerberos authentication protocol in Hadoop for data security.
- Experienced in code versioning and dependency management systems such as Git and Maven.
- Experienced in testing MapReduce programs using Maven.
- Working knowledge of and experience with Agile and Waterfall methodologies.
- Strong team player and quick learner with effective communication, motivation, and organizational skills, combined with attention to detail and a focus on business improvement.
TECHNICAL SKILLS
Hadoop Ecosystem: Hadoop, Spark, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, ZooKeeper, Flume, Impala, Hue, Oozie
NoSQL Databases: HBase, Cassandra, MongoDB
Languages: Scala, Java, C/C++, SQL, Teradata SQL, PL/SQL
Operating Systems: Windows XP/Vista, Mac OS, UNIX, Linux
IDEs & Utilities: IntelliJ, Eclipse, NetBeans
SQL Server Tools: SQL Server Management Studio, SSIS ETL
Web Technologies: JavaScript, HTML, CSS, XML
Cloud Technologies: AWS S3, EC2, EMR
Business Intelligence Tools: Tableau, Pentaho
ETL Tools: Informatica
Methodologies: Agile, UML, Design Patterns
PROFESSIONAL EXPERIENCE
Confidential, Durham-NC
Hadoop Developer
Responsibilities:
- Building an application from scratch using Spark and Java
- Loading and accessing data from AWS S3 for running Spark jobs on AWS EMR
- Using Maven to build .jar files for running Spark jobs
- Creating and maintaining clusters on AWS EMR
- Creating topics on the Kafka server and consuming them in Spark jobs
- Using Sqoop for structured data transfer from RDBMS to HDFS
- Working with CSV, JSON, and Parquet file formats; wrote a Spark/Java application to convert other file formats to Parquet
- Responsible for efficient resource utilization by calculating and allocating data across the cluster
- Working on delta detection for updating customer information in the master database
Environment: AWS EMR, HDFS 2.7.2, AWS S3, Spark SQL 2.1.1, Spark 2.1.1, Sqoop 1.4.6, Scala 2.12, Shell Scripting, Java, GitHub, JSON, CSV, Parquet
Confidential, McLean-VA
Hadoop Developer
Responsibilities:
- Created UDFs using Spark and Scala
- Created and accessed AWS S3 buckets
- Connected to AWS EC2 using SSH and ran spark-submit jobs
- Worked in a Cloudera environment
- Analyzed existing code and fixed bugs where required
- Ran many test cases in Scala
- Used Java to remove an attribute from a JSON file where Scala did not support creating the required objects, then converted the result back to Scala
- Worked on master clean-up of data
- Worked with the Java Collections Framework
- Used the IntelliJ IDE for development and debugging
- Wrote a complete set of programs in Scala for one of the LOBs and unit tested them
- Created many SQL schemas and used them throughout the program where required
- Made enhancements to one of the LOBs using Scala programming
- Ran spark-submit jobs and analyzed the log files
- Used Maven to build .jar files
- Used Sqoop to transfer data between relational databases and Hadoop
- Used HDFS to store and access large datasets within Hadoop
- Good hands-on experience with Git and GitHub
- Created a feature branch on GitHub
- Pushed the data to GitHub and made a pull request
- Experience with JSON and CFF
Environment: Cloudera 5.8, Hadoop 2.7.2, HDFS 2.7.2, AWS S3, AWS EC2, Spark SQL 1.6.1, Sqoop 1.4.6, Spark 1.6.3, Scala 2.12, MySQL, Shell Scripting, Java, GitHub, JSON, CFF
Confidential, NC
Hadoop Developer
Responsibilities:
- Transferred purchase transaction details from legacy systems to HDFS.
- Developed Java MapReduce programs to transform log data into a structured form for finding user location, age group, and time spent.
- Developed Pig UDFs to manipulate data per business requirements and developed custom Pig loaders.
- Collected and aggregated large amounts of weblog data from different sources such as web servers, mobile devices, and network devices using Apache Flume, and stored the data in HDFS for analysis.
- Worked on the ingestion of files into HDFS from remote systems using MFT (Managed File Transfer).
- Monitored and managed a Cassandra cluster.
- Analyzed the weblog data using HiveQL; integrated Oozie with the rest of the Hadoop stack, which supports several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
- Installed and configured Flume, Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
- Wrote MapReduce jobs to parse the weblogs stored in HDFS.
- Developed services to run the MapReduce jobs as required.
- Imported and exported data between an Oracle 10.2 database and HDFS using Sqoop.
Environment: Hadoop, HDFS, Pig, Hive, Tez, Accumulo, Flume, Sqoop, Oozie, Cassandra.
Confidential, Cuyahoga Falls-OH
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Analyzed the Hadoop cluster and various big data analytics tools, including MapReduce, Hive, and Spark.
- Loaded data from the Linux file system, servers, and Java web services using Kafka producers and partitions.
- Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions.
- Implemented Storm topologies to pre-process data before moving it into HDFS.
- Implemented Kafka high-level consumers to read data from Kafka partitions and move it into HDFS.
- Implemented a POC to migrate MapReduce programs to Spark transformations using Spark and Scala.
- Migrated complex MapReduce programs to Spark RDD transformations and actions.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Converted Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
- Expertise in implementing Spark/Scala applications using higher-order functions for both batch and interactive analysis requirements.
- Implemented complex Hive UDFs to execute business logic within Hive queries.
- Responsible for bulk loading data into HBase using MapReduce by directly creating HFiles and loading them.
- Indexed documents using Apache Solr.
- Worked on Solr configuration and customizations based on requirements.
- Implemented Spark applications in Scala, using DataFrames and the Spark SQL API for faster data processing.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
- Responsible for developing a data pipeline by implementing Kafka producers and consumers and configuring brokers.
Environment: Cloudera 5.8, Hadoop 2.7.2, HDFS 2.7.2, AWS, Hive 2.0, Impala, Spark SQL 1.6.1, MapReduce 1.x, Flume 1.7.0, Sqoop 1.4.6, Oozie 4.1, Kafka 0.10, Spark 1.6.3, Scala 2.12, HBase 0.98.19, ZooKeeper 3.4.9, MySQL, Shell Scripting, Java.
Confidential
Java Developer/Hadoop Developer
Responsibilities:
- Developed custom data ingestion adapters to extract log data and clickstream data from external systems and load it into HDFS.
- Used Spark as an ETL tool for complex transformations, denormalization, enrichment, and some pre-aggregations.
- Created Hive tables, loaded data, and wrote Hive queries to build analytical datasets.
- Developed a working prototype for real-time data ingestion and processing using Kafka, Spark Streaming, and HBase.
- Developed a Kafka producer and a Spark Streaming consumer to read the stream of events per business rules.
- Designed and developed job flows using Oozie.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Used Avro and Parquet file formats and Snappy compression throughout the project.
- Collected data from distributed sources into Avro models, applied transformations and standardizations, and loaded the results into HBase for further processing.
Environment: Cloudera CDH 5.x, Pentaho, HDFS, Hadoop 2.2.0 (YARN), Eclipse, Hive, Pig Latin, Sqoop, ZooKeeper, Apache Kafka, Apache Storm, MySQL
