Hadoop Developer Resume Ashburn - Hire IT People

PROFESSIONAL SUMMARY:

Over 7 years of IT work experience, involved in all phases of the software development lifecycle across different projects
Very strong experience in processing and analyzing large sets of structured, semi-structured and unstructured data and in supporting systems application architecture
Extensive experience in writing Hadoop jobs for data analysis as per the business requirements using Hive and Pig
Expertise in creating Hive internal/external tables and views using a shared metastore
Developed custom UDFs in Pig and Hive to extend their core functionality
Hands on experience in transferring incoming data from various application servers into HDFS, Hive, HBase using Apache Flume
Stored Data in Vertica EDW
Experience working with Snowflake and Vertica data warehouses
Worked extensively with Sqoop to import and export data between RDBMS and HDFS
Performed Data Ingestion from multiple disparate sources and systems using Kafka
Proficient in big data ingestion and streaming tools like Apache Flume, Sqoop, Kafka, Storm and Spark.
Experience of working on data formats like Avro, Parquet
Hands on experience in Sequence files, RC files, Combiners, Counters, Dynamic Partitions, Bucketing for best practice and performance improvement
Good experience with AWS Elastic Block Store (EBS), including choosing among EBS volume types based on requirements
Experience creating real-time data streaming solutions using Apache Spark Core, Spark SQL, Kafka, Spark Streaming and Apache Storm
Worked on Oozie to manage and schedule the jobs on Hadoop cluster
Used a variety of AWS computing and networking services to meet the needs of applications
Knowledge of developing analytical components using Scala
Experience in managing and reviewing Hadoop log files
Worked with NoSQL database HBase to create tables and store data
Experience in setting up Hive, Pig, HBase and Sqoop on the Ubuntu operating system
Strong experience in relational database design and development with multiple RDBMSs, including Oracle 10g, MySQL and MS SQL Server, and with PL/SQL
Proficient in using data visualization tools like Tableau, Raw and MS Excel
Developed applications using Java, RDBMS and UNIX Shell scripting
Experience of working on Servlets, JSP, JSF, Spring, Hibernate, JPA and JDBC
Experience in developing web interfaces using technologies like XML, HTML, DHTML and CSS
Implemented functions, stored procedures, triggers using PL/SQL
Good understanding of ETL processes and Data warehousing
Strong experience in writing UNIX shell scripts
Working in different projects provided exposure and good understanding of different phases in SDLC

TECHNICAL SKILLS:

Hadoop/Big Data: Hadoop 1.x/2.x (YARN), HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Kafka, Spark, Storm, ZooKeeper, Scala, Oozie, Ambari, Tez, R

Development Tools: Eclipse, IBM DB2 Command Editor, TOAD, SQL Developer, VMware

Programming/Scripting Languages: Java, C++, Unix Shell Scripting, Python, SQL, Pig Latin, HiveQL

Databases: Oracle 11g/10g/9i, MySQL, SQL Server 2005/2008, PostgreSQL and DB2

NoSQL Databases: HBase, Cassandra, MongoDB

ETL: Informatica

Visualization: Tableau, Raw and MS Excel

Frameworks: Hibernate, JSF 2.0, Spring

Version Control Tools: Subversion (SVN), Concurrent Versions System (CVS) and IBM Rational ClearCase

Methodologies: Agile/Scrum, Waterfall

Operating Systems: Windows, Unix, Linux and Solaris

PROFESSIONAL EXPERIENCE:

Confidential, Ashburn

Hadoop Developer

Responsibilities:

Used Sqoop to import and export data between Oracle/DB2 and HDFS for analysis
Migrated existing MapReduce programs to Spark models using Python.
Migrated data from the data lake (Hive) into an S3 bucket.
Validated data between the data lake and the S3 bucket.
Used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data.
Designed batch processing jobs using Apache Spark that ran ten times faster than the equivalent MapReduce jobs.
Worked with the Snowflake data warehouse.
Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data
Designed custom Spark REPL application to handle similar datasets
Used Hadoop scripts for HDFS (Hadoop Distributed File System) data loading and manipulation
Performed Hive test queries on local sample files and HDFS files
Used Kafka for real time data ingestion.
Created different topics in Kafka for reading data.
Read data from different topics in Kafka.
Moved data from the S3 bucket to the Snowflake data warehouse to generate reports.
Wrote Hive queries for data analysis to meet the business requirements
Migrated an existing on-premises application to AWS.
Created Hive tables and worked on them using HiveQL
Assisted in loading large sets of data (structured, semi-structured and unstructured) to HDFS
Extensively used Pig for data cleaning and optimization
Developed Spark SQL jobs to load tables into HDFS and run select queries on top of them.
Used AWS services like EC2 and S3 for small data sets.
Developed the application on IntelliJ IDE
Created DataFrames using Scala.
Developed Hive queries to analyze data and generate results
Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
Used Scala to write code for all Spark use cases.
Analyzed user request patterns and implemented various performance optimization measures including implementing partitions and buckets in HiveQL
Assigned names to each of the columns using Scala case classes.
Developed multiple Spark SQL jobs for data cleaning
Developed Pig Latin scripts to extract the data from the web server output files and to load into HDFS
Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
Created many UDFs and UDAFs for functions that were not already available in Hive and Spark SQL.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
Implemented various performance optimization techniques, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
Good knowledge of Spark platform parameters like memory, cores and executors
Used ZooKeeper in the cluster to provide concurrent access to Hive tables with shared and exclusive locking
Developed analytical components using Scala, Spark and Spark Streaming.
Used visualization tools such as Power View for Excel and Tableau for visualizing data and generating reports.
Worked on the NoSQL databases HBase and MongoDB.

Environment: Linux, Apache Hadoop Framework, HDFS, YARN (MR2.X), Hive, HBase, AWS (S3, EMR), Scala, Spark, Sqoop

Confidential, Plano

Spark Developer

Responsibilities:

Experienced in development using the Cloudera distribution.
As a Hadoop developer, managed the data pipelines and the data lake.
Performed Hadoop ETL using Hive on data at different stages of the pipeline.
Worked in an Agile environment using Scrum
Imported data from different source systems using Sqoop and automated the imports with Oozie workflows.
Generated business reports from the data lake using Hadoop SQL (Impala) as per business needs.
Automated business reports on the data lake using Bash scripts in UNIX, sending them to business owners.
Developed Spark Scala code to cleanse data and perform ETL at different stages of the data pipeline.
Worked in different environments like DEV, QA, Data Lake and Analytics Cluster as part of Hadoop Development.
Snapped the cleansed data to the Analytics Cluster for business reporting.
Developed Pig and Python scripts to perform streaming and created Hive tables on top of the output.
Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark and SQL
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
Developed Oozie workflows to run multiple Hive, Pig, Sqoop and Spark jobs.
Handled importing of data from various data sources, performed transformations using Hive, Spark and loaded data into HDFS.
Developed Pig, Hive, Sqoop, Hadoop streaming and Spark actions in Oozie for workflow management.
Supported MapReduce programs running on the cluster.
Experienced in collecting, aggregating, and moving large amounts of streaming data into HDFS using Flume and Kafka.
Responsible for developing multiple Kafka Producers and Consumers from scratch as per the software requirement specifications.
Good understanding of workflow management processes and their implementation.

Environment: Hadoop, AWS, Java, HDFS, MapReduce, Spark, Pig, Hive, Impala, Sqoop, Flume, Kafka, HBase, Oozie, SQL scripting, Linux shell scripting, Eclipse and Cloudera

Confidential, Chicago

Hadoop Developer

Responsibilities:

Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data
Developed MapReduce programs to parse the raw data and store the refined data in tables
Designed and modified database tables and used HBase queries to insert and fetch data from tables
Involved in moving all log files generated from various sources to HDFS for further processing through Flume
Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports
Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data
Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS that were further used for analysis
Used CloudWatch Logs to move application logs to S3 and create alarms based on a few exceptions raised by applications
Created Hive tables, loaded data and wrote Hive queries that run internally as MapReduce jobs
Used Oozie operational services for batch processing and scheduling workflows dynamically
Developed and updated social media analytics dashboards on regular basis
Performed data mining investigations to find new insights related to customers
Managed and reviewed Hadoop log files
Used Vertica as the enterprise data warehouse.
Analyzed the web log data using HiveQL to extract the number of unique visitors per day
Developed and generated insights based on brand conversations, which in turn helped drive brand awareness, engagement and traffic to social media pages
Involved in identification of topics and trends and building context around that brand
Involved in identifying and analyzing defects, questionable function errors and inconsistencies observed in the output

Environment: HBase, Hadoop, HDFS, MapReduce, Hive, Sqoop, Flume 1.3, Oozie, ZooKeeper, MySQL, and Eclipse

Confidential

Java Developer

Responsibilities:

Used XML to define ORM mappings between the Java classes and the database
Worked on analysis, design and coding for client development with the J2EE stack on the Eclipse platform
Involved in creating web-based Java components like client Applets and client side UI using JFC in Eclipse
Developed PL/SQL stored procedures to perform complex database operations
Used Struts in presentation tier
Used Subversion as the version control system
Played key role in the design and development of application using J2EE, Struts, Spring
Involved in various phases of Software Development Life Cycle
Configured Struts framework to implement MVC design patterns
Designed and developed GUI using JSP, HTML, DHTML and CSS
Generated the Hibernate XML and Java Mappings for the schemas
Used Rational Application Developer (RAD) as Integrated Development Environment (IDE)
Extensively used Core Java, Servlets, JSP and XML
Used Oracle WebLogic workshop to generate the web service artifacts from the given WSDL for JAX-WS specification

Environment: Java, Struts, Servlets, Spring, Tomcat, Hibernate, HTML, JSP, XML, SQL, J2EE, JUnit, Oracle 11g, Windows

Confidential

Java Developer

Responsibilities:

Implemented server side programs by using Servlets and JSP
Designed, developed and validated the user interface using HTML, JavaScript, XML and CSS
Implemented MVC using Struts Framework
Implemented Controller Servlet to handle the access to database
Participated in code walkthroughs, Debugging and defect fixing
Involved in the co-ordination of end to end production release process
Used SVN as the version control system
Used JDBC prepared statements in Servlets for database access
Designed and documented the stored procedures
Wrote JUnit test cases and performed unit testing for various components
Worked on the database interaction layer for insert, update and retrieval operations on data in the Oracle database by writing stored procedures
Used Spring Framework for Dependency Injection and integrated with Hibernate
Used Log4J to log application errors

Environment: Java, J2EE, JSP, Servlets, HTML, DHTML, XML, JavaScript, Struts, Eclipse, WebLogic, PL/SQL and Oracle
