
Senior Hadoop Engineer Resume


Florida

SUMMARY

  • Over 8+ years of overall IT experience, including 4+ years on the Hadoop ecosystem.
  • Experience with complete software development life cycle (SDLC) and Software Engineering including Requirement gathering, Analyzing, Designing, Implementing, Testing, Support and Maintenance
  • Strong in Developing MapReduce Applications, Configuring the Development Environment, Tuning Jobs and Creating MapReduce Workflows
  • Experience in performing data enrichment, cleansing, analytics, aggregations using Hive and Pig
  • Experience in importing and exporting data from different relational databases like MySQL, Netezza, Oracle into HDFS and Hive using Sqoop
  • Extensive experience handling and converting various data interchange formats (XML, JSON, Avro, Parquet) in distributed frameworks.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (an illustrative sketch follows this summary).
  • Developed multiple POCs using Scala and deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • Analyzed the SQL scripts and designed the solution to implement using Scala.
  • Worked on the Hadoop stack and different big data analytics tools, including migrations from databases such as SQL Server 2008 R2, Oracle and MySQL to Hadoop.
  • Strong in designing specifications, functional and technical requirements, and process flows
  • Experience in dealing with distributed systems, large - scale non-relational data stores, data modeling and multi-terabyte data warehouses
  • Hands on experience in application development and database management using the technologies JAVA, RDBMS, Linux/Unix shell scripting and Linux internals
  • Experience in deploying applications in heterogeneous Application Servers TOMCAT, WebLogic, IBM WebSphere and Oracle Application Server.
  • Experience in Designing, Installing, Configuring and Administering Hadoop Clusters of major Hadoop distributions - Cloudera, Hortonworks & Apache Hadoop
  • Experience in designing, building and implementing a complete Hadoop ecosystem comprising MapReduce, HDFS, Hive, Impala, Pig, Sqoop, Oozie, HBase and Spark
  • Worked on Multi Clustered environment, setting up Production and QA Hadoop Cluster, Benchmarking the Hadoop Cluster on Amazon AWS, Rackspace and EC2 Cloud environments
  • Good hands-on experience in writing shell scripts in Linux/Unix
  • Experience in developing use cases, activity diagrams, sequence diagrams and class diagrams using UML, Rational Rose and MS Visio
  • Extensively implemented various types of ETL/EDW migration projects using MapReduce, Pig, Hive, Sqoop
  • Experience with Apache Storm and streaming real-time CEP solutions.
  • Understanding of Cloud architectures and computationally intensive development environments.
  • Experience with Docker and Zookeeper, HDFS/Hadoop and cluster management.
  • Expertise with Git, Maven and the Agile development process.
  • Experience in using Oozie, Control-M and Autosys workflow engines for managing and scheduling Hadoop jobs
  • Worked with big data teams to offload ETL jobs from Teradata and Netezza to Hadoop.
  • Experience in importing streaming logs and aggregating the data to HDFS using Flume
  • Built ingestion framework using Kafka for streaming logs and aggregating the data into HDFS using Camus
  • Knowledge in stream processing technologies like Apache Spark and Storm
  • Experience in setting up monitoring tools like Ganglia and Nagios for Hadoop and HBase
  • Involved in Analysis, Design, Coding and Development of Java custom Interfaces
  • Hands on experience on SDLC under agile environment
  • Exceptional ability to quickly master new concepts and technologies.
  • Proficient in communicating with people at all levels of the organization, very good at post-implementation support, and a team player with strong analytical and problem-solving skills
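The Hive/SQL-to-Spark conversion work called out above can be illustrated with a minimal Scala sketch. The table and column names (page_views, user_id) and the Spark 1.x HiveContext setup are assumptions for illustration only, not actual project code.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Minimal sketch: a Hive aggregation rewritten as Spark RDD transformations.
// Table and column names are hypothetical placeholders.
object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveToSparkSketch"))
    val hiveContext = new HiveContext(sc)

    // Original HiveQL:
    //   SELECT user_id, COUNT(*) FROM page_views GROUP BY user_id
    // Equivalent RDD transformations over the same Hive table:
    val counts = hiveContext.table("page_views")
      .rdd
      .map(row => (row.getAs[String]("user_id"), 1L))
      .reduceByKey(_ + _)

    counts.take(20).foreach { case (user, n) => println(s"$user\t$n") }
    sc.stop()
  }
}
```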

TECHNICAL SKILLS

Hadoop/Big Data: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Cassandra, HBase, Zookeeper, Spark, Impala, Kafka, Storm

Languages/Simulators: C, C++, Java, Python, SQL, UNIX Shell Scripting, Scala

Operating Systems: Windows Variants, Mac, UNIX, LINUX

Database: MySQL, Oracle

IDE Tools: Eclipse, NetBeans, SQL Developer, MS Visual Studio

Version Control: Git, SVN

Software Tools: MS Office Suite (Word, Excel, Project), MS Visio

Web Technologies: HTML, CSS, XML, PHP

Monitoring Tools: Ganglia, Nagios, Cloudera Manager

NoSQL Databases: Cassandra, HBase

PROFESSIONAL EXPERIENCE

Senior Hadoop Engineer

Confidential, Florida

Responsibilities:

  • Developed simple to complex MapReduce jobs using Hive and Pig for performing analytics on data.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Involved in Hadoop cluster tasks like commissioning and decommissioning nodes without affecting running jobs on the data.
  • Wrote MapReduce jobs to discover trends in data usage by users.
  • Involved in running Hadoop streaming jobs to process terabytes of text data.
  • Introduced Oozie workflow to develop job processing scripts.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Customized the parser loader application for data migration to HBase.
  • Loaded cached data into HBase using Sqoop to support various queries on the data.
  • Created numerous external Hive tables pointing to HBase tables.
  • Analyzed HBase data in Hive by creating external partitioned and bucketed tables so that efficiency is maintained.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Developed multiple POCs using Scala and deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • Analyzed the SQL scripts and designed the solution to implement using Scala.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Involved in creating Hive tables, and loading and analyzing data using Hive queries.
  • Performed all Linux operating system, disk management and patch management configurations, on Linux instances in AWS.
  • Provided low-latency computations by caching the working dataset in memory and then performing computations at memory speed using Spark.
  • Worked on loading and transforming large data sets of different formats: structured, semi-structured and unstructured data.
  • Used Spark to easily combine batch, interactive and streaming jobs in the same application (an illustrative sketch follows this list).
  • Wrote Hive queries and UDFs.
  • Developed Hive queries to process the data and generate data cubes for visualization.
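A minimal Scala sketch of the in-memory caching pattern referenced above, assuming a Spark 1.x HiveContext; the HDFS path, the events/status/user_id names and the example queries are hypothetical placeholders rather than project code.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.storage.StorageLevel

// Sketch: cache a working dataset once, then serve repeated batch and
// SQL-style computations from memory. Paths and column names are placeholders.
object CachedWorkingSetSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CachedWorkingSetSketch"))
    val hiveContext = new HiveContext(sc)

    // Load the working dataset from HDFS and pin it in memory.
    val events = hiveContext.read.json("hdfs:///data/events")
      .persist(StorageLevel.MEMORY_ONLY)
    events.registerTempTable("events")

    // Batch-style computation served from the cached data.
    val failed = events.filter(events("status") === "FAILED").count()

    // Interactive SQL-style query against the same cached data.
    val topUsers = hiveContext.sql(
      "SELECT user_id, COUNT(*) AS cnt FROM events GROUP BY user_id ORDER BY cnt DESC LIMIT 10")

    println(s"failed events: $failed")
    topUsers.collect().foreach(println)
    sc.stop()
  }
}
```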

Environment: Hadoop, HDFS, MapReduce, Hive, Scala, Pig, Oozie, HBase, Spark, Kafka, Shell Scripting, MySQL, DB2, Oracle

Confidential, San Rafael, CA

Senior Hadoop Consultant

Responsibilities:

  • Developed custom MapReduce programs in Java to perform daily transformations of JSON data to text format and store them in HDFS per business requirements
  • Designed and implemented Hive tables, partition strategy, HCatalog usage and performance tuning of Hive queries
  • Developed Pig scripts and UDFs for data cleansing and denormalization of multiple datasets
  • Designed and implemented an ETL workflow including data ingestion from different databases/data warehouses into HDFS using Sqoop, transformation and analysis in Hive/Pig, and preprocessing of the raw data using MapReduce
  • Wrote Sqoop pipelines to efficiently transfer data from MySQL, DB2, Oracle Exadata and Netezza to the Hadoop environment.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios
  • Used the HDFS API to pull data from HDFS and then the Solr API to push data into Solr for indexing
  • Assisted Big data infra team on building Hadoop CDH clusters for development and production environment
  • Strong working knowledge of Spark Streaming, RDDs, Spark SQL and Scala
  • Processed data through Spark using Scala and Spark SQL
  • Developed Spark code using Scala and Spark SQL for faster testing and processing of data (an illustrative sketch follows this list)
  • Worked on collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis
  • Developed a POC for the Kafka REST API to collect events from the front end
  • Worked with the Hadoop business team on use case discovery, technical specifications and documentation
  • Worked with different file formats and compression techniques in Hadoop like Avro, Sequence, LZO and Snappy
  • Worked on historical data to eliminate occurrences of duplicate data in HDFS using Hive
  • Worked with big data analysts and the data science team in troubleshooting MapReduce job failures and issues with Hive and Pig
  • Copied data from one cluster to another using DistCp and automated the procedure using shell scripts
  • Automated shell scripts, MapReduce programs and Hive jobs, and created workflows using the Oozie scheduler.
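A minimal sketch of the Scala/Spark SQL processing described above: read raw JSON from HDFS, drop duplicate records, publish curated Parquet, and run a quick SQL check. The paths, the txn_id/account_id column names and the dedup key are assumptions for illustration, not the actual pipeline.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Sketch: Spark SQL pipeline that reads raw JSON from HDFS, removes
// duplicate records, and writes the result as Parquet for downstream use.
// All paths and column names are hypothetical placeholders.
object JsonToParquetSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("JsonToParquetSketch"))
    val hiveContext = new HiveContext(sc)

    val raw = hiveContext.read.json("hdfs:///raw/transactions/2015-10-01")

    // Eliminate duplicates on the business key before publishing.
    val deduped = raw.dropDuplicates(Seq("txn_id"))
    deduped.write.parquet("hdfs:///curated/transactions/2015-10-01")

    // Quick Spark SQL sanity check over the curated data.
    deduped.registerTempTable("transactions")
    hiveContext.sql(
      "SELECT account_id, COUNT(*) AS txns FROM transactions GROUP BY account_id")
      .show(20)

    sc.stop()
  }
}
```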

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Oozie, HBase, Spark, Kafka, Shell Scripting, Java, MySQL, DB2, Oracle, Scala

Confidential, Austin, Texas

Big Data Consultant

Responsibilities:

  • Developed MapReduce jobs in Java for data cleansing and preprocessing
  • Moved data from MS SQL Server and Oracle to HDFS and vice versa using Sqoop
  • Worked on collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis
  • Worked with different file formats and compression techniques in Hadoop to determine standards
  • Developed data pipelines using Pig and Hive from the EDW source; these pipelines used customized UDFs to extend the ETL functionality
  • Developed Hive queries and UDFs to analyze/transform the data in HDFS (a sample UDF sketch follows this list)
  • Developed Hive scripts for implementing control table logic in HDFS
  • Designed and Implemented Partitioning (Static, Dynamic), Buckets in HIVE
  • Developed Pig scripts and UDF’s as per the Business logic
  • Analyzed/transformed data in HDFS with Hive and Pig
  • Developed Oozie workflows and scheduled through a scheduler on a monthly basis.
  • Involved with Big data team in End to End implementation of ETL logic.
  • Coordinated effectively with the offshore team and managed project deliverables on time.
  • Worked on QA support activities such as test data creation and unit testing.
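The Hive UDF work above can be sketched as follows. The environment for this project lists Java; the snippet below is written in Scala only to stay consistent with the other sketches in this resume, and the function name, jar path and table/column names are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Sketch of a simple Hive UDF that normalizes free-text codes before analysis.
// Hive locates the evaluate() method by reflection. Names are placeholders.
class NormalizeCode extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase)
  }
}

// Registered and called from Hive roughly as:
//   ADD JAR /path/to/udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode';
//   SELECT normalize_code(product_code), COUNT(*) FROM sales GROUP BY normalize_code(product_code);
```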

Environment: Hadoop, MapReduce (MRv1), Hive, Oozie, Pig, Sqoop, Java, Eclipse IDE, Shell Scripting, MS SQL Server, Oracle

Confidential, San Mateo, CA

Hadoop Consultant

Responsibilities:

  • Managed and analyzed Hadoop Log Files
  • Managed jobs using Fair Scheduler
  • Configured Hive Metastore to use MySQL database to establish multiple user connections to hive tables
  • Imported data into HDFS using Sqoop
  • Experience in retrieving data from databases like MySQL and Oracle into HDFS using Sqoop and ingesting them into HBase
  • Developed Hive Queries to analyze the data in HDFS to identify issues and behavioral patterns
  • Worked on shell scripting to automate jobs.
  • Used Pig Latin to analyze datasets and perform transformations according to business requirements
  • Configured Nagios for receiving alerts on critical failures in the cluster by integrating with custom Shell Scripts
  • Configured the Ganglia monitoring tool to monitor both Hadoop and system specific metrics
  • Worked on implementing Flume to import streaming log data and aggregate it into HDFS.
  • Implemented MapReduce programs to perform joins using secondary sorting and distributed cache
  • Generated daily and weekly Status Reports to the team manager and participated in weekly status meeting with Team members, Business analysts and Development team

Environment: Apache Hadoop 0.20.203, MapReduce, Hive, Apache Maven, Java, Eclipse IDE, Sqoop, Ganglia, Nagios, Shell Script, Pig, Flume.

Confidential

System Analyst

Responsibilities:

  • Key responsibilities included requirements gathering, designing and developing Java applications
  • Implemented design patterns and Object Oriented Java design concepts to build the code
  • Participated in planning and development of UML diagrams like Use Case Diagrams, Object Diagrams, Class Diagrams and Sequence Diagrams to represent the detail design phase
  • Identified and fixed transactional issues due to incorrect exception handling and concurrency issues due to unsynchronized blocks of code
  • Created Java application module for providing authentication to the users for using this application and to synchronize handset with the Exchange server
  • Performed unit testing, system testing and user acceptance test
  • Involved in Analysis, Design, Coding and Development of custom Interfaces
  • Gathered requirements from the client for designing the Web Pages
  • Gathered specifications for the Library site from different departments and users of the services
  • Assisted in proposing suitable UML class diagrams for the project
  • Wrote SQL scripts to create and maintain the database, roles, users, tables, views, procedures and triggers
  • Designed and implemented the UI using HTML and Java
  • Strong knowledge of the MVC design pattern
  • Worked on database interaction layer for insertions, updating and retrieval operations on data
  • Implemented Multi-threading functionality using Java Threading API

Environment: Java, JDBC, HTML, SQL, Oracle, IBM Rational Rose, Eclipse IDE, LDAP
