Sr. Hadoop Engineer Resume

Minneapolis, MN

SUMMARY:

  • IT professional with 9+ years of hands-on experience in Big Data ecosystem technologies such as MapReduce, Hive, HBase, Pig, Sqoop, Flume, Oozie, and HDFS.
  • Skilled in common Big Data technologies such as Hadoop, HBase, MongoDB, Cassandra, and Impala.
  • Experience in developing and implementing MapReduce programs on Hadoop to meet Big Data requirements.
  • Hands-on experience with Big Data ingestion tools like Flume and Sqoop.
  • Experience with the Cloudera distribution (CDH) and the Hortonworks Data Platform (HDP).
  • Extensive experience in developing Pig Latin scripts and using Hive Query Language for data analytics.
  • Good knowledge on Apache Spark and Scala.
  • Experience in importing and exporting data with Sqoop between HDFS and relational database systems (a minimal sketch of a comparable ingestion flow follows this summary).
  • Extensive experience working with Teradata, Oracle, Netezza, SQL Server, DB2, and MySQL databases.
  • Excellent understanding and knowledge of NOSQL databases like MongoDB, HBase, and Cassandra.
  • Experience in developing against NoSQL databases using CRUD operations, sharding, indexing, and replication.
  • Expertise in using job scheduling and monitoring tools like Oozie.
  • Experience across the layers of the Hadoop framework (storage with HDFS, analysis with Pig and Hive, and engineering with jobs and workflows), developing ETL processes that load data from multiple sources into HDFS using Sqoop and Pig, with Oozie automating the workflows.
  • Experience in developing and scheduling ETL workflows in Hadoop using Oozie, and in deploying and managing Hadoop clusters with Cloudera and Hortonworks.
  • In-depth understanding of Hadoop architecture and components such as HDFS, MapReduce, Hadoop Gen2 federation, high availability, and YARN, along with a good understanding of workload management, scalability, and distributed platform architectures.
  • Expertise in writing Hive queries, Pig, and MapReduce scripts, and in loading large volumes of data from the local file system and HDFS into Hive.
  • Experience in streaming live data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
  • Good experience working with cloud environments such as Amazon Web Services (EC2 and S3).
  • Ability to meet deadlines and handle multiple tasks, decisive with strong leadership qualities, flexible in work schedules and possess good communication skills.
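
A minimal, illustrative PySpark sketch of the RDBMS-to-HDFS ingestion and partitioned Hive load described above is shown below. It uses Spark's JDBC reader as a stand-in for the Sqoop step; the connection URL, table names, columns, and credentials are hypothetical placeholders, not values from any actual engagement.

```python
# Hypothetical sketch: RDBMS -> HDFS/Hive ingestion with PySpark.
# The JDBC URL, schema/table names, and credentials are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("rdbms-to-hive-ingest")
         .enableHiveSupport()
         .getOrCreate())

# Pull the source table over JDBC (Sqoop would normally handle this step).
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")  # placeholder
          .option("dbtable", "SALES.ORDERS")                      # placeholder
          .option("user", "etl_user")
          .option("password", "****")
          .load())

# Land the data as a partitioned Hive table for downstream analytics.
(orders.write
       .mode("overwrite")
       .partitionBy("order_date")                                 # assumed column
       .saveAsTable("analytics.orders_partitioned"))              # placeholder
```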

TECHNICAL SKILLS:

Java/J2EE Technologies: JSP, Servlets, JQuery, JDBC, JavaScript

Hadoop/Big Data: HDFS, Hive, Pig, HBase, MapReduce, Zookeeper, Spark, Scala, Akka, Kafka, Sqoop, Oozie, Flume, Storm

Programming Languages: Java, J2EE, HQL, R, Python, XPath, PL/SQL, Pig Latin

Operating Systems: UNIX, Linux, Windows

Web Technologies: HTML, XML, DHTML, XHTML, CSS, XSLT

Web/Application servers: Apache HTTP server, Apache Tomcat, JBoss

Frameworks: MVC, Struts, Spring, Hibernate

Databases: Microsoft Access, MongoDB, Cassandra, Microsoft SQL Server, Oracle

PROFESSIONAL EXPERIENCE:

Confidential, Minneapolis, MN

Sr. Hadoop Engineer

Responsibilities:

  • Installed, configured, and maintained Hadoop clusters for application development, along with Hadoop tools like Hive, Pig, HBase, Zookeeper, and Sqoop.
  • Loaded data from different sources (Teradata and DB2) into HDFS using Sqoop and loaded it into partitioned Hive tables.
  • Created Hive tables and wrote Hive queries for data analysis to meet business requirements.
  • Used Sqoop to import and export data from Oracle and MySQL.
  • Imported and exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Worked on importing and exporting data into HDFS and Hive using Sqoop.
  • Used Flume to handle streaming data and loaded it into the Hadoop cluster.
  • Developed and executed Hive queries for denormalizing the data.
  • Developed an Apache Storm, Kafka, and HDFS integration project for real-time data analysis.
  • Executed Hive queries through the Hive command line, the Hue web GUI, and Impala to read, write, and query data in HBase.
  • Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
  • Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
  • Worked on analyzing data with Hive and Pig.
  • Designed an Apache Airflow entity-resolution module for data ingestion into Microsoft SQL Server.
  • Created a new Airflow DAG to find popular items in Redshift and ingest them into the main PostgreSQL database via a web service call.
  • Developed a batch processing pipeline using Python and Airflow, and scheduled Spark jobs with Airflow (a minimal DAG sketch follows this list).
  • Involved in writing, testing, and running MapReduce pipelines using Apache Crunch.
  • Managed and reviewed Hadoop log files, analyzed SQL scripts, and designed solutions for the process using Spark.
  • Created reports in Tableau to visualize the data sets produced, and tested native Drill, Impala, and Spark connectors.
  • Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection testing, permission checks, and performance analysis.
  • Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HDFS.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
  • Collected and aggregated large amounts of log data using Flume and staged it in HDFS for further analysis.
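
As noted above, a minimal Airflow DAG for this kind of batch pipeline might look like the following sketch. The DAG id, schedule, and command paths are illustrative assumptions (Airflow 2.x import style), not the actual project code.

```python
# Hypothetical Airflow DAG sketch: run a Spark transformation, then publish
# results to the serving database. All ids, paths, and commands are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator  # Airflow 2.x import path

default_args = {
    "owner": "data-eng",
    "retries": 1,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_batch_pipeline",        # placeholder name
    default_args=default_args,
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    # Submit the Spark job that transforms the day's data (placeholder script).
    spark_transform = BashOperator(
        task_id="spark_transform",
        bash_command="spark-submit --master yarn /opt/jobs/transform.py",
    )

    # Publish aggregated results to PostgreSQL via a helper script (placeholder).
    publish_results = BashOperator(
        task_id="publish_to_postgres",
        bash_command="python /opt/jobs/publish_popular_items.py",
    )

    spark_transform >> publish_results
```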

Environment: Hadoop, YARN, HBase, Teradata, DB2, NoSQL, Kafka, Python, Zookeeper, Oozie, Tableau, Apache Crunch, Apache Storm, MySQL, SQL Server, jQuery, JavaScript, HTML, Ajax and CSS

Confidential - San Diego, CA

Sr. Hadoop Developer

Responsibilities:

  • Worked on a live 24-node Hadoop cluster running HDP 2.2.
  • Built import and export jobs to copy data between RDBMS systems and HDFS using Sqoop.
  • Worked with incremental-load Sqoop jobs to populate HAWQ external tables and move the data into internal tables.
  • Created external and internal tables using HAWQ.
  • Worked with the Spark Core, Spark Streaming, and Spark SQL modules of Spark.
  • Involved in various Big Data application phases such as data ingestion, data analytics, and data visualization.
  • Transferred data from RDBMS to HDFS and Hive tables using Sqoop.
  • Migrated code from Hive to Apache Spark and Scala using Spark SQL and RDDs.
  • Used Flume to load log data from multiple sources directly into HDFS.
  • Installed and configured MapReduce, Hive, and HDFS; implemented CDH5 and HDP clusters.
  • Assisted with performance tuning, monitoring, and troubleshooting.
  • Processed data by collecting, aggregating, and moving it from various sources using Apache Flume and Kafka.
  • Moved streaming data into the cluster through Kafka and Spark Streaming (see the streaming sketch after this list).
  • Optimized HiveQL and Pig scripts by using execution engines such as Tez and Spark.
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Worked extensively with AWS cloud services such as EC2, S3, EBS, RDS, and VPC.
  • Reviewed Hadoop log files to diagnose failures.
  • Benchmarked the NoSQL databases Cassandra and HBase.
  • Worked with workflow schedulers such as Oozie, crontab, and AutoSys.
  • Worked with partitioning and bucketing concepts in Hive and designed both managed and external Hive tables to optimize performance.
  • Created Hive tables and worked with them for data analysis to meet business requirements.
  • Developed a data pipeline using Spark and Hive to ingest, transform, and analyze data.
  • Wrote Pig scripts to tokenize sensitive information using Protegrity.
  • Used Flume to dump application server logs into HDFS.
  • Automated backups with Linux shell scripts to transfer data to S3 buckets.
  • Created test cases and uploaded them into HP ALM.
  • Automated incremental loads to load data into the production cluster.
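
The streaming flow referenced above could be sketched roughly as follows, here with Spark Structured Streaming in Python (the project itself may have used the older DStream API or Scala). The broker address, topic, and HDFS paths are hypothetical, and the spark-sql-kafka connector is assumed to be on the classpath.

```python
# Hypothetical sketch: consume a Kafka topic with Spark Structured Streaming
# and append the raw events to HDFS as Parquet. All names/paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (SparkSession.builder
         .appName("kafka-stream-to-hdfs")
         .getOrCreate())

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")      # placeholder
          .option("subscribe", "clickstream")                     # placeholder topic
          .load()
          .select(col("value").cast("string").alias("raw_event")))

# Continuously append the events to HDFS; checkpointing tracks Kafka offsets.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/clickstream/")             # placeholder
         .option("checkpointLocation", "hdfs:///checkpoints/clickstream/")
         .start())

query.awaitTermination()
```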

Environment: Hadoop, MapReduce, AWS, HDFS, Hive, HBASE, Sqoop, Pig, Flume, Oracle, Teradata, PL/SQL, Java, Shell Scripting, HP ALM

Confidential - Santa Monica, CA

Hadoop Developer

Responsibilities:

  • Extensively worked with Hadoop architecture and its components such as HDFS, the Application Master, Node Manager, Resource Manager, NameNode, DataNode, and MapReduce concepts.
  • Moved log files generated from various sources to HDFS for further processing through Flume.
  • Imported required tables from RDBMS to HDFS using Sqoop, and used Storm and Kafka for real-time streaming of data into HBase.
  • Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
  • Implemented workflows using the Apache Oozie framework to automate tasks.
  • Wrote MapReduce code that takes log files as input, then parses and structures them in a tabular format to facilitate effective querying of the log data.
  • Developed Java code to generate, compare, and merge Avro schema files.
  • Developed complex MapReduce streaming jobs in Java that were implemented using Hive and Pig.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Applied Hive optimization techniques for joins and best practices in writing Hive scripts with HiveQL.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Wrote Hive queries to extract the processed data.
  • Developed a data pipeline using Flume, Sqoop, Pig, and MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis.
  • Implemented Spark in Scala, using the Spark Core, Spark Streaming, and Spark SQL APIs for faster data processing than Java MapReduce.
  • Used Spark SQL to load JSON data, create a schema RDD, and load it into Hive tables, handling structured data with Spark SQL (a short sketch follows this list).
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Created HBase tables to store data arriving in variable formats from different legacy systems.
  • Used Hive for transformations, event joins, and some pre-aggregations before storing the data in HDFS.
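
A rough Python sketch of the JSON-to-Hive step mentioned above is shown below. The input path, column names, and target table are illustrative assumptions only.

```python
# Hypothetical sketch: load JSON with Spark, query it with Spark SQL,
# and persist the result as a Hive table. Paths/columns/tables are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("json-to-hive")
         .enableHiveSupport()
         .getOrCreate())

# Spark infers the schema directly from the JSON records.
purchases = spark.read.json("hdfs:///landing/purchases/*.json")   # placeholder path
purchases.createOrReplaceTempView("purchases_raw")

# A simple Spark SQL aggregation over assumed columns before writing to Hive.
daily = spark.sql("""
    SELECT customer_id,
           to_date(event_ts) AS purchase_date,
           SUM(amount)       AS total_amount
    FROM purchases_raw
    GROUP BY customer_id, to_date(event_ts)
""")

daily.write.mode("overwrite").saveAsTable("analytics.daily_purchases")  # placeholder
```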

Environment: Hadoop, HDFS, MapReduce, Hive, Python, PIG, Java, Oozie, HBASE, Sqoop, Flume, MySQL

Confidential - Kansas City, MO

Java Developer

Responsibilities:

  • Involved in analysis, design, development, integration, and testing of application modules following the Agile/Scrum methodology. Participated in estimating the size of backlog items, daily Scrum, and translating backlog items into engineering designs and logical units of work (tasks).
  • Used the Spring framework for IoC, JDBC, ORM, AOP, and Spring Security to implement the business layer.
  • Developed and Consumed Web services securely using JAX-WS API and tested using SOAP UI.
  • Extensively used Action, DispatchAction, ActionForm, Struts tag libraries, and Struts configuration from Struts.
  • Extensively used Hibernate Query Language to retrieve data from the database and process it in the business methods.
  • Developed pages using JSP, JSTL, Spring tags, jQuery, JavaScript & Used jQuery to make AJAX calls.
  • Used Jenkins continuous integration tool to do the deployments.
  • Worked on JDBC for database connections.
  • Worked on multithreaded middleware using socket programming to introduce a whole set of new business rules, applying OOP design principles.
  • Involved in implementing Java multithreading concepts.
  • Developed several REST web services supporting both XML and JSON to perform tasks such as demand response management.
  • Used Servlets, Java, and Spring for server-side business logic.
  • Implemented logging functionality using Log4j and internal logging APIs.
  • Used JUnit for server-side testing.
  • Used Maven as the build tool and SVN for version control.
  • Developed the application front end using the Bootstrap, AngularJS, and Node.js frameworks.
  • Implemented SOA architecture using Enterprise Service Bus (ESB).
  • Designed a front-end, data-driven GUI using JSF, HTML4, JavaScript, and CSS.
  • Used IBM MQ Series as the JMS provider.
  • Responsible for writing SQL Queries and Procedures using DB2.
  • Implemented connections with Oracle and MySQL databases using Hibernate ORM; configured Hibernate entities using annotations from scratch.

Environment: Core Java 1.5, EJB, Hibernate 3.6, AWS, JSF, Struts, Spring 2.5, JPA, REST, JBoss, Selenium, Socket programming, DB2, Oracle 10g, XML, JUnit 4.0, XSLT, IDE, AngularJS, Node.js, HTML4, CSS, JavaScript, Apache Tomcat 5.x, Log4j

Confidential

Java Developer

Responsibilities:

  • Followed Java/J2EE best practices to minimize unnecessary object creation.
  • Used JSP pages through a Servlet controller for the client-side view.
  • Created jQuery and JavaScript plug-ins for the UI.
  • Implemented RESTful web services with the Struts framework.
  • Verified them with the JUnit testing framework.
  • Working experience using an Oracle 10g back-end database.
  • Used JMS queues to develop an internal messaging system.
  • Developed the UML Use Cases, Activity, Sequence and Class diagrams using Rational Rose.
  • Developed Java, JDBC, and JavaBeans components using the JBuilder IDE.
  • Developed JSP pages and Servlets for customer maintenance.
  • Used Apache Tomcat Server to deploy the application.
  • Built the modules in a Linux environment with Ant scripts.
  • Used Resource Manager to schedule jobs on the UNIX server.
  • Performed Unit testing, Integration testing for all the modules of the system.
  • Developed JavaBean components utilizing AWT and Swing classes.

Environment: Java, JDK, Servlets, JSP, HTML, JBuilder, JavaScript, CSS, Tomcat, Apache HTTP Server, XML, JUnit, EJB, RESTful, Oracle
