
Big Data/Hadoop Engineer Resume


Raleigh, NC

SUMMARY

  • Experienced software developer with 6+ years of IT experience in development, system design, enhancement, maintenance, support, re-engineering, debugging and engineering of mission-critical, complex applications.
  • Experience working with OpenStack (Icehouse, Liberty), Ansible, Kafka, Elasticsearch, Hadoop, StreamSets, MySQL, Cloudera (CDH 5.4), UNIX shell scripting, Pig scripting, Hive, Flume, Zookeeper, Sqoop, Oozie, Python, Spark, Git and a variety of RDBMS in UNIX and Windows environments, following agile methodology.
  • 2+ years of experience in Hadoop and Big Data related technologies as a developer and administrator.
  • Hands on experience with Cloudera and Apache Hadoop distributions.
  • Followed test-driven development within Agile methodology to produce high-quality software.
  • Worked with Data ingestion, querying, processing and analysis of big data.
  • Experience includes Requirements Gathering, Design, Development, Integration, Documentation, Testing and Build.
  • Experienced using Ansible scripts to deploy Cloudera CDH 5.4.1 to setup Hadoop Cluster.
  • Performed Ansible playbook dry runs to provision the OpenStack cluster and deploy CDH parcels.
  • Good knowledge of Tableau.
  • Experience using benchmarking tools such as TeraSort, TestDFSIO and HiBench.
  • Set up benchmark baselines for I/O- and memory-bound tests.
  • Experience in loading data from the Linux file system into HDFS using Sqoop.
  • Experienced in working with the OpenStack platform and its components, such as Compute, Orchestration and Swift.
  • Worked with different services provided by OpenStack, such as Neutron, Nova and Cinder.
  • Experience in Web Content Management.
  • Knowledge in Web Technologies such as HTML, CSS, Word Press, Joomla.
  • Ability to solve problems using advanced Excel skills like Pivot tables, Regression & VBA.
  • Ability to work effectively in cross-functional team environments.
  • Ability to learn new technologies and to deliver outputs in short deadlines.
  • Research-oriented, motivated, proactive, self-starter with strong technical, analytical and interpersonal skills.
  • Hands-on experience with Spark architecture and its integrations, such as Spark SQL and DataFrames.
  • Replaced existing MapReduce jobs with Spark Streaming and Spark data transformations for efficient data processing.
  • Hands-on experience with real-time streaming from Kafka through Spark into HDFS.
  • Ability to spin up OpenStack instances using Heat (cloud-formation-style) templates.
  • Experience with Cloud Infrastructure like Cisco Cloud Services.
  • Hands on experience with NoSQL Databases like HBase for performing analytical operations.
  • Hands-on experience in writing queries, stored procedures, functions and triggers.
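The Kafka-to-HDFS streaming work above rests on Spark Streaming's micro-batch model. As a minimal, stdlib-only sketch (no actual Spark API; function and batch names are illustrative), each arriving micro-batch updates a running per-key state, the idea behind `updateStateByKey`:

```python
from collections import Counter

def run_micro_batches(batches):
    """Mimic Spark Streaming's stateful aggregation (updateStateByKey):
    each micro-batch of events updates a running count per key."""
    state = Counter()
    snapshots = []
    for batch in batches:        # each batch ~ one DStream interval
        state.update(batch)      # merge the new events into the running state
        snapshots.append(dict(state))
    return snapshots

# Two micro-batches of event types; the second snapshot carries state forward.
snapshots = run_micro_batches([["click", "view"], ["click"]])
```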

TECHNICAL SKILLS

HADOOP Ecosystem/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Impala, Zookeeper, Kafka, Spark SQL, Spark Streaming and Cloudera Manager

Cloud Infrastructure: Cisco Cloud Services

Programming Languages: SQL, C, C++, Core Java, Scala, Shell, Pig Latin, Hive-QL

Databases: Oracle, Teradata, MySQL, SQL Server, HBase, Cassandra

Web Technologies: HTML, DHTML, CSS, XML, XSLT and JavaScript

Version Control: SVN, CVS and Git

IDEs & Command Line Tools: Eclipse, NetBeans, WinSCP

PROFESSIONAL EXPERIENCE

Big Data/Hadoop Engineer

Confidential, Raleigh, NC

Responsibilities:

  • Created StreamSets pipelines for collecting logs, alerts and metrics from customer Vpods.
  • Created automated Python scripts to validate the data flow through Elasticsearch.
  • Worked with InfluxDB to store the metrics data collected from each customer Vpod.
  • Configured StreamSets to attach the Vpod ID to each record flowing through and to create topics in Kafka.
  • Set up the project/tenant with Keystone user roles.
  • Created networks, routers and subnets.
  • Created instances in OpenStack to set up the environment.
  • Set up the ELK (Elasticsearch, Logstash, Kibana) cluster.
  • Implemented Spark scripts in Scala, using Spark SQL to access Hive tables from Spark for faster data processing.
  • Active member in developing a POC on streaming data using Apache Kafka and Spark Streaming.
  • Troubleshot Nova and Glance issues in OpenStack, as well as Kafka and RabbitMQ message-bus issues.
  • Performance-tested the environment, creating Python scripts to generate I/O and CPU load.
  • Experience with OpenStack Cloud Platform.
  • Good understanding of Spark.
  • Experienced in provisioning hosts with flavors GP (general purpose), SO (storage optimized), MO (memory optimized) and CO (compute optimized).
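The Vpod-tagging step described above, enriching each record with its source ID before it is published to a Kafka topic, can be sketched as follows. This is an illustration only, not code from the project; the field name `vpod_id` and the function name are hypothetical:

```python
import json

def tag_with_vpod(records, vpod_id):
    """Attach the originating Vpod ID to each record before publishing.

    In the pipeline described above this enrichment happens inside
    StreamSets; here it is a plain-Python stand-in (hypothetical names).
    """
    for rec in records:
        tagged = dict(rec)           # copy so the source record is untouched
        tagged["vpod_id"] = vpod_id  # hypothetical field carrying the source ID
        yield json.dumps(tagged)     # serialized payload ready for a Kafka topic

# Example: one CPU metric tagged with its Vpod of origin.
payloads = list(tag_with_vpod([{"metric": "cpu", "value": 0.9}], "vpod-42"))
```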
Environment: OpenStack, Elasticsearch, Logstash, Ansible, RHEL 7, Python, Kafka, StreamSets, InfluxDB, Sensu, RabbitMQ, Uchiwa, Kibana.

Big Data/Hadoop Engineer

Confidential, Raleigh, NC

Responsibilities:

  • Used partitioning, bucketing, map-side joins and parallel execution to optimize Hive queries, decreasing execution time.
  • Designed and implemented importing data to HDFS using Sqoop from different RDBMS servers.
  • Participated in requirement gathering of the project in documenting the business requirements.
  • Experienced in working with OpenStack Platform and with all its components such as Horizon, Keystone, Heat.
  • Experienced in working with Cisco Cloud Services, which is built on top of OpenStack's Havana and Icehouse releases.
  • Experienced in using Ansible scripts to deploy Cloudera CDH 5.4.1 to set up a Hadoop cluster.
  • Experienced in working with Cloud Computing Services, Networking between different Tenants.
  • Responsible for performance benchmarking tests for Hadoop and analyzing the results against bare-metal servers.
  • Installed and Worked with Hive, Pig, Sqoop on the Hadoop cluster.
  • Developed Hive queries to analyze the data imported to HDFS.
  • Worked with Sqoop commands to import the data from different databases.
  • Experience with the OpenStack cloud platform.
  • Performed Ansible playbook dry runs to provision the OpenStack cluster and deploy CDH parcels.
  • Experience using benchmarking tools such as TeraSort, TestDFSIO and HiBench.
  • Set up benchmark baselines for I/O- and memory-bound tests.
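The Hive optimizations mentioned in this role (partitioning, bucketing, map-side joins, parallel execution) typically follow a pattern like the HiveQL sketch below. Table and column names are illustrative, not taken from the project:

```sql
-- Hypothetical table; shows the partitioning/bucketing layout described above.
CREATE TABLE web_logs (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (log_date STRING)        -- date filters prune whole partitions
CLUSTERED BY (user_id) INTO 32 BUCKETS  -- bucketing enables bucketed map joins
STORED AS ORC;

SET hive.auto.convert.join=true;  -- convert small-table joins to map-side joins
SET hive.exec.parallel=true;      -- run independent query stages in parallel
```

With this layout, a query filtered on `log_date` reads only the matching partitions, and joins on `user_id` against a similarly bucketed table can avoid a full shuffle.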

Environment: OpenStack, Hadoop, Hive, Pig, MapReduce, Python, Ansible, RHEL 7, Cloudera, Sqoop, Oozie, Impala.

Software Engineer

Confidential

Responsibilities:

  • Loaded the data using Sqoop from different RDBMS Servers like Teradata and Netezza to Hadoop HDFS Cluster.
  • Performed daily Sqoop incremental imports scheduled through Oozie.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Optimized Hive queries using map-side joins, dynamic partitions and bucketing.
  • Responsible for executing Hive queries using the Hive command line, the Hue web GUI and Impala to read, write and query data in HBase.
  • Created Hive generic UDFs and UDAFs to analyze the partitioned and bucketed data, compute various metrics for dashboard reporting and store them in summary tables.
  • Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Monitored and managed the Hadoop cluster using the Cloudera Manager web interface.
  • Coordinated the Pig and Hive scripts using Oozie workflow.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data, including Avro, SequenceFile and XML formats.
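The daily incremental imports above rely on the checkpoint idea behind Sqoop's `--incremental append` mode: each scheduled run pulls only rows whose check column exceeds the last imported value, then advances the checkpoint. A minimal Python sketch of that logic (names and row shape are hypothetical):

```python
def incremental_batch(rows, last_value, key="id"):
    """Sketch of Sqoop-style incremental-append selection:
    keep only rows whose check column exceeds the saved checkpoint,
    then return the new checkpoint for the next scheduled (e.g. daily
    Oozie-triggered) run. Row shape and 'id' column are illustrative."""
    new_rows = [r for r in rows if r[key] > last_value]
    new_last = max((r[key] for r in new_rows), default=last_value)
    return new_rows, new_last

# First run imported up to id 4; today's run picks up only ids 5 and 9.
batch, checkpoint = incremental_batch([{"id": 1}, {"id": 5}, {"id": 9}], 4)
```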

Environment: Hadoop, Big Data, HDFS, MapReduce, Sqoop, Shell Scripting, Oozie, Pig, Hive, Impala, HBase, Spark, Linux, Java, Eclipse.

Java/.NET Developer

Confidential

Responsibilities:

  • Developed E-commerce website using ASP.NET
  • Used Oracle as the database and Toad for query execution; wrote SQL scripts and PL/SQL code for procedures and functions.
  • Used CVS and Perforce as configuration management tools for code versioning.
  • Developed applications using Eclipse and used Maven as the build and deploy tool.
  • Used Log4j to log debug, warning and info messages to the server console.
  • Involved in writing Spring configuration XML files containing bean declarations and their dependencies.
  • Used Tomcat web server for development purpose.
  • Involved in creation of Test Cases for Unit Testing.
  • Developed Java Server Pages for the Dynamic front end content that use Servlets and EJBs.
  • Developed HTML pages using CSS for static content generation with JavaScript for validations.

Environment: Visual Studio, MySQL, Windows.

Java/C++ Developer

Confidential

Responsibilities:

  • Utilized strong C++ and JAVA Programming and communication skills in a team environment
  • Developed Type3 application, disaster recovery module & console commands
  • Debugged and fixed bugs for the MMC project using GDB & stack traces
  • Solved memory allocation problems caused by big-endian and little-endian conversions.
  • Contributed to support and maintenance of software applications with GCC & ICC compilers
  • Worked with CMS & CCMS, proprietary ALU configuration management tools, and used FlexeLint for lint checking.
  • Followed agile methodology, interacted directly with the client on the features, implemented optimal solutions, and applications to customer needs.
  • Involved in design and implementation of web tier using Servlets and JSP.

Environment: Java, J2EE, JSP, Spring, Struts, Hibernate, Eclipse, SOA, WebLogic, Oracle, HTML, CSS, Web Services, JUnit, SVN, Windows, UNIX

Java Developer

Confidential

Responsibilities:

  • Involved in the full Software Development Life Cycle (SDLC) and created UML diagrams such as use case, class and sequence diagrams for the detailed design phase.
  • Extensively developed web pages using JSP, HTML5, JavaScript and CSS in the front end.
  • Developed business-layer components using Spring and Hibernate, and the GUI using JSP.
  • Developed web components using JSP, Servlets and JDBC.
  • Worked on authentication modules to provide controlled access to users on various modules
  • Developed PL/SQL stored procedures and triggers to calculate and update tables implementing business logic.
  • Developed complex SQL queries for efficient data retrieval operations.
  • Designed and Implemented tables and indexes using SQL Server.

Environment: Eclipse, Java/J2EE, JSP, Oracle, HTML, CSS, Servlets, Struts, PL/SQL, XML, SQL.
