Hadoop/Spark Consultant Resume
Dallas, TX
PROFESSIONAL SUMMARY:
- Around 6 years of IT experience in software development and support, with experience developing strategic methods for deploying Big Data technologies to efficiently solve Big Data processing requirements.
- Expertise in Hadoop ecosystem components HDFS, MapReduce, YARN, HBase, Pig, Sqoop, Spark, Spark SQL, Spark Streaming, and Hive for scalability, distributed computing, and high-performance computing.
- Experience in using Hive Query Language for data analytics.
- Capable of processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture.
- Strong knowledge of NoSQL column-oriented databases such as HBase and Cassandra and their integration with Hadoop clusters.
- Flexible with Unix/Linux and Windows environments, working with operating systems such as CentOS (RHEL), Ubuntu 13/14, and Cosmos.
- Good experience with Kafka and Storm.
- Java developer with extensive experience with various Java libraries, APIs, and frameworks.
- Hands-on development experience with RDBMSs, including writing complex stored procedures and triggers.
- Strong understanding of Agile Scrum and Waterfall SDLC methodologies.
- Strong communication, collaboration, and team-building skills, with proficiency at grasping new technical concepts quickly and utilizing them productively.
- Strong analytical and problem-solving skills.
TECHNICAL SKILLS:
Hadoop/Big Data Technologies: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Impala, Spark, Kafka, Storm
Databases: Oracle, MySQL; NoSQL: HBase, Cassandra
Monitoring & Reporting: Tableau, Custom Shell Script
Build Tools: Maven, SQL Developer
Programming & Scripting: Java, SQL, Scala, Shell Scripting
Java Technologies: Servlets, Hibernate, Spring, JDBC
Web Dev. Technologies: HTML, XML, JSON, CSS, JavaScript
Operating Systems: Linux, Unix, CentOS, Windows
PROFESSIONAL EXPERIENCE:
Confidential, Dallas, TX
Hadoop/Spark Consultant
Responsibilities:
- Involved in querying and writing data from the OLTP server to the Hadoop file system using Sqoop.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark 1.6 for data aggregation and queries, writing data back into the OLTP system through Sqoop (illustrative sketch below).
- Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
- Actively participated in writing Hive and Impala queries to load and process data in the Hadoop file system.
- Developed Spark scripts using Scala shell commands as per requirements.
Environment: Cloudera (CDH 5.4), Hadoop, HBase, HDFS, Spark Core, Spark SQL, Java (JDK 1.7), ELK stack.
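A minimal sketch of the kind of Spark 1.6 Scala aggregation with a UDF described above; the input path, column names, and output path are hypothetical, and the Sqoop export back to the OLTP system would run as a separate step:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.udf

object OrderAggregation {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("OrderAggregation"))
    val sqlContext = new SQLContext(sc)

    // Hypothetical input: order records previously landed on HDFS by Sqoop.
    val orders = sqlContext.read.parquet("/data/orders")

    // Simple UDF normalizing a status code (illustrative only).
    val normalize = udf((s: String) => if (s == null) "UNKNOWN" else s.trim.toUpperCase)

    // DataFrame aggregation; the result set would be exported back to the
    // OLTP system with a Sqoop export in the real pipeline.
    val daily = orders
      .withColumn("status", normalize(orders("status")))
      .groupBy("order_date", "status")
      .count()

    daily.write.parquet("/data/daily_order_counts")
    sc.stop()
  }
}
```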
Confidential, Falls Church, VA
Teaching Associate (Hadoop and Java)
Responsibilities:
- Responsible for teaching the OOAD and Tools and Technology courses and the projects related to those subjects.
- Helped and guided students with any academic problems they came across.
- Introduced students to the Hadoop Distributed File System (HDFS), a distributed system architecture that supports data locality for data-intensive computing, and its trade-offs compared to a separate computation/storage architecture.
- Covered the underlying concepts of the MapReduce programming model and guided students in designing and implementing MapReduce programs to analyze large data sets (word-count sketch below).
- Covered the scalability and performance of MapReduce programs running on HDFS.
Environment: HDFS, MapReduce, Flume, Pig, Sqoop, Hive, HBase.
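A minimal word-count sketch of the MapReduce programming model covered in the course, written in Scala against the Hadoop Java API; the input/output paths and class names are illustrative:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import scala.collection.JavaConverters._

// Mapper: emit (word, 1) for every token in the input split.
class TokenMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  private val word = new Text()
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit =
    value.toString.split("\\s+").filter(_.nonEmpty).foreach { w =>
      word.set(w); ctx.write(word, one)
    }
}

// Reducer: sum the counts emitted for each word.
class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit =
    ctx.write(key, new IntWritable(values.asScala.map(_.get).sum))
}

object WordCount {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "word count")
    job.setJarByClass(getClass)
    job.setMapperClass(classOf[TokenMapper])
    job.setReducerClass(classOf[SumReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```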
Confidential
Hadoop Developer
Responsibilities:
- Worked on Sqoop to import data from various relational data sources.
- Worked with Flume to bring clickstream data in from front-facing application logs.
- Implemented partitioning, dynamic partitions, and buckets in Hive (partitioning sketch below).
- Exported result sets from Hive to MySQL using shell scripts.
- Used Git for version control.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive (producer sketch below).
- Performed data standardization using Pig scripts.
Environment: HDFS, MapReduce, Flume, Pig, Sqoop, Hive, HBase, Shell Scripting.
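A minimal sketch of dynamic partitioning and bucketing in Hive, issued here over the Hive JDBC driver from Scala; the HiveServer2 endpoint, table names, and columns are hypothetical:

```scala
import java.sql.DriverManager

object HivePartitioning {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "")
    val stmt = conn.createStatement()
    try {
      // Partitioned, bucketed target table (hypothetical schema).
      stmt.execute(
        """CREATE TABLE IF NOT EXISTS clicks_part (user_id STRING, url STRING)
          |PARTITIONED BY (dt STRING)
          |CLUSTERED BY (user_id) INTO 32 BUCKETS
          |STORED AS ORC""".stripMargin)
      // Allow fully dynamic partition inserts.
      stmt.execute("SET hive.exec.dynamic.partition=true")
      stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict")
      // Dynamic-partition load: dt is taken from the last SELECT column.
      stmt.execute(
        """INSERT OVERWRITE TABLE clicks_part PARTITION (dt)
          |SELECT user_id, url, dt FROM clicks_raw""".stripMargin)
    } finally { stmt.close(); conn.close() }
  }
}
```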
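A minimal Kafka producer sketch in Scala of the kind described above; the broker address, topic name, key, and payload are hypothetical:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object ClickProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Key by user id so one user's events land in the same partition.
      producer.send(new ProducerRecord[String, String]("clickstream", "user-42", """{"url":"/home"}"""))
      producer.flush()
    } finally producer.close()
  }
}
```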
Confidential
Hadoop Developer
Responsibilities:
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing; involved in importing and exporting data into HDFS and Hive.
- Involved in defining job flows.
- Involved in managing and reviewing Hadoop log files.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Involved in loading data from the UNIX file system to HDFS (loading sketch below).
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
Environment: Java 6 (JDK 1.6), Eclipse, Oracle 10g, Subversion, Hadoop, HDFS, Hive, Cassandra, Linux.
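A minimal sketch of loading files from the UNIX file system into HDFS via the Hadoop FileSystem API from Scala; the local staging directory and HDFS target path are hypothetical:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object LoadToHdfs {
  def main(args: Array[String]): Unit = {
    // Picks up core-site.xml/hdfs-site.xml from the classpath.
    val conf = new Configuration()
    val fs = FileSystem.get(conf)
    // Copy files from the local (UNIX) file system into HDFS.
    fs.copyFromLocalFile(new Path("/var/staging/input"), new Path("/data/raw/input"))
    fs.close()
  }
}
```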
Confidential
JAVA/ J2EE Developer
Responsibilities:
- Worked with Java, J2EE, Struts, web services, and Hibernate in a fast-paced development environment.
- Followed agile methodology, interacted directly with the client on features, implemented optimal solutions, and tailored the application to customer needs.
- Involved in design and implementation of web tier using Servlets and JSP.
- Used Apache POI for reading Excel files.
- Developed the user interface using JSP and JavaScript to view all online trading transactions.
- Designed and developed Data Access Objects (DAO) to access the database.
- Used the DAO factory and value object design patterns to organize and integrate the Java objects.
- Coded JavaServer Pages for the dynamic front-end content that uses Servlets and EJBs.
- Coded HTML pages using CSS for static content generation with JavaScript for validations.
- Used the JDBC API to connect to the database and carry out database operations (DAO sketch below).
- Used JSP and JSTL Tag Libraries for developing User Interface components.
- Performed code reviews.
- Performed unit testing, system testing and integration testing.
- Involved in building and deployment of application in Linux environment.
Environment: Java, J2EE, JDBC, Struts, SQL, Hibernate, Eclipse, Apache POI, CSS.
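A minimal DAO sketch over the JDBC API illustrating the data-access pattern described above, shown in Scala for consistency with the other sketches; the value object, table, and connection details are hypothetical:

```scala
import java.sql.{Connection, DriverManager, ResultSet}

// Hypothetical value object carried between the DAO and the web tier.
case class Trade(id: Long, symbol: String, quantity: Int)

// Minimal DAO: one query method behind a plain class boundary.
class TradeDao(url: String, user: String, password: String) {
  private def withConnection[A](f: Connection => A): A = {
    val conn = DriverManager.getConnection(url, user, password)
    try f(conn) finally conn.close()
  }

  def findById(id: Long): Option[Trade] = withConnection { conn =>
    val ps = conn.prepareStatement("SELECT id, symbol, quantity FROM trades WHERE id = ?")
    try {
      ps.setLong(1, id)
      val rs: ResultSet = ps.executeQuery()
      if (rs.next()) Some(Trade(rs.getLong("id"), rs.getString("symbol"), rs.getInt("quantity")))
      else None
    } finally ps.close()
  }
}
```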