Hadoop and Big Data Developer Resume
Mooresville, North Carolina
SUMMARY
- Big Data Engineer with 5+ years of IT experience in the Big Data domain, working with Hadoop, Spark, MapReduce, Hive, and other open-source tools and technologies
- Experience working with the Cloudera and Hortonworks distributions of Hadoop
- Experience developing Big Data projects using Spark, Hadoop, Hive, Pig, Flume, and MapReduce
- Experience in Big Data analytics, with hands-on data extraction, transformation, loading, analysis, and visualization on the Cloudera platform (MapReduce, HDFS, Hive, Pig, Sqoop, Flume, HBase, Oozie)
- Expert in Apache Spark data processing, handling data from both RDBMS and streaming sources
- Expert in data extraction, transformation, and loading (ETL) from source to target systems
- Experience analyzing the different types of data that flow from data lakes into Hadoop clusters
- Experience ingesting streaming data into clusters through Flume
- Proficient in working with SQL (RDBMS) and NoSQL databases such as HBase
- Built batch processing jobs using Apache Spark, achieving roughly ten-fold speedups over equivalent MapReduce jobs
- Experience partitioning Big Data according to business requirements using Hive indexing, partitioning, and bucketing (see the sketch after this list)
- Experienced in importing data from various sources, performing transformations with Hive and MapReduce, loading data into HDFS, and extracting data from relational databases such as Oracle, MySQL, and Teradata into HDFS and Hive using Sqoop
- Expertise in writing Hive queries, Pig and MapReduce scripts, and loading large datasets from the local file system and HDFS into Hive
- Good knowledge of Spark/Scala
- Experience using SequenceFile, Avro, and RCFile formats; managing and reviewing Hadoop log files
- Expert in sentiment analysis using Hadoop ecosystem components for storing and processing data; exported results to Tableau over a live connection
- Experience with the Software Development Life Cycle (SDLC), SQL, Java, OOP concepts, and Java methodologies
- Worked with Java features such as exception handling, multithreading, synchronization, serialization, I/O, JavaBeans, and collections, along with XML, HTML, CSS, and JavaScript
- Strong knowledge of Agile and Waterfall methodologies
- Strong ability to coordinate with both external and internal clients
- Highly motivated and versatile team player with the ability to work independently and adapt quickly to emerging technologies
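A minimal sketch of the Hive partitioning and bucketing approach referenced above, using the HiveServer2 JDBC driver. The connection URL, table, and column names are hypothetical placeholders, not artifacts from the projects below:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HivePartitionExample {
    public static void main(String[] args) throws Exception {
        // Load the HiveServer2 JDBC driver (hive-jdbc jar on the classpath)
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
        try (Statement stmt = conn.createStatement()) {
            // Partition by date to prune scans; bucket by customer_id
            // to speed up joins and sampling on that key
            stmt.execute(
                "CREATE TABLE IF NOT EXISTS transactions ("
                + "  customer_id BIGINT, amount DOUBLE, category STRING)"
                + " PARTITIONED BY (txn_date STRING)"
                + " CLUSTERED BY (customer_id) INTO 32 BUCKETS"
                + " STORED AS ORC");
        } finally {
            conn.close();
        }
    }
}
```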
TECHNICAL SKILLS
Big Data Technologies: Hadoop MapReduce, HDFS, Hue, HBase, Hive, Oozie, Sqoop, Pig, Flume, Solr, Impala, Spark/Scala
Programming Languages: Java/J2EE, C, UNIX shell commands, JavaBeans, JDBC, HTML, Servlets
Scripting Languages: JavaScript, Shell Script
Web Development: HTML, JavaScript, CSS
Databases: Oracle, MySQL, Hive, HBase
Technologies/Tools: SQL development, JDBC
Operating Systems: UNIX, Linux, Microsoft Windows XP/7/8/10
Reporting Tools: Tableau
IDEs: Eclipse, NetBeans
Hadoop Distributions: Cloudera, Hortonworks
Methodologies: Waterfall, Agile
Data Import Tools: Sqoop, Flume
Data Analysis Tools: Pig, Hive
PROFESSIONAL EXPERIENCE
Confidential, Mooresville, North Carolina
Hadoop and Big Data Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs
- Migrated existing MapReduce programs to Spark models using Python
- Teamed up with architects to design the Spark model for the existing MapReduce model
- Involved in building the ETL architecture and source-to-target mappings to load data into clusters
- Involved in data extraction, transformation, and loading (ETL) from source to target systems
- Imported and exported data into HDFS and Hive using Sqoop and Flume
- Integrated Apache Spark with Hadoop components
- Extensive experience writing HDFS shell and Pig Latin commands
- Developed complex queries using Hive and Impala
- Developed Spark SQL jobs to load tables into HDFS and run select queries on them
- Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data
- Used Spark Streaming to divide streaming data into micro-batches as input to the Spark engine for batch processing (see the sketch after this list)
- Developed a data pipeline using Flume, Sqoop, Pig, and MapReduce to ingest data into HDFS for analysis
- Worked on importing data from HDFS into MySQL and vice versa using Sqoop
- Configured the Hive metastore with MySQL to store the metadata for Hive tables
- Experienced in HBase database manipulation with structured, unstructured, and semi-structured data
- Analyzed data by running Hive queries and Pig scripts to study customer behavior
- Wrote Hive and Pig scripts as per requirements
- Implemented jobs in Scala and SQL for faster testing and processing of data
- Used Oozie operational services for batch processing and scheduling workflows dynamically
- Developed Hive scripts, Pig scripts, and Unix shell scripts for all ETL loading processes, converting files to Parquet in HDFS
- Wrote aggregation logic in different combinations to perform complex data analytics for business needs
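A minimal sketch of the Spark Streaming micro-batching pattern described above, using the Java API. The socket source and 10-second interval are illustrative stand-ins for the project's actual ingestion channel and batch size:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class StreamingBatcher {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf()
                .setAppName("StreamingBatcher").setMaster("local[2]");
        // 10-second interval: Spark Streaming slices the continuous
        // stream into discrete micro-batches (RDDs) for the batch engine
        JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(10));

        // Socket source as a placeholder for the real upstream feed
        JavaReceiverInputDStream<String> lines =
                jssc.socketTextStream("localhost", 9999);

        // Each micro-batch is processed like an ordinary batch job
        JavaDStream<Long> counts = lines.count();
        counts.print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```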
Environment: Hadoop, Spark (Scala/Python), HDFS, Hive, Pig, Sqoop, MapReduce, Cloudera, NoSQL, HBase, Shell Scripting, Linux
Confidential, Woodlands, Texas
Hadoop Developer
Responsibilities:
- Involved in use case analysis, design, and development of Big Data solutions using Hadoop for customer spend analysis, product performance monitoring, and an offer-generating engine
- Made major contributions to designing and sizing the application and procuring the required hardware and storage on MapR M5 and M7 clusters
- Provided architectural design and recommendations for the applications' key components
- Successfully migrated 30 use case requirements within a very short span of time by adapting to new clusters, domains, and available resources
- Played a key team-lead role in the production implementation, which went live without any issues
- Presented the platform and its opportunities to a variety of teams with technical and non-technical backgrounds
- Applied data modeling and mining techniques to derive more value from customer-created data
- Implemented the first production application on the enterprise's new internal Data Lake model
- Created Hive and Pig scripts for the data manipulation and cleansing needs of the customer spend analysis requirements
- Developed the customer quintile calculation through a user-defined algorithm able to handle dynamic spend categories (see the sketch after this list)
- Designed and developed a fully dynamic processing application that expands the processing of spend categories
- Managed onsite/offshore work and multi-vendor deliverables
- Hands-on experience creating and publishing Tableau reports for various analytical user requirements
- Played a key role in getting Hadoop data into Tableau for analytical purposes
- Implemented a proof of concept in Spark using Python for live chat analysis, evaluating the customer chat experience and various attributes to determine the type of customer and their navigation path for any clarification
- Good understanding of data lake models; contributed ingestion and extraction modules to the internal Data Lake solution using HiveServer2
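One way to express the dynamic, per-category customer quintile calculation above is with Spark's ntile window function; the sketch below is illustrative, and the customer_spend table and its columns are hypothetical names:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.ntile;

public class SpendQuintiles {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("SpendQuintiles")
                .enableHiveSupport()
                .getOrCreate();

        // Hypothetical Hive table of per-customer spend by category
        Dataset<Row> spend = spark.table("customer_spend");

        // Partitioning by category keeps the quintiles dynamic: a newly
        // appearing spend category gets its own 1-5 ranking automatically
        WindowSpec byCategory = Window.partitionBy("spend_category")
                .orderBy(col("total_spend").desc());

        Dataset<Row> ranked =
                spend.withColumn("quintile", ntile(5).over(byCategory));
        ranked.write().mode("overwrite")
                .saveAsTable("customer_spend_quintiles");
    }
}
```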
Environment: MapR M5 and M7 clusters, Apache Hadoop 2.0, MapReduce programming in Java, Pig, Hive, MapR-DB, Spark, Python, Oozie, Tableau
Confidential
Hadoop Developer
Responsibilities:
- Involved in use case analysis, design, and development of Big Data solutions using Hadoop for log processing and an offer-generating engine
- Installed and configured Hadoop MapReduce, Flume, Avro, HBase, and Sqoop
- Involved in data ingestion, installing the Flume import tool and importing data with it
- Implemented a Flume topology to capture system/business metrics for application monitoring and integrated it with HDFS, HBase, and Hive
- Involved in pushing data from Solr into HBase
- Created NoSQL tables in HBase for different metrics (see the sketch after this list)
- Used Pig to pre-structure the data and perform analytics
- Created Hive tables, loaded them with data, and performed analytics by writing various Hive queries
- Involved in writing Impala queries for faster processing of data
- Involved in performing sentiment analysis
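A minimal sketch of writing a metric row to an HBase table, using the newer HBase client API rather than the JDK 1.6-era one from this project. The table name, column family, and row-key scheme are assumptions for illustration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MetricsWriter {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath for cluster settings
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             // Assumes a pre-created table with column family "m"
             Table table = connection.getTable(TableName.valueOf("system_metrics"))) {
            // Row key combines host and timestamp so scans stay
            // time-ordered within each host
            Put put = new Put(Bytes.toBytes("web01#20240101120000"));
            put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("cpu_pct"),
                    Bytes.toBytes("73.2"));
            put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("mem_pct"),
                    Bytes.toBytes("58.9"));
            table.put(put);
        }
    }
}
```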
Environment: Cloudera Hadoop, MapReduce, HDFS, Java (JDK 1.6), Flume, Avro, HBase, Sqoop, Solr, NoSQL, Pig, Impala
Confidential
Java Developer
Responsibilities:
- Responsible and active in the analysis, design, implementation, and deployment phases of the project's full software development life cycle (SDLC)
- Designed and developed the user interface using JSP, HTML, and JavaScript
- Developed Struts action classes and action forms, performed action mapping using the Struts framework, and performed data validation in form beans and action classes (see the sketch after this list)
- Involved in multi-tiered J2EE design utilizing MVC architecture (Struts framework) and Hibernate
- Extensively used the Struts framework as the controller to handle client requests and invoke the model based upon user requests
- Involved in system design and development in core Java using collections and multithreading
- Defined search criteria to pull customer records from the database, made the required changes, and saved the updated information back to the database
- Wrote JavaScript validations for the fields of the user registration and login screens
- Developed stored procedures and triggers in PL/SQL to calculate and update tables, implementing business logic
- Designed and developed XML processing components for dynamic menus in the application
- Involved in post-production support and maintenance of the application
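A minimal sketch of the Struts 1.x action-class pattern described above; the class name, request parameter, and "success" forward are hypothetical placeholders:

```java
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.struts.action.Action;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;

// Controller action: looks up a customer record and forwards to the view
public class CustomerLookupAction extends Action {
    @Override
    public ActionForward execute(ActionMapping mapping, ActionForm form,
                                 HttpServletRequest request,
                                 HttpServletResponse response) throws Exception {
        String customerId = request.getParameter("customerId");
        // DAO lookup elided; result placed in request scope for the JSP view
        request.setAttribute("customer", customerId);
        // Forward name is resolved against the action mapping in struts-config.xml
        return mapping.findForward("success");
    }
}
```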
Environment: Oracle 11g, Java 1.5, Struts 1.2, Servlets, HTML, XML, MS SQL Server 2005, J2EE, JUnit, Tomcat 6