
Big Data Consultant Resume

Dallas, TX

PROFESSIONAL SUMMARY:

  • 9 years of professional IT experience in software development, including 4 years designing and developing enterprise and web applications in Java/J2EE and related technologies and 4+ years of comprehensive experience as a Hadoop Consultant.
  • Highly skilled in Big Data core components and ecosystem tools: Hadoop, HDFS, MapReduce, YARN, Hive, HBase, Pig, Sqoop, and Flume.
  • Expertise in writing Hadoop jobs for analyzing data using MapReduce, Hive, and Pig.
  • Extensive experience in developing Pig Latin scripts and using Hive Query Language for data analytics.
  • Experience in importing and exporting data between relational database systems and HDFS using Sqoop.
  • Experience in creating Sqoop jobs with incremental loads to populate Hive external tables (a minimal sketch follows this list).
  • Experience in writing UDFs in Hive and Pig.
  • Experience in Hive Partitioning and Bucketing.
  • Experience in designing both time-driven and data-driven automated workflows using Oozie.
  • Experience with column-oriented NoSQL databases such as HBase and their integration with the Hadoop cluster.
  • Experience in collecting and aggregating log data with Apache Flume and storing it in HDFS for further analysis.
  • Experience in working on multiple platforms like Linux and Windows.
  • Experience working in an iterative, agile software lifecycle with a strong ability to estimate and scope development work.
  • Knowledge of continuous integration software (Jenkins) and version control systems (Git).
  • Excellent knowledge of bug-tracking and project-management tools such as Rally, DevTrack, and JIRA.
  • Experience in UNIX shell scripting.
  • Working knowledge of Spark Core, Spark SQL, and Spark Streaming design.
  • Brief experience using Kafka and Spark for real-time data processing.
  • Experience as a Java developer building web/intranet and client/server applications with J2EE technologies such as Servlets, JSP, JDBC, and SQL.
  • Experience developing web-based applications using the Struts framework.
  • Experience in Object Oriented Programming Concepts.
  • Experience with the GlassFish application server and the Tomcat web server.
  • Experience working with the MyEclipse and NetBeans IDEs.
  • Experience with Oracle and MySQL databases.
  • Strong database knowledge, including PL/SQL programming and RDBMS concepts.
  • Ability to quickly ramp up and start producing results with any given tool or technology.
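
A minimal, illustrative sketch of the Sqoop incremental-load pattern referenced above; the database, table, column, and path names are hypothetical placeholders, not project specifics.

#!/bin/bash
# Illustrative only: incremental Sqoop import from MySQL into an HDFS
# directory that backs a Hive external table. All names and paths are placeholders.

# Create a saved Sqoop job that pulls only rows whose last_modified value
# is newer than the checkpoint Sqoop stored after the previous run.
sqoop job --create orders_incremental -- import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user --password-file /user/etl/.db_password \
  --table orders \
  --target-dir /data/raw/sales/orders \
  --incremental lastmodified \
  --check-column last_modified \
  --merge-key order_id

# Execute the saved job; Sqoop records the new checkpoint on success.
sqoop job --exec orders_incremental

# Expose the imported directory through a Hive external table.
hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS sales.orders_ext (
  order_id      BIGINT,
  customer_id   BIGINT,
  amount        DOUBLE,
  last_modified TIMESTAMP
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/raw/sales/orders';
"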

TECHNICAL SKILLS:

Big Data Technologies: YARN, HDFS, Sqoop, Oozie, Hive, Pig, Flume, HBase, Spark.

Programming: PL/SQL, Shell Script, Java, Scala

ETL & BI / Reporting Tools: MicroStrategy, Tableau

IDE: Eclipse, IntelliJ

Databases: Oracle 9i/10g, MySQL 5.0, MS SQL Server

Methodologies: Agile, Waterfall, SDLC

Operating Systems: Windows, Unix/Linux

Web Technologies: Servlets, JSP, HTML, JavaScript

Web Servers: Apache Tomcat

PROFESSIONAL EXPERIENCE:

Confidential, Dallas, TX

Big Data Consultant

Responsibilities:

  • Developed scalable, Hadoop-based data processing algorithms using Hive and the Hadoop ecosystem.
  • Worked on importing data from MySQL into HDFS and Hive using Sqoop.
  • Built a data pipeline from Application (Source DB) to reporting layer using Hadoop Framework.
  • Developed Hive UDFs for parsing data from specific columns.
  • Created Hive external tables on top of existing HDFS data.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Developed Sqoop scripts to import data from relational sources and handled incremental loading.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Used Apache Flume to collect and aggregate large amounts of log data and store it in HDFS for further analysis.
  • Used Hive and Pig as ETL tools to perform transformations, event joins, and pre-aggregations before storing the data in HDFS.
  • Analyzed data using Hive, Pig, and custom MapReduce programs.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Wrote shell scripts to pull data from various sources (file share servers) into Hadoop.
  • Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables (a minimal sketch follows this list).
  • Used Tableau as a business intelligence tool to visualize the customer information as per the generated records.
  • Transformed massive amounts of raw data into actionable analytics.
  • Involved in a POC for near-real-time analytics using Kafka and Spark.
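
A minimal sketch of the JSON SerDe usage mentioned above, assuming Hive's HCatalog JsonSerDe; the jar path, database, table, and column names are hypothetical and depend on the CDH/Hive version actually installed.

#!/bin/bash
# Illustrative only: load JSON records into a Hive external table via a SerDe.
# The SerDe jar location and class are assumptions that vary by distribution.
hive -e "
ADD JAR /opt/cloudera/parcels/CDH/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;

CREATE EXTERNAL TABLE IF NOT EXISTS staging.customer_events_json (
  event_id    STRING,
  customer_id STRING,
  event_type  STRING,
  event_ts    STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/data/raw/events/json';

-- Once declared, the JSON-backed table is queried like any other Hive table.
SELECT event_type, COUNT(*) AS events
FROM staging.customer_events_json
GROUP BY event_type;
"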

Environment: Hadoop Framework, Cloudera (CDH 5), Hive, Shell-Scripting, Sqoop, Oozie, Unix, GIT, MySQL, Tableau, Spark, Kafka.

Confidential, Philadelphia, PA

Sr. Hadoop Consultant

Responsibilities:

  • Involved in configuring Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions (a minimal health-check sketch follows this list).
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging the data in HBase/HDFS for further analysis.
  • Streamed data in real time using Spark with Kafka.
  • Collected log data from web servers and integrated it into HBase using Flume.
  • Used Sqoop to import and export data between HDFS and relational databases.
  • Created Hive tables and involved in data loading and writing Hive UDFs.
  • Worked on importing and exporting data between Oracle/DB2 and HDFS/Hive using Sqoop.
  • Used Hive partitioning and bucketing, performed different types of joins on Hive tables, and implemented Hive SerDes such as Regex, JSON, and Avro.
  • Automated workflows using shell scripts to pull data from various databases into Hadoop.
  • Developed scalable, Hadoop-based data processing algorithms using MapReduce, Pig, Hive, HBase and the Hadoop ecosystem.
  • Developed Sqoop scripts to import data from relational sources and handled incremental loading.
  • Deployed Hadoop Cluster in Fully Distributed and Pseudo-distributed modes.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
  • Involved in writing scripts to automate the process and generate reports.
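
A minimal sketch of the daemon health-check scripting described above; the daemon list, alert address, and paths are placeholders, and the real scripts would be tailored to the cluster's monitoring setup.

#!/bin/bash
# Illustrative only: check that core Hadoop daemons are running on this node
# and that HDFS responds, mailing an alert otherwise. Names are placeholders.

ALERT_TO="hadoop-ops@example.com"
DAEMONS="NameNode DataNode ResourceManager NodeManager"

for daemon in $DAEMONS; do
  # jps lists running JVMs; a missing daemon name means the service is down.
  if ! jps | grep -qw "$daemon"; then
    echo "$(date): $daemon is not running on $(hostname)" \
      | mail -s "Hadoop daemon alert: $daemon down" "$ALERT_TO"
  fi
done

# Basic HDFS responsiveness check.
if ! hdfs dfsadmin -report > /tmp/hdfs_report.txt 2>&1; then
  mail -s "HDFS health check failed on $(hostname)" "$ALERT_TO" < /tmp/hdfs_report.txt
fi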

Environment: Hadoop, MapReduce, Spark, Java, Hive, HDFS, PIG, Sqoop, Kafka, Oozie, Cloudera, Flume, HBase, Zookeeper, CDH4 & CDH5, Oracle, PL/SQL, Linux

Confidential, St. Louis, MO

Hadoop Developer

Responsibilities:

  • Involved in POCs for the Hadoop implementation.
  • Involved in the configuration of Hadoop clusters on AWS.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions.
  • Used Sqoop to import data from Teradata to HDFS.
  • Used HDFS commands to move data from local system to HDFS.
  • Developed MapReduce Programs for parsing the raw data and populating staging tables using Java.
  • Used Pig & Python scripting for pre-processing the data.
  • Created staging tables for data transfer from Hive.
  • Developed and executed Hive queries for denormalizing the data.
  • Created Hive External tables on the existing HDFS file systems.
  • Installed and configured Hive and wrote Hive UDFs and queries.
  • Created Hive queries to compare raw data with EDW tables and perform aggregations.
  • Created partitions and buckets on Hive tables (a minimal DDL sketch follows this list).
  • Performed ad-hoc queries on structured data using HiveQL and used partitioning, bucketing, and joins in Hive for faster data access.
  • Experienced in managing and reviewing Hadoop log files.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig Latin Scripts.
  • Performed joins, group by and other operations in MapReduce.
  • Automated workflows using shell scripts to pull data from various databases into Hadoop.
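
A minimal sketch of the Hive partitioning and bucketing pattern described above; database, table, and column names are hypothetical, and the bucket count and file format would depend on the actual data volumes.

#!/bin/bash
# Illustrative only: create a partitioned, bucketed Hive table, load it from
# a staging table, and run a partition-pruned query.
hive -e "
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.enforce.bucketing=true;

CREATE TABLE IF NOT EXISTS edw.transactions_part (
  txn_id      BIGINT,
  customer_id BIGINT,
  amount      DOUBLE
)
PARTITIONED BY (txn_date STRING)
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC;

-- Load from the raw staging table, creating partitions dynamically by date.
INSERT OVERWRITE TABLE edw.transactions_part PARTITION (txn_date)
SELECT txn_id, customer_id, amount, txn_date
FROM staging.transactions_raw;

-- Partition pruning: only the requested date's files are scanned.
SELECT customer_id, SUM(amount) AS total_amount
FROM edw.transactions_part
WHERE txn_date = '2016-01-31'
GROUP BY customer_id;
"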

Environment: AWS (EC2 & S3), HDFS, Hive, PIG, Sqoop, Oozie, ZooKeeper, XML, Linux.

Confidential, Plano, TX

Hadoop Developer

Responsibilities:

  • Worked on Cloudera CDH distribution.
  • Involved in the configuration of Hadoop clusters on AWS.
  • Involved in writing the script files for processing data and loading to HDFS.
  • Created Hive tables to store the processed results in a tabular format.
  • Involved in writing HDFS CLI commands.
  • Created External Hive Table on top of parsed data.
  • Worked with different data sources like Avro data files, XML files, JSON files, SQL server and Oracle to load data into Hive tables.
  • Applied Hive partitioning and bucketing concepts and designed both managed and external tables in Hive to optimize performance.
  • Developed Hive UDF libraries for business requirements.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Imported customer log data into HDFS using Flume (a minimal Flume agent sketch follows this list).
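
A minimal sketch of a Flume agent for the log ingestion described above; the agent name, source command, channel sizing, and HDFS path are placeholders rather than an actual production configuration.

#!/bin/bash
# Illustrative only: define and start a Flume agent that tails a web server
# access log and writes the events into date-bucketed HDFS directories.

cat > /tmp/weblog-agent.conf <<'EOF'
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# Tail the access log as an exec source.
a1.sources.r1.type    = exec
a1.sources.r1.command = tail -F /var/log/httpd/access_log

# Buffer events in a memory channel.
a1.channels.c1.type     = memory
a1.channels.c1.capacity = 10000

# Write events to HDFS, one directory per day.
a1.sinks.k1.type                   = hdfs
a1.sinks.k1.hdfs.path              = /data/raw/weblogs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType          = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true

a1.sources.r1.channels = c1
a1.sinks.k1.channel    = c1
EOF

flume-ng agent --conf /etc/flume-ng/conf \
  --conf-file /tmp/weblog-agent.conf \
  --name a1 -Dflume.root.logger=INFO,console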

Environment: HDFS, Hive, Pig, Shell Script, Sqoop, Oozie, Unix, MySQL.

Confidential

Java Developer

Responsibilities:

  • Involved in designing, developing and implementing enterprise applications using Java/J2EE, Core Java, JDBC, Servlets, JSP, Java Beans.
  • Involved in developing various modules using servlets, JSP, JPA.
  • Used JPA for mapping business objects to database tables.
  • Used Eclipse as Java IDE tool for creating JSPs, Servlets, XML.
  • Developed various Java beans for performance of business processes and effectively involved in Impact analysis.
  • Developed user interface using JSP, JSP Tag libraries and Struts Tag Libraries to simplify the complexities of the application.
  • Involved in creating a user-friendly GUI using HTML and JSP.
  • Involved in documentation and use case design using UML modeling, including development of class diagrams, sequence diagrams, and use case diagrams.

Environment: Java, JSP, Servlets, JDBC, NetBeans, Glassfish Server, MySQL, JavaScript.

Confidential

Software Engineer

Responsibilities:

  • Responsible for providing the user interface to create content types, access remote data, and build tools for Center using HTML and JSP.
  • Developed the application framework following the MVC architecture using Struts.
  • Implemented Struts2 to write Action classes for handling requests and processing form submissions.
  • Used Struts Validator for server-side and client-side validations.
  • Implemented the validations using Struts MVC Framework.
  • Developed Struts MVC compliant components for the web tier.
  • Developed user interface using JSP, JSP Tag libraries and Struts Tag Libraries to simplify the complexities of the application.
  • Struts with Tiles was the MVC framework used for the application.
  • Implemented MVC architecture by separating the business logic from the presentation layer using Struts.
  • Developed application using Struts, Servlets and JSPs.
  • Used XML, XSL and XSLT for developing a dynamic and flexible system for handling data.
  • Used SOAP (Simple Object Access Protocol) for web service by exchanging XML data between the applications.
  • Used Struts Framework for development of Web applications.
  • Used TOAD as database tool for running SQL queries.
  • Designed and developed the business logic layer and data access layer using different kinds of Data Access Objects (DAO's).
  • Involved in creating User Requirement Document (URD) and Software requirement specification (SRS) for the application.
  • Involved in creating, debugging and deploying Building Block extensions in Center.

Environment: Java, Struts, Java Server Pages, Servlets, JavaTaglibs, JDBC, Tomcat, Oracle, Eclipse.
