Big Data Engineer Resume

DE

SUMMARY:

  • 8 years of experience in the design, development, maintenance and support of Big Data analytics using Java, Hortonworks and Hadoop ecosystem tools such as HDFS, Hive, Sqoop, Pig, Spark and Kafka.
  • Experienced in processing Big Data on Hortonworks and the Apache Hadoop framework using MapReduce programs.
  • Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL solutions, and of data warehouse tools for reporting and data analysis.
  • Excellent understanding and knowledge of NoSQL databases such as HBase and MongoDB.
  • Good knowledge of the Hadoop ecosystem, HDFS, Big Data, RDBMS and Spark.
  • Experience with the RDD architecture, implementing Spark operations on RDDs, and optimizing transformations and actions in Spark.
  • Experienced in installing, configuring, supporting and monitoring Hadoop clusters using Apache, Cloudera and Hortonworks distributions and AWS.
  • Good knowledge of Spark, Hadoop, HBase, Hive, Pig Latin scripts, MapReduce, Sqoop, Flume and HiveQL.
  • Experience in analyzing data using Pig Latin, HiveQL and HBase.
  • Experience capturing data from existing databases that provide SQL interfaces using Sqoop.
  • Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
  • Implemented proofs of concept on the Hadoop stack and different big data analytics tools, including migration from different databases (i.e. Teradata, Oracle, MySQL) to Hadoop.
  • Worked on NoSQL databases including HBase, Cassandra and MongoDB.
  • Successfully loaded files to Hive and HDFS from MongoDB and HBase.
  • Experience in configuring Hadoop Clusters and HDFS.
  • Good understanding of Apache Hue, GitHub and SVN.
  • Extensive experience importing and exporting data using stream processing platforms such as Flume and Kafka.
  • Worked extensively in Java, J2EE, XML, XSL, EJB, JSP, JSF, JDBC, MVC, Jakarta Struts, JSTL, Spring 2.0, design patterns and UML.
  • Extensive experience in object-oriented programming using Java and J2EE (Servlets, JSP, Java Beans, EJB, JDBC, RMI, XML, JMS, Web Services, AJAX).
  • Excellent analytical and problem-solving skills and ability to quickly learn new technologies. Worked with Agile, Scrum and Confidential software development framework for managing product development.
  • Deployed data warehouse and BI solutions that get information to users as quickly as possible, so they can see data and request relevant changes while development is under way. Developed and tested reports and dashboards in close collaboration with users.
  • Good communication and interpersonal skills. A very good team player with the ability to work independently.

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Zookeeper, Spark, Scala, Kafka.

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: C, Python, Java, J2EE, PL/SQL, Pig Latin, HiveQL, Unix shell scripts, R Programming

Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery

Frameworks: MVC, Struts, Spring, Hibernate

Operating Systems: Sun Solaris, HP-UX, Red Hat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata

Tools and IDE: Eclipse, NetBeans, Toad, Maven, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

Methodologies: Agile (Scrum), Theta’s Pragmatic Agile methodology, Waterfall.

PROFESSIONAL EXPERIENCE:

Confidential, DE

Big Data Engineer

Responsibilities:

  • Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
  • Imported unstructured data into HDFS using Flume.
  • Used Oozie to orchestrate the MapReduce jobs that extract the data in a timely manner.
  • Wrote MapReduce Java programs to analyze log data for large-scale data sets.
  • Used the HBase Java API in a Java application.
  • Automated all jobs, from extracting data from different data sources such as MySQL to pushing the result sets to the Hadoop Distributed File System on Hortonworks.
  • Implemented MapReduce jobs using the Java API, Pig Latin and HiveQL.
  • Participated in the setup and deployment of a Hortonworks Hadoop cluster.
  • Hands-on design and development of an application using Hive UDFs.
  • Responsible for writing Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL).
  • Provided support to data analysts in running Pig and Hive queries.
  • Worked with HiveQL and Pig Latin.
  • Imported and exported data between MySQL/Oracle and Hive using Sqoop.
  • Configured an HA cluster for both manual and automatic failover.
  • Excellent working knowledge of Spark Core, Spark SQL and Spark Streaming.
  • Extensive experience importing and exporting data using stream processing platforms such as Flume and Kafka.
  • Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Designed and built many applications to deal with vast amounts of data flowing through multiple Hadoop clusters, using Pig Latin and Java-based MapReduce.
  • Specified the cluster size, resource pool allocation and Hadoop distribution by writing the specifications in JSON file format.
  • Experience in writing Solr queries for various search documents.
  • Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it.
  • Utilized Agile and Scrum Methodology to help manage and organize a team of developers with regular code review sessions.

Environment: Big Data Hortonworks, Apache Hadoop, Hive, Hue, ZooKeeper, MapReduce, Sqoop, Crunch API, Pig 0.10 and 0.11, HCatalog, Unix, Java, JSP, Eclipse, Maven, Oracle, SQL Server, MySQL.

Confidential, VA

Hadoop Developer

Responsibilities:

  • Processed Big Data using a Hadoop cluster consisting of 40 nodes.
  • Designed and configured Flume servers to collect data from the network proxy servers and store it in HDFS.
  • Loaded customer profile data, customer spending data and credit data from legacy warehouses onto HDFS using Sqoop.
  • Built a data pipeline using Pig and Java MapReduce to store data onto HDFS.
  • Applied transformations and filtered both traffic using Pig.
  • Used pattern-matching algorithms to recognize customers across different sources, built risk profiles for each customer using Hive, and stored the results in HBase.
  • Performed unit testing using MRUnit.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which gets data from Kafka in near real time and persists it into Cassandra.
  • Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
  • Designed and developed a POC in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Consumed data from Kafka using Apache Spark.
  • Performed various benchmarking steps to optimize the performance of Spark jobs and thus improve overall processing.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive; created Hive tables, loaded them with data and wrote Hive queries that invoke and run MapReduce jobs in the backend.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Set up and benchmarked Hadoop/HBase clusters for internal use.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
  • Analyzed the data by performing Hive queries and running Pig scripts to study employee behavior.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.

Environment: Hadoop, Hive, ZooKeeper, MapReduce, Sqoop, Pig 0.10 and 0.11, JDK 1.6, HDFS, Flume, Oozie, DB2, HBase, Mahout

Confidential

Hadoop Admin and Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team. Extensively used Pig for data cleansing.
  • Created partitioned tables in Hive. Managed and reviewed Hadoop log files.
  • Created Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting. Installed and configured Pig and wrote Pig Latin scripts.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data. Responsible for managing data coming from different sources.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Java (JDK 1.6), SQL, Sqoop, Eclipse, Git, Subversion.

Confidential

Java Developer

Responsibilities:

  • Involved in defining business rules according to client specifications and converting them into a high-level technical design.
  • Designed the entire system according to OOP and UML principles using Rational tools.
  • Elaborated use cases and interface definition specifications in collaboration with the business.
  • Used Oracle as the backend database and JDBC for integration.
  • Extensively used TOAD for all database-related activities and integration testing.
  • Used build and deploy scripts in ANT and UNIX shell scripting.
  • Developed user interface screens using Servlets, JSP, JavaScript, CSS, AJAX and HTML.
  • Involved in unit testing of developed business units using JUnit.
  • Worked with the development and QA teams to resolve issues in SIT/UAT/Production environments.
  • Coordinated closely with the architect, business analyst and business team on requirement analysis, development and implementation.
  • Used the Spring Framework caching mechanism to pre-load some of the master information.
  • Implementation of this project included scalable coding using Java, JDBC and JMS with Spring.
  • Developed Controller classes, Command objects, Action classes, Form beans, Transfer Objects and Singletons on the server side for handling requests and responses from the presentation layer.

Environment: Core Java, J2EE 1.5/1.6, Struts, Ajax, Rational Rose, Rational RequisitePro, Hibernate 3.0, CVS, RAD 7.0 IDE, Oracle 10g, JDBC, log4j, WebSphere 6.0, Servlets, JSP, JUnit.
