
Big Data/Hadoop Engineer Resume


Bentonville, AR

SUMMARY

  • 6 years of professional IT industry experience encompassing a wide range of skills in Big Data and Java/J2EE technologies.
  • Around 4 years of experience working with Big Data technologies on systems that process massive amounts of data in highly distributed mode on Cloudera and Hortonworks Hadoop distributions.
  • Hands-on experience using Hadoop ecosystem components such as Hive, Pig, Sqoop, HBase, Cassandra, Spark, Spark Streaming, Spark SQL, Oozie, ZooKeeper, Kafka, Flume, MapReduce and YARN.
  • Strong knowledge of Spark architecture and components; efficient in working with Spark Core, Spark SQL and Spark Streaming.
  • Implemented MapReduce programs using Java.
  • Implemented Spark Streaming jobs by developing RDDs (Resilient Distributed Datasets), using PySpark and spark-shell as appropriate.
  • Experience configuring Spark Streaming to receive real-time data from Apache Kafka and store the stream data to HDFS using Scala (see the sketch after this list).
  • Experience in importing and exporting data using stream processing platforms like Flume and Kafka.
  • Wrote complex HiveQL queries to extract required data from Hive tables and developed Hive UDFs as needed.
  • Solid experience with partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
  • Used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data.
  • Used Spark DataFrame operations to perform required data validations.
  • Experience integrating Hive queries into the Spark environment using Spark SQL.
  • Good understanding and knowledge of NoSQL databases like MongoDB, HBase and Cassandra.
  • Worked on HBase to load and retrieve data for real-time processing using the REST API.
  • Excellent understanding and knowledge of job workflow scheduling and locking tools/services like Oozie and ZooKeeper.
  • Experienced in designing different time driven and data driven automated workflows using Oozie.
  • Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data Warehouse tools for reporting and data analysis.
  • Worked on developing ETL Workflows on the data obtained using Python for processing it in HDFS and HBase using Oozie.
  • Experience configuring ZooKeeper to coordinate the servers in clusters and maintain data consistency.
  • Experienced in working with Amazon Web Services (AWS) using EC2 for computing and S3 as storage mechanism.
  • Capable of using AWS utilities such as EMR, S3 and CloudWatch to run and monitor Hadoop and Spark jobs on AWS.
  • Involved in developing Impala scripts for extraction, transformation, loading of data into data warehouse.
  • Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
  • Experienced in using Pig scripts to do transformations, event joins, filters and some pre-aggregations before storing the data onto HDFS.
  • Experience in importing and exporting the data using Sqoop from HDFS to Relational Database systems and vice-versa.
  • Good Knowledge in UNIX Shell Scripting for automating deployments and other routine tasks.
  • Experience in relational databases like Oracle, MySQL and SQL Server.
  • Experienced in using Integrated Development environments like Eclipse, NetBeans, IntelliJ, Spring Tool Suite.
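
A minimal sketch of the Kafka-to-HDFS Spark Streaming pattern referenced in the summary above, written in Scala against the spark-streaming-kafka-0-10 integration. The broker address, consumer group, topic name and output path are hypothetical placeholders rather than details from any project listed below.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs")
    val ssc  = new StreamingContext(conf, Seconds(30))   // 30-second micro-batches

    // Hypothetical broker list and consumer group for illustration only.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "hdfs-loader",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Keep only the message payload and persist each batch to HDFS.
    stream.map(_.value)
          .saveAsTextFiles("hdfs:///data/raw/events", "txt")

    ssc.start()
    ssc.awaitTermination()
  }
}
```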

TECHNICAL SKILLS

Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache

Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy.

Languages: Java, Python, Scala

Java Technologies: Servlets, JavaBeans, JSP, JDBC, and Spring MVC

Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, ExtJS and JSON

NoSQL Databases: Cassandra, MongoDB and HBase

Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j

ETL Tools: Talend, Informatica

DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle

RDBMS: Teradata, Oracle 9i, 10g, 11g, MS SQL Server, MySQL and DB2

Operating Systems: UNIX, Linux, macOS and Windows variants

PROFESSIONAL EXPERIENCE

Confidential, Bentonville, AR

Big Data/Hadoop Engineer

Responsibilities:

  • Experienced in implementing the Hortonworks distribution.
  • Created Hive tables and worked on them for data analysis to meet the requirements.
  • Developed a framework to load and transform large sets of unstructured data from UNIX systems into Hive tables.
  • Used Spark DataFrame operations to perform required data validations and analytics on the Hive data (a sketch follows this list).
  • Developed MapReduce programs for refined queries on big data.
  • In-depth understanding of classic MapReduce and YARN architecture.
  • Worked with the business team to create Hive queries for ad hoc access.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Implemented Hive generic UDFs to implement business logic.
  • Analyzed the data using Hive queries, Spark SQL and Spark Streaming.
  • Involved in creating a generic Sqoop import script for loading data into Hive tables from RDBMS.
  • Designed column families in Cassandra, ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra per the business requirements.
  • Extracted files from Cassandra through Sqoop and placed in HDFS for further processing.
  • Working knowledge of Spark RDD, DataFrame API, Dataset API, Data Source API, Spark SQL and Spark Streaming.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Performed Spark join optimizations; troubleshot, monitored and wrote efficient code using Scala.
  • Experienced in working with Elastic MapReduce (EMR).
  • Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
  • Worked on analyzing Hadoop stack and different big data analytic tools including Pig, Hive, HBase database and Sqoop.
  • Developed a Spark Streaming script that consumes topics from the distributed messaging source Kafka and periodically pushes batches of data to Spark for real-time processing.
  • Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
  • Created detailed AWS security groups that behaved as virtual firewalls controlling the traffic allowed to reach one or more AWS EC2 instances.
  • Involved in creating a data lake by extracting customer data from various data sources into HDFS, including data from Excel, databases, and server log data.
  • Performed data integration with the goal of moving more data effectively, efficiently and with high performance to assist in business-critical projects using Talend Data Integration.
  • Designed, developed, unit tested, and supported ETL mappings and scripts for data marts using Talend.
  • Used HUE for running Hive queries. Created partitions according to day using Hive to improve performance.
  • Involved in Agile methodologies, daily Scrum meetings, Sprint planning.
  • Experience in using version control tools like GitHub to share code among team members.
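
A minimal sketch of the DataFrame-based validation on Hive data mentioned above, assuming Hive support is enabled in the SparkSession. The database, table and column names (retail.orders, order_id, amount, and so on) are hypothetical stand-ins, not actual project schemas.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object OrderValidation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-order-validation")
      .enableHiveSupport()          // read the managed/external Hive tables directly
      .getOrCreate()

    val orders = spark.table("retail.orders")   // hypothetical Hive table

    // Basic validations: required fields present, amounts non-negative, no duplicate keys.
    val invalid = orders.filter(col("order_id").isNull
                         || col("order_ts").isNull
                         || col("amount") < 0)
    val dupes = orders.groupBy("order_id").count().filter(col("count") > 1)

    println(s"invalid rows: ${invalid.count()}, duplicate keys: ${dupes.count()}")

    // Only clean rows flow on to downstream aggregations and exports.
    val clean = orders.except(invalid).dropDuplicates("order_id")
    clean.write.mode("overwrite").saveAsTable("retail.orders_clean")
  }
}
```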

Environment: Hadoop, MapReduce, HDFS, Hive, Cassandra, Sqoop, Oozie, SQL, Kafka, Spark, Scala, Java, AWS, GitHub, Talend Big Data Integration

Confidential, Dallas, TX

Spark Developer

Responsibilities:

  • Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in PySpark.
  • Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for huge volumes of data.
  • Developed Spark scripts using Scala shell commands as per the requirement.
  • Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
  • Worked on data privacy and data compliance (CCPA) to maintain patient confidentiality.
  • Used Spark RDDs for faster data sharing.
  • Experienced in querying data using Spark SQL on top of the Spark engine for faster dataset processing.
  • Worked on Ad hoc queries, Indexing, Replication, Load balancing, Aggregation in MongoDB.
  • Extracted and restructured the data into MongoDB using the import and export command-line utility tools.
  • Worked on the large-scale Hadoop YARN cluster for distributed data processing and analysis using Hive and MongoDB.
  • Wrote XML workflow definitions to build Oozie functionality.
  • Experience with the Oozie workflow scheduler to manage and schedule jobs on the Hadoop cluster for generating daily and weekly reports.
  • Used Flume to collect, aggregate, and store web log data from various sources such as web servers, mobile and network devices, and pushed it to HDFS.
  • Implemented custom serializers, interceptors, sources and sinks in Flume to ingest data from multiple sources.
  • Involved in writing queries using Impala for better and faster data processing.
  • Responsible for analyzing and cleansing raw data by performing Hive/Impala queries and running Pig scripts on data.
  • Involved in moving log files generated from various sources to HDFS for further processing through Flume.
  • Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
  • Worked on partitioning the Hive tables and running the scripts in parallel to reduce their run time (see the sketch after this list).
  • Analyzed the data by performing Hive queries and running Pig scripts to know user behavior.
  • Programmed Pig scripts with complex joins such as replicated and skewed joins to achieve better performance.
  • Developed data pipelines using Pig and Java MapReduce to ingest customer behavioral data and financial history data into HDFS for analysis.
  • Designed and created ETL jobs in Talend to load huge volumes of data into MongoDB, the Hadoop ecosystem and relational databases.
  • Created Talend jobs to connect to Quality Stage using FTP connection and process data received from Quality Stage.
  • Migrated data from MySQL server to Hadoop using Sqoop for processing data.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Experienced in developing Shell scripts and Python scripts for system management.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Worked with SCRUM team in delivering agreed user stories on time for every Sprint.
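
A minimal sketch of loading a date-partitioned Hive table through Spark SQL, as referenced in the partitioning bullet above. The staging and target table names and columns are hypothetical, and dynamic partitioning is assumed to be permitted on the cluster.

```scala
import org.apache.spark.sql.SparkSession

object DailyPartitionLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-partition-load")
      .enableHiveSupport()
      .config("hive.exec.dynamic.partition", "true")              // assumed cluster settings
      .config("hive.exec.dynamic.partition.mode", "nonstrict")
      .getOrCreate()

    // Hypothetical target table, partitioned by load date.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS analytics.page_views (
        user_id STRING, url STRING, duration_ms BIGINT)
      PARTITIONED BY (view_date STRING)
      STORED AS PARQUET
    """)

    // Each run writes only that day's partitions, so ad hoc queries that
    // filter on view_date scan a single directory instead of the full table.
    spark.sql("""
      INSERT OVERWRITE TABLE analytics.page_views PARTITION (view_date)
      SELECT user_id, url, duration_ms,
             date_format(event_ts, 'yyyy-MM-dd') AS view_date
      FROM staging.raw_page_views
    """)
  }
}
```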

Environment: CDH 3.x and 4.x, Java, Hadoop, Python, MapReduce, Hive, Pig, Impala, Flume, MongoDB, Sqoop, Talend, Spark, MySQL, AWS.

Confidential

Java/Hadoop Developer

Responsibilities:

  • Involved in Installation and configuration of JDK, Hadoop, Pig, Sqoop, Hive, HBase on Linux environment. Assisted with performance tuning and monitoring.
  • Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive.
  • Worked on creating MapReduce programs to parse the data for claim report generation and running the JARs on Hadoop. Coordinated with the Java team in creating MapReduce programs.
  • Worked on creating Pig scripts for most modules to provide comparative effort estimates for code development.
  • Wrote MapReduce programs using Java.
  • Collaborated with BI teams to ensure data quality and availability with live visualization.
  • Created Hive queries to process large sets of structured, semi-structured and unstructured data and stored the results in managed and external tables.
  • Created HBase tables to store variable data formats coming from different portfolios. Performed real-time analytics on HBase using the Java API and REST API (a sketch follows this list).
  • Performed test runs of the module components to gauge productivity.
  • Wrote Java programs to retrieve data from HDFS and provide REST services.
  • Shared responsibility for, and assisted with, the administration of Hadoop, Hive, Sqoop, HBase and Pig within the team.
  • Shared the knowledge of Hadoop concepts with team members.
  • Used JUnit for unit testing and Continuum for integration testing.
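
A minimal sketch of writing and reading a row through the HBase client API, as referenced in the HBase bullet above. The original work used the Java API; the same client calls are shown here from Scala for consistency with the other sketches, and the table name, column family and row key are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object ClaimStore {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()               // picks up hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf("claims"))   // hypothetical table

    // Write one row keyed by claim id, with a single "info" column family.
    val put = new Put(Bytes.toBytes("CLM-1001"))
    put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("status"), Bytes.toBytes("OPEN"))
    table.put(put)

    // Read it back for the real-time view.
    val result = table.get(new Get(Bytes.toBytes("CLM-1001")))
    val status = Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("status")))
    println(s"claim status: $status")

    table.close()
    connection.close()
  }
}
```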

Environment: Cloudera, Hadoop, Pig, Sqoop, Hive, HBase, Java, Eclipse, MySQL, MapReduce.

Confidential

Java Developer

Responsibilities:

  • Responsible for analyzing and documenting the requirements, and for designing and developing the application based on J2EE standards. Strictly followed Test-Driven Development.
  • Used Microsoft Visio for designing artifacts such as use cases, class diagrams, sequence diagrams, and data models.
  • Extensively developed the user interface using HTML, JavaScript, jQuery, AJAX and CSS on the front end.
  • Designed Rich Internet Application by implementing jQuery based accordion styles.
  • Used JavaScript for the client-side web page validation.
  • Used Spring MVC and Dependency Injection for handling presentation and business logic. Integrated Spring DAO for data access using Hibernate.
  • Developed Struts web forms and actions for validation of user request data and application functionality.
  • Developed programs for accessing the database using the JDBC thin driver to execute queries, prepared statements and stored procedures, and to manipulate the data in the database (see the sketch after this list).
  • Created tile definitions, Struts configuration files, validation files and resource bundles for all modules using Struts framework.
  • Involved in the coding and integration of several business-critical modules using Java, JSF, and Hibernate.
  • Developed SOAP-based web services for communication between its upstream applications.
  • Implemented design patterns such as DAO, Singleton and the MVC architectural pattern of Spring.
  • Implemented Service Oriented Architecture (SOA) on Enterprise Service Bus (ESB).
  • Developed Message-Driven Beans for asynchronous processing of alerts using JMS.
  • Used Rational Rose for application design and development.
  • Used ClearCase for source code control and JUnit for unit testing.
  • Performed integration testing of the modules.
  • Used PuTTY for UNIX logins to run batch jobs and check server logs.
  • Deployed the application onto GlassFish Server.
  • Involved in peer code reviews.
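
A minimal sketch of the JDBC prepared-statement access pattern referenced above. The original code was Java; the identical JDBC calls are shown from Scala for consistency with the other sketches, and the connection URL, credentials, table and column names are hypothetical.

```scala
import java.sql.DriverManager

object ClaimLookup {
  def main(args: Array[String]): Unit = {
    // Hypothetical Oracle thin-driver URL and credentials; assumes the Oracle JDBC driver is on the classpath.
    val connection = DriverManager.getConnection(
      "jdbc:oracle:thin:@dbhost:1521:orcl", "app_user", "app_password")

    // A prepared statement keeps the query plan reusable and guards against SQL injection.
    val statement = connection.prepareStatement(
      "SELECT claim_id, status FROM claims WHERE customer_id = ?")
    statement.setString(1, "CUST-42")

    val rs = statement.executeQuery()
    while (rs.next()) {
      println(s"${rs.getString("claim_id")} -> ${rs.getString("status")}")
    }

    rs.close()
    statement.close()
    connection.close()
  }
}
```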

Environment: Java 6/7, J2EE, Struts 2, GlassFish, JSP, JDBC, EJB, Ant, XML, IBM WebSphere, JUnit, IBM DB2, Rational Rose 7, CVS, UNIX, SOAP, SQL, PL/SQL.
