
Big Data Developer Resume


Cincinnati-OH

SUMMARY

  • 8+ years of total software development experience with the Hadoop Ecosystem, Big Data and Data Science analytical platforms, Java/J2EE technologies, database management systems and enterprise-level cloud-based computing and applications.
  • 4+ years of work experience in ingestion, storage, querying, processing and analysis of Big Data, with hands-on experience in Hadoop Ecosystem development including MapReduce, HDFS, Hive, Pig, Spark, Cloudera Navigator, Mahout, HBase, ZooKeeper, Kafka, Storm, Sqoop, Flume, Oozie and AWS.
  • Executed all phases of the Big Data project life cycle, including scoping study, requirements gathering, design, development, implementation, quality assurance and application support for end-to-end IT solution offerings.
  • Exceptional Analytical, Data Interpretation & Problem-solving skills.
  • Good knowledge of Hadoop, HBase, Hive, Pig Latin scripts, MapReduce, Sqoop, Flume and HiveQL.
  • Expertise in developing solutions around NoSQL databases like HBase, Neo4j, MongoDB and Cassandra.
  • In-depth understanding of data structures and algorithms and hands-on experience handling multi-terabyte datasets.
  • Experience with all major flavors of Hadoop distributions, including Cloudera, Hortonworks, MapR and Apache.
  • Experience in installation, configuration, supporting and managing Hadoop Clusters using Apache, Cloudera (5.X) distributions and on Amazon web services (AWS).
  • Expertise in workflow scheduling and coordination tools like Oozie and ZooKeeper.
  • Experience in implementing real-time event processing and analytics with Spark Streaming on top of messaging systems such as Kafka.
  • Good knowledge of Amazon AWS services like EMR and EC2, which provide fast and efficient processing of Big Data.
  • Experience with various performance optimizations like using distributed cache for small datasets, partitioning, bucketing, query optimization in Hive.
  • Experience in extending Hive and Pig core functionality by creating custom UDFs and UDAFs, extending the required base classes and implementing the evaluation methods (a minimal sketch follows this summary).
  • Proficient in Big Data ingestion tools like Flume, Kafka, Spark Streaming and Sqoop for streaming and batch data ingestion.
  • Good exposure to ETL processes and to ETL/BI tools like Informatica, Talend and Tableau.
  • Experience in importing and exporting data between HDFS and Relational Database Management systems using Sqoop.
  • Proven experience in working with software methodologies like Waterfall/Iterative and Agile SCRUM.
  • Expertise in writing real-time processing applications using spouts and bolts in Storm.
  • Experience in configuring Storm topologies to ingest and process data on the fly from multiple sources and aggregate it into a central Hadoop repository.
  • Expertise in implementing Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements.
  • Extensive experience working with Spark features such as RDD transformations, Spark MLlib and Spark SQL.
  • Good knowledge of executing Spark SQL queries against data in Hive using HiveContext in Spark.
  • Good understanding of R Programming, Data Mining and Machine Learning techniques.
  • Involved in upgrading existing MongoDB instances from version 2.4 to version 2.6 by upgrading the security roles and implementing newer features.
  • Responsible for performing reads and writes in Cassandra from a web application using Java JDBC connectivity.
  • Experienced in moving data from various sources using Kafka producers and consumers, and in preprocessing the data using Storm topologies.
  • Experienced in migrating ETL logic to Pig Latin scripts, including transformations and join operations.
  • Good understanding of MPP databases such as HP Vertica and Impala.
  • Experienced in checking cluster status with monitoring tools such as Cloudera Manager, Ambari and Ganglia.
  • In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, MapReduce, Hadoop GEN2 Federation and YARN architecture and good understanding of workload management, scalable and distributed platform architectures.
  • Developed multiple MapReduce programs to process large volumes of semi-structured and unstructured data files using different MapReduce design patterns.
  • Extended Hive and Pig core functionality by writing custom UDFs.
  • Experience in testing MapReduce programs using MRUnit and JUnit.
  • Extensive experience in middle-tier development using J2EE technologies like JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate and EJB.
  • Extensive experience working with SOA-based architectures, building REST web services with JAX-RS and SOAP web services with JAX-WS.
  • Highly proficient in SQL, PL/SQL including writing queries, stored procedures, functions and database performance tuning.
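
Illustrative sketch (not project code): a minimal Hive UDF of the kind referenced in the summary above, written in Java against the classic org.apache.hadoop.hive.ql.exec.UDF base class. The class name, behavior and registration statements are hypothetical.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF that trims and upper-cases a string column.
    // Registered in Hive with:
    //   ADD JAR normalize-udf.jar;
    //   CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeText';
    public class NormalizeText extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;                      // pass NULLs through unchanged
            }
            return new Text(input.toString().trim().toUpperCase());
        }
    }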

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Talend, Solr, Storm, Ambari, Mahout, Avro, Parquet and Snappy

Hadoop Distributions: Cloudera, Hortonworks, MapR and Apache

Languages: Java, Python, SQL, HTML, DHTML, Scala, JavaScript, XML and C/C++

No SQL Databases: Cassandra, MongoDB and HBase

RDBMS: Oracle 9i, 10g, 11g, MS SQL Server, MySQL, DB2 and Teradata

Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts

XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM) and JAXB

Web Design Tools: HTML5, DHTML, AJAX, JavaScript, jQuery, CSS3, AngularJS and JSON

Development / Build Tools: Eclipse, Ant, Maven, RubyMine, IntelliJ, JUnit, Log4j and ETL

Frameworks: Struts 2.x, Spring 3.x/4.x and Hibernate

App/Web servers: WebSphere, WebLogic, JBoss and Tomcat

Databases / DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle

Operating systems: UNIX, LINUX, Mac and Windows Variants

Data analytical tools: R and MATLAB

ETL Tools: Tableau, Informatica, Talend and Pentaho

PROFESSIONAL EXPERIENCE

Confidential, Cincinnati-OH

Big Data Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Created Kafka producers and consumers for Spark Streaming that pull data from the patients' different learning systems.
  • Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS using Scala (see the sketch at the end of this section).
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
  • Evaluated the performance of Apache Spark in analyzing genomic data.
  • Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark.
  • Implemented Spark RDD transformations to map the business analysis logic and applied actions on top of the transformations.
  • Wrote a Storm topology to accept events from a Kafka producer and emit them into Cassandra.
  • Created POC using Spark SQL and MLlib libraries.
  • Experienced in managing and reviewing Hadoop log files.
  • Worked closely with EC2 infrastructure teams to troubleshoot complex issues.
  • Worked with the AWS cloud, creating EMR clusters with Spark to process and analyze raw data and to access data from S3 buckets.
  • Involved in installing EMR clusters on AWS.
  • Used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process web server logs stored in Amazon S3 bucket.
  • Applied transformation rules on top of DataFrames.
  • Worked with different file formats like TextFile, Avro, ORC and Parquet for Hive querying and processing.
  • Developed Hive UDFs and UDAFs for rating aggregation.
  • Developed a Java client API for CRUD and analytical operations by building a RESTful server and exposing data from NoSQL databases like Cassandra over REST.
  • Created Hive tables and involved in data loading and writing Hive UDFs.
  • Worked extensively with Sqoop to move data from DB2 and Teradata to HDFS.
  • Collected log data from web servers and integrated it into HDFS using Kafka.
  • Provided ad-hoc queries and data metrics to the Business Users using Hive, Impala.
  • Worked on various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
  • Implemented Row Level Updates and Real time analytics using CQL on Cassandra Data.
  • Used CQL with the Java APIs to retrieve data from Cassandra tables.
  • Worked on analyzing and examining customer behavioral data using Cassandra.
  • Worked on Solr configuration and customizations based on requirements.
  • Indexed documents using Apache Solr.
  • Extensively used ZooKeeper for coordinating and scheduling Spark jobs.
  • Worked with BI teams to generate reports on Tableau.
  • Used JIRA for bug tracking and CVS for version control.

Environment: Hadoop, MapReduce, HDFS, PIG, Hive, Sqoop, Oozie, Storm, Kafka, Spark, Spark Streaming, Scala, Cassandra, Cloudera, ZooKeeper, AWS, Solr, MySQL, Shell Scripting, Java, Tableau.
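
Illustrative sketch (not project code): a minimal Kafka-to-HDFS Spark Streaming flow of the kind described above. The original work was done in Scala; this sketch uses Spark's Java API (spark-streaming-kafka-0-10), and the broker address, consumer group, topic and output path are placeholders.

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class KafkaToHdfs {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("KafkaToHdfs");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092");        // placeholder broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "events-consumer");              // placeholder group

            JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                    jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(
                        Arrays.asList("patient-events"), kafkaParams));  // placeholder topic

            // Each micro-batch of messages is persisted as a time-stamped HDFS directory.
            stream.map(ConsumerRecord::value)
                  .foreachRDD((rdd, time) ->
                      rdd.saveAsTextFile("hdfs:///data/events/" + time.milliseconds()));

            jssc.start();
            jssc.awaitTermination();
        }
    }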

Confidential, Emeryville-CA

Big Data Developer

Responsibilities:

  • Developed real time data processing applications by using Scala and Python and implemented Apache Spark Streaming from various streaming sources like Kafka and JMS.
  • Experienced in writing live real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline.
  • Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
  • Worked on Amazon AWS concepts like EMR and EC2 web services for fast and efficient processing of Big Data.
  • Involved in loading data from Linux file systems, servers and Java web services using Kafka producers and partitions (see the sketch at the end of this section).
  • Applied custom Kafka encoders for custom input formats to load data into Kafka partitions.
  • Implemented a POC with Hadoop and extracted data into HDFS with Spark.
  • Used Spark SQL with Scala to create DataFrames and perform transformations on them.
  • Implemented Spark SQL to access Hive tables from Spark for faster data processing.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed code to read the data stream from Kafka and send it to the appropriate bolts through their respective streams.
  • Worked on Spark streaming using Apache Kafka for real time data processing.
  • Experience in creating Kafka producer and Kafka consumer for Spark streaming.
  • Developed MapReduce jobs using the MapReduce Java API and HiveQL.
  • Developed UDF, UDAF and UDTF functions and used them in Hive queries.
  • Developed scripts and batch jobs to schedule an Oozie bundle (a group of coordinators) consisting of various Hadoop programs.
  • Experienced in using the Avro data serialization system to handle Avro data files in MapReduce programs.
  • Experienced in optimizing Hive queries, joins to handle different data sets.
  • Configured Oozie schedulers to handle different Hadoop actions on a timely basis.
  • Involved in ETL, data integration and migration by writing Pig scripts.
  • Used different file formats like text files, sequence files and Avro with Hive SerDes.
  • Integrated Hadoop with Solr and implemented search algorithms.
  • Experience in Storm for handling real-time processing.
  • Hands on Experience working in Hortonworks distribution.
  • Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Worked hands-on with NoSQL databases like MongoDB for POC purposes, storing images and URIs.
  • Designed and implemented MongoDB and associated RESTful web service.
  • Worked on analyzing and examining customer behavioral data using MongoDB.
  • Designed the data aggregations in Hive for ETL processing on Amazon EMR to process data as per business requirements.
  • Involved in writing test cases and implementing test classes using MRUnit and mocking frameworks.
  • Developed Sqoop scripts to extract data from MySQL and load it into HDFS.
  • Set up Spark on EMR to process huge volumes of data stored in Amazon S3.
  • Experienced in processing large volumes of data, with skills in parallel execution of processes using Talend functionality.
  • Used Talend to create workflows for processing data from multiple source systems.

Environment: MapReduce, HDFS, Sqoop, LINUX, Oozie, Hadoop, Pig, Hive, Solr, Spark Streaming, Kafka, Storm, Spark, Scala, Python, MongoDB, Hadoop Cluster, Amazon Web Services, Talend.
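
Illustrative sketch (not project code): a minimal Kafka producer in Java of the kind used above to load log lines into Kafka partitions. The broker address, topic name and sample record are placeholders.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class LogLineProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");   // placeholder broker list
            props.put("acks", "all");                          // wait for the full commit
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                // Keying by source host keeps each server's lines on a stable partition.
                producer.send(new ProducerRecord<>("server-logs", "web01",
                        "2016-03-01T12:00:00 GET /index.html 200"));
            }
        }
    }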

Confidential, Norwalk-CT

Hadoop Developer

Responsibilities:

  • Experience with professional software engineering practices and best practices for the full software development life cycle including coding standards, code reviews, source control management and build processes.
  • Worked closely with various levels of individuals to coordinate and prioritize multiple projects; estimated scope, scheduled and tracked projects throughout the SDLC.
  • Worked on analyzing the Hadoop cluster and different big data analytic tools including MapReduce and Hive.
  • Wrote multiple MapReduce programs for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed file formats (see the sketch at the end of this section).
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
  • Configured Flume source, sink and memory channel to handle streaming data from server logs and JMS sources.
  • Experience in working with Flume to load the log data from multiple sources directly into HDFS.
  • Worked in the BI team in Big Data Hadoop cluster implementation and data integration in developing large-scale system software.
  • Assessed existing and available data warehousing technologies and methods to ensure the data warehouse/BI architecture meets the needs of the business unit and enterprise and allows for business growth.
  • Involved in source system analysis, data analysis and data modeling for ETL (Extract, Transform and Load).
  • Handled structured and unstructured data and applied ETL processes.
  • Worked extensively with Sqoop for importing and exporting the data from HDFS to Relational Database systems/mainframe and vice-versa. Loading data into HDFS.
  • Developed Flume Agents for loading and filtering the streaming data into HDFS.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Flume.
  • Extensively used Pig for data cleansing.
  • Implemented a logging framework, the ELK stack (Elasticsearch, Logstash and Kibana), on AWS.
  • Created partitioned tables in Hive.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Coded complex Oracle stored procedures, functions, packages and cursors for client-specific applications.
  • Experienced in using Java REST APIs to perform CRUD operations on HBase data.
  • Applied Hive queries to perform data analysis on HBase using the HBase storage handler to meet business requirements.
  • Wrote Hive queries to aggregate data that needed to be pushed to the HBase tables.
  • Prepared developer (unit) test cases and executed developer testing.
  • Created and modified shell scripts for scheduling various data cleansing scripts and the ETL loading process.
  • Supported and assisted QA engineers in understanding, testing and troubleshooting.

Environment: Hadoop, Hive, Linux, MapReduce, Sqoop, Storm, HBase, Flume, Eclipse, Maven, JUnit, Agile methodologies.
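
Illustrative sketch (not project code): a minimal MapReduce extraction/aggregation job in Java of the kind described above, counting requests per HTTP status code from CSV web-server logs. The column position and the input/output paths are assumptions.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class StatusCodeCount {

        public static class LogMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",");
                if (fields.length > 3) {                     // assume status code in column 4
                    ctx.write(new Text(fields[3].trim()), ONE);
                }
            }
        }

        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                ctx.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "status-code-count");
            job.setJarByClass(StatusCodeCount.class);
            job.setMapperClass(LogMapper.class);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. raw log directory
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // e.g. output directory
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }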

Confidential, Houston-TX

Java/Hadoop Developer

Responsibilities:

  • Developed Pig UDFs for manipulating data according to business requirements and worked on developing custom Pig loaders.
  • Developed Java MapReduce programs to transform log data into a structured form and derive user location, age group and time spent.
  • Implemented row-level updates and real-time analytics using CQL on Cassandra data (see the sketch at the end of this section).
  • Collected and aggregated enormous amounts of web log data from various sources such as web servers, mobile and network devices using Apache Flume and stored the data into HDFS for analysis.
  • Developed PIG scripts for the analysis of semi structured data.
  • Worked on the Ingestion of Files into HDFS from remote systems using MFT (Managed File Transfer).
  • Analyzed the web log data using the HiveQL to extract number of unique visitors per day, page views, visit duration, most purchased product on website.
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts). Designed and implemented MapReduce based large-scale parallel processing.
  • Processed data into HDFS by developing solutions and analyzed the data using MapReduce programs, producing summary results from Hadoop for downstream systems.
  • Developed and updated the web tier modules using Struts 2.1 Framework.
  • Modified the existing JSP pages using JSTL.
  • Implemented Struts Validator for automated validation.
  • Utilized Hibernate for Object/Relational Mapping purposes for transparent persistence onto the SQL server.
  • Developed Reference Architecture for eComm SOA Environment.
  • Used CVS for version controlling and JUnit for unit testing.

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, MySQL, Cassandra, Java, Shell Scripting, SQL.
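
Illustrative sketch (not project code): a minimal example of the row-level update and read-back pattern with CQL through the DataStax Java driver, as referenced above. The contact point, keyspace (analytics), table (user_activity) and column names are hypothetical.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class UserActivityDao {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("analytics")) {

                // Row-level update keyed by the partition and clustering columns.
                session.execute(
                    "UPDATE user_activity SET last_page = ? WHERE user_id = ? AND day = ?",
                    "/checkout", "user-42", "2016-03-01");

                // Read the row back for a simple real-time metric.
                Row row = session.execute(
                    "SELECT last_page FROM user_activity WHERE user_id = ? AND day = ?",
                    "user-42", "2016-03-01").one();
                if (row != null) {
                    System.out.println("Last page visited: " + row.getString("last_page"));
                }
            }
        }
    }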

Confidential

Java Developer

Responsibilities:

  • Completely involved in requirement analysis and in documenting the requirements specification.
  • Developed a prototype based on the requirements using the Struts2 framework as part of a POC (proof of concept).
  • Prepared use-case diagrams, class diagrams and sequence diagrams as part of requirement specification documentation.
  • Involved in design of the core implementation logic using MVC architecture.
  • Used Apache Maven to build and configure the application.
  • Configured struts.xml file with required action-mappings for all the required services.
  • Developed JAX-WS web services to provide services to other systems (see the sketch at the end of this section).
  • Developed JAX-WS clients to consume a few of the services provided by other systems.
  • Involved in developing EJB 3.0 stateless session beans in the business tier to expose business services to the service components as well as the web tier.
  • Implemented Hibernate at the DAO layer by configuring the Hibernate configuration file for different databases.
  • Developed business services to utilize Hibernate service classes that connect to the database and perform the required action.
  • Developed JSP pages using struts JSP-tags and in-house tags to meet business requirements.
  • Developed JavaScript validations to validate form fields.
  • Performed unit testing for the developed code using JUnit.

Environment: Core Java, JavaBeans, HTML, CSS2, PL/SQL, MySQL, JavaScript, Flex, AJAX and Windows.
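
Illustrative sketch (not project code): a minimal JAX-WS endpoint of the kind described above. The service name, operation and publish URL are placeholders; a real service would delegate to the business tier.

    import javax.jws.WebMethod;
    import javax.jws.WebService;
    import javax.xml.ws.Endpoint;

    @WebService
    public class OrderStatusService {

        @WebMethod
        public String getOrderStatus(String orderId) {
            // Placeholder logic; a real implementation would call the business/DAO layer.
            return "Order " + orderId + " is IN_PROCESS";
        }

        public static void main(String[] args) {
            // Publishes the SOAP endpoint; the WSDL is served at the URL with ?wsdl appended.
            Endpoint.publish("http://localhost:8080/orders", new OrderStatusService());
        }
    }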

Confidential

Java Developer

Responsibilities:

  • Developed analysis level documentation such as Use Case, Business Domain Model, Activity & Sequence and Class Diagrams.
  • Developed and deployed UI layer logics of sites using JSP.
  • Used Struts (MVC) for the implementation of the business model logic.
  • Worked with Struts MVC objects like ActionServlet, controllers, validators, web application context, handler mappings and message resource bundles, and used JNDI look-ups for J2EE components.
  • Involved in writing CSS styles to improve the look and feel of the UI.
  • Provided technical and functional support to testing teams.
  • Interacted with the client to understand the project and finalize its scope.
  • Estimated, designed and developed various modules.
  • Developed dynamic JSP pages with Struts.
  • Used built-in/custom Interceptors and Validators of Struts.
  • Developed the XML data object to generate the PDF documents and other reports.
  • Used Hibernate, DAOs and JDBC for data retrieval and modifications in the database (see the sketch at the end of this section).
  • Web service messaging and interaction were done using SOAP.
  • Involved in Unit Testing, User Acceptance Testing and Bug Fixing.
  • Implemented mid-tier business services to integrate UI requests to DAO layer commands.

Environment: J2EE, JDBC, Java, Servlets, JSP, Struts, Hibernate, Web services, SOAP, WSDL, Design Patterns, MVC, HTML, JavaScript, WebLogic, XML, Junit, Oracle 10g, My Eclipse
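
Illustrative sketch (not project code): a minimal Hibernate DAO of the kind referenced above. It assumes a hibernate.cfg.xml on the classpath and a mapped Customer entity; the class, table and property names are hypothetical.

    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.hibernate.Transaction;
    import org.hibernate.cfg.Configuration;

    public class CustomerDao {
        // Assumes hibernate.cfg.xml with the Customer mapping is on the classpath.
        private static final SessionFactory FACTORY =
                new Configuration().configure().buildSessionFactory();

        public void save(Customer customer) {
            Session session = FACTORY.openSession();
            Transaction tx = session.beginTransaction();
            try {
                session.save(customer);          // insert the entity
                tx.commit();
            } catch (RuntimeException e) {
                tx.rollback();                   // undo the modification on failure
                throw e;
            } finally {
                session.close();
            }
        }

        public Customer findById(Long id) {
            Session session = FACTORY.openSession();
            try {
                return (Customer) session.get(Customer.class, id);   // read by primary key
            } finally {
                session.close();
            }
        }
    }

    // Hypothetical mapped entity (mapping assumed in Customer.hbm.xml or via annotations).
    class Customer {
        private Long id;
        private String name;
        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }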
