
Sr. Big Data Developer Resume


Merrimack, NH

SUMMARY

  • Over 10 years of experience as a Sr. Big Data Developer with skills in analysis, design, development, testing, and deployment of various software applications.
  • Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2, which provide fast, efficient processing for Teradata big data analytics.
  • Experience collecting log and JSON data into HDFS using Flume and processing it with Hive/Pig.
  • Extensive experience developing Spark Streaming jobs using RDDs (Resilient Distributed Datasets) and Spark SQL as required.
  • Experience developing Java MapReduce jobs for data cleaning and data manipulation as required by the business.
  • Strong knowledge of the Hadoop ecosystem, including HDFS, Hive, Oozie, HBase, Pig, Sqoop, ZooKeeper, etc.
  • Experience importing and exporting data between HDFS and relational database systems using Sqoop.
  • Good knowledge of Hibernate for mapping Java classes to database tables and of Hibernate Query Language (HQL).
  • Proficient in designing and implementing data structures and in using common business intelligence tools for data analysis.
  • Extensive experience writing Storm topologies that accept events from Kafka producers and emit them into Cassandra.
  • Excellent working knowledge of data modeling tools such as Erwin, PowerDesigner, and ER/Studio.
  • Proficient working experience with big data tools such as Hadoop, Azure Data Lake, and AWS Redshift.
  • Strong experience in Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export.
  • Excellent technical and analytical skills, with a clear understanding of design goals for OLTP development and dimensional modeling for OLAP.
  • Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
  • Hands-on experience configuring and working with Flume to load data from multiple sources directly into HDFS.
  • Good experience developing MapReduce jobs in J2EE/Java for data cleansing, transformations, pre-processing, and analysis.
  • Excellent experience installing and running various Oozie workflows and automating parallel job executions.
  • Experience with Spark, Spark SQL, Spark Streaming, Spark GraphX, and Spark MLlib.
  • Extensive development experience in IDEs such as Eclipse, NetBeans, and IntelliJ.
  • Extensive experience with advanced J2EE frameworks such as Spring, Struts, JSF, and Hibernate.
  • Expertise in JavaScript, JavaScript MVC patterns, Object Oriented JavaScript Design Patterns and AJAX calls.
  • Installation, configuration, and administration experience on big data platforms with Cloudera Manager (Cloudera) and MCS (MapR).
  • Strong knowledge of NoSQL column-oriented databases such as HBase and their integration with Hadoop clusters.
  • Good knowledge of coding using SQL, SQL*Plus, T-SQL, PL/SQL, and stored procedures/functions.
  • Experience with all stages of the SDLC and the Agile development model, from requirements gathering through deployment and production support.
  • Experience understanding, maintaining, and supporting existing systems built on technologies such as Java, J2EE, and various databases (Oracle, SQL Server).
  • Experience working with Hortonworks and Cloudera environments.
  • Good knowledge of implementing various data processing techniques using Apache HBase to handle and format data as required.
  • Experience in developing custom UDFs for Pig and Hive to incorporate methods and functionality of Python/Java into Pig Latin and HQL (HiveQL).
  • Experience with the Oozie scheduler, setting up workflows composed of MapReduce and Pig jobs.
  • Knowledge of the architecture and functionality of NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Good understanding of querying datasets, filtering data, performing aggregations, joining disparate datasets, and producing ranked or sorted output using Spark RDDs, Spark DataFrames, Spark SQL, Hive, and Impala (see the sketch after this list).
  • Good at writing custom RDDs in Scala and applying design patterns to improve performance.
  • Experience analyzing large volumes of data using Hive Query Language and assisting with performance tuning.
  • Experience with middleware architectures built on Sun Java technologies such as J2EE, JSP, and Servlets.
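A minimal Scala/Spark sketch of the query, filter, join, aggregate, and rank pattern described above; the dataset paths, column names, and output table are hypothetical illustrations rather than project-specific details.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object RankedSpendReport {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RankedSpendReport")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical input locations; real paths and schemas would differ.
    val orders    = spark.read.parquet("/data/raw/orders")
    val customers = spark.read.parquet("/data/raw/customers")

    // Join the disparate datasets, filter, aggregate, then rank customers by spend within each region.
    val spendByCustomer = orders
      .filter(col("status") === "COMPLETED")
      .join(customers, Seq("customer_id"))
      .groupBy("region", "customer_id")
      .agg(sum("amount").as("total_spend"))

    val ranked = spendByCustomer
      .withColumn("rank", dense_rank().over(
        Window.partitionBy("region").orderBy(col("total_spend").desc)))
      .filter(col("rank") <= 10)

    ranked.write.mode("overwrite").saveAsTable("analytics.top_customers_by_region")
    spark.stop()
  }
}
```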

TECHNICAL SKILLS

Big Data Ecosystem: MapReduce, HDFS, Hive 2.3, HBase 1.2, Pig, Sqoop, Flume 1.8, HDP, Oozie, ZooKeeper, Spark, Kafka, Storm, Hue

Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks

Cloud Services: Amazon AWS, EC2, Redshift, MS Azure

Relational Databases: Oracle 12c, MySQL, MS SQL Server 2016

NoSQL Databases: HBase, Cassandra, and MongoDB

Version Control: GIT, GitLab, SVN

Java/J2EE Technologies: Servlets, JSP, JDBC, JSTL, EJB, JAXB, JAXP, JMS, JAX-RPC, JAX-WS

Programming Languages: Java, Python, SQL, PL/SQL, HiveQL, UNIX Shell Scripting, Scala.

Software Development & Testing Life cycle: UML, Design Patterns (Core Java and J2EE), Software Development Lifecycle (SDLC), Waterfall Model and Agile, STLC

Web Technologies: JavaScript, CSS, HTML and JSP.

Operating Systems: Windows, UNIX/Linux and Mac OS.

Build Management Tools: Maven, Ant.

IDE & Command line tools: Eclipse, IntelliJ, Toad and NetBeans.

PROFESSIONAL EXPERIENCE

Confidential, Merrimack, NH

Sr. Big Data Developer

Responsibilities:

  • As a Sr. Big Data Developer, worked on the Hadoop ecosystem, including Hive, MongoDB, ZooKeeper, and Spark Streaming, with the MapR distribution.
  • Developed Big Data solutions focused on pattern matching and predictive modeling.
  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, the HBase database, and Sqoop.
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning.
  • Implemented Security in Web Applications using Azure and deployed Web Applications to Azure.
  • Involved in writing Spark applications using Scala to perform various data cleansing, validation, transformation and summarization activities according to the requirement.
  • Loaded data into Spark RDDs and performed in-memory computation to generate output as per the requirements.
  • Used Kibana, an open-source browser-based analytics and search dashboard for Elasticsearch.
  • Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
  • Used the Java Persistence API (JPA) framework for object-relational mapping based on POJO classes.
  • Upgraded the Hadoop cluster from CDH3 to CDH4, setting up a High Availability cluster and integrating Hive with existing applications.
  • Designed and developed a flattened view (merged and flattened dataset) by de-normalizing several datasets in Hive/HDFS, consisting of key attributes consumed by the business and other downstream consumers.
  • Supported NoSQL in enterprise production and loaded data into HBase using Impala and Sqoop.
  • Created Hive tables and loaded and analyzed data using Hive queries.
  • Wrote Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Used Hadoop YARN to perform analytics on data in Hive.
  • Developed and maintained batch data flows using HiveQL and Unix scripting.
  • Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala.
  • Developed SQL scripts using Spark to handle different data sets and verified performance against MapReduce jobs.
  • Responsible for fetching real-time data using Kafka and processing it using Spark and Scala.
  • Worked on Kafka to import real-time weblogs and ingested the data into Spark Streaming.
  • Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations (see the sketch after this list).
  • Maintained Hadoop, Hadoop ecosystems, and database with updates/upgrades, performance tuning and monitoring.
  • Developed customized Hive UDFs and UDAFs in Java, set up JDBC connectivity with Hive, and developed and executed Pig scripts and Pig UDFs.
  • Used Oozie and Zookeeper operational services for coordinating cluster and scheduling workflows.
  • Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala and Python.
  • Developed and executed data pipeline testing processes and validated business rules and policies.
  • Built code for real-time data ingestion using Java, MapR Streams (Kafka), and Storm.
  • Extensively used jQuery to provide a dynamic user interface and client-side validations.
  • Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it.
  • Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and Spark.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Specified the cluster size, resource pool allocation, and Hadoop distribution by writing specifications in JSON format.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries.
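A minimal sketch, in Scala, of the Kafka Direct Stream to HDFS flow referenced above (Kafka 0.10 direct API with Spark Streaming); the broker list, topic name, consumer group, batch interval, and output path are illustrative assumptions rather than the project's actual configuration.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object WeblogStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WeblogStream")
    val ssc  = new StreamingContext(conf, Seconds(30))

    // Hypothetical brokers, topic, and group; the real cluster settings would differ.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092,broker2:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "weblog-consumers",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("weblogs"), kafkaParams))

    // Keep only non-empty log lines and land each micro-batch in HDFS as text.
    stream.map(_.value)
      .filter(_.nonEmpty)
      .foreachRDD { (rdd, time) =>
        if (!rdd.isEmpty()) rdd.saveAsTextFile(s"/data/weblogs/ingest/${time.milliseconds}")
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```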

Environment: Agile, Hadoop 3.0, MS Azure, MapReduce, Java, MongoDB 4.0.2, HBase 1.2, JSON, Scala 2.12, Oozie 4.3, Zookeeper 3.4, J2EE, Python 3.7, JQuery, NoSQL, MVC, Struts 2.5.17, Hive 2.3

Confidential - Mt Laurel, NJ

Sr. Big Data Engineer

Responsibilities:

  • As a Big Data Engineer, developed big data analytic solutions on a Hadoop-based platform and engaged clients in technical discussions.
  • Responsible for building scalable distributed data solutions using Big Data technologies like Apache Hadoop, Shell Scripting, Hive.
  • Worked with Microsoft Azure cloud services, Storage Accounts, Azure data storage, and Azure Data Factory.
  • Used Agile methodology for data warehouse development with Kanbanize.
  • Exported event weblogs to HDFS by creating an HDFS sink that deposits the weblogs directly into HDFS.
  • Wrote Hadoop jobs to analyze data using Hive and Pig, accessing text files, sequence files, and Parquet files.
  • Integrated Oozie with Pig, Hive, and Sqoop and developed Oozie workflow for scheduling and orchestrating the ETL process within the Cloudera Hadoop.
  • Collaborated with other data modeling team members to ensure design consistency and integrity.
  • Worked with Sqoop to import and export data between databases such as MySQL and Oracle and HDFS/Hive.
  • Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
  • Worked on a POC to perform sentiment analysis of Twitter data using Spark Streaming.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Developed complete end-to-end big data processing in the Hadoop ecosystem.
  • Developed customized classes for serialization and deserialization in Hadoop.
  • Worked closely with SSIS and SSRS developers to explain complex data transformation logic.
  • Worked closely with business analysts for requirements gathering and translation into technical documentation.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Created and worked Sqoop jobs with incremental load to populate Hive External tables.
  • Worked in MongoDB and UNIX environments to clean up and group non-SQL data and create analysis reports.
  • Continuously tuned Hive queries and UDFs for faster execution by employing partitioning and bucketing.
  • Created external tables pointing to HBase to access table with huge number of columns.
  • Extensively used Pig for data cleansing using Pig scripts and Embedded Pig scripts.
  • Worked with Cassandra, retrieving data from Cassandra clusters to run queries.
  • Extensively used Erwin for developing data models using star schema methodologies.
  • Maintained MySQL database creation, set up users, and maintained backups of cluster metadata databases.
  • Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
  • Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in CSV format (see the sketch after this list).
  • Developed, planned, and migrated servers, relational databases (SQL), and websites to Microsoft Azure.
  • Used Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala and NoSQL databases such as HBase and Cassandra.
  • Documented the requirements including the available code which should be implemented using Spark, Hive, HDFS, HBase and Elastic Search.
  • Generated multiple ad-hoc Python tools and scripts to facilitate map generation and data manipulation.
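A minimal Scala sketch of the Sqoop-landed CSV to Hive flow mentioned above; the landing path, column names, and target table are hypothetical, and the actual schema would come from the source relational tables.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object LandingCsvToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LandingCsvToHive")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical landing directory written by a prior Sqoop import in CSV format.
    val records = spark.read
      .option("header", "false")
      .option("inferSchema", "true")
      .csv("/data/landing/customers")
      .toDF("customer_id", "name", "signup_date", "region") // illustrative column names

    // Light cleansing before persisting to a partitioned Hive table.
    val cleansed = records
      .filter(col("customer_id").isNotNull)
      .withColumn("signup_year", year(to_date(col("signup_date"), "yyyy-MM-dd")))

    cleansed.write
      .mode("append")
      .partitionBy("signup_year")
      .saveAsTable("warehouse.customers")

    spark.stop()
  }
}
```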

Environment: Big Data, Hadoop 3.0, Agile, Hive 2.3, HDFS, Oracle 12c, HBase 1.2, Flume 1.8, Pig 0.17, Oozie 4.3, SSIS, SSRS, SQL, PL/SQL, Cassandra 3.11, MongoDB, ETL, Sqoop

Confidential, Hillsboro, OR

Sr. Hadoop Developer

Responsibilities:

  • Extensively worked on the Hadoop ecosystem, including Hive and Spark Streaming, with the MapR distribution.
  • Implemented J2EE Design Patterns like DAO, Singleton, and Factory.
  • Managed connectivity using JDBC for querying/inserting & data management including triggers and stored procedures.
  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, the HBase database, and Sqoop.
  • Upgraded the Hadoop cluster from CDH3 to CDH4, setting up a High Availability cluster and integrating Hive with existing applications.
  • Supported NoSQL in enterprise production and loaded data into HBase using Impala and Sqoop.
  • Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
  • Handled importing of data from various data sources, performed transformations using Hive and Pig, and loaded the data into HDFS.
  • Moved data using Sqoop between HDFS and relational database systems, and handled maintenance and troubleshooting.
  • Developed a Java/J2EE-based multi-threaded application built on top of the Struts framework.
  • Used the Spring MVC framework to enable interactions with the JSP/view layer and implemented different design patterns with J2EE and XML technology.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
  • Implemented the J2EE design patterns Data Access Object (DAO), Session Façade and Business Delegate.
  • Developed NiFi flows dealing with various data formats such as XML, JSON, and Avro.
  • Implemented MapReduce jobs in Hive by querying the available data.
  • Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Collaborated with business users/product owners/developers to contribute to the analysis of functional requirements.
  • Implemented the application using MVC architecture, integrating the Hibernate and Spring frameworks.
  • Utilized various JavaScript and jQuery libraries, Bootstrap, and Ajax for form validation and other interactive features.
  • Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala programming (see the sketch after this list).
  • Integrated Kafka with Spark Streaming for high-efficiency throughput and reliability.
  • Worked in tuning Hive & Pig to improve performance and solved performance issues in both scripts.
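A minimal Scala sketch of converting a HiveQL aggregation into Spark DataFrame transformations, as described above; the claims table, column names, and summary table are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveQLToSpark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveQLToSpark")
      .enableHiveSupport()
      .getOrCreate()

    // Original HiveQL (illustrative):
    //   SELECT provider_id, COUNT(*) AS claim_count, SUM(amount) AS total_amount
    //   FROM claims WHERE status = 'APPROVED' GROUP BY provider_id;
    // The same logic expressed as Spark DataFrame transformations:
    val claims = spark.table("claims") // assumes the Hive table is already defined

    val summary = claims
      .filter(col("status") === "APPROVED")
      .groupBy("provider_id")
      .agg(count(lit(1)).as("claim_count"), sum("amount").as("total_amount"))

    summary.write.mode("overwrite").saveAsTable("claims_provider_summary")
    spark.stop()
  }
}
```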

Environment: Hadoop 3.0, Hive 2.1, J2EE, JDBC, Pig 0.16, HBase 1.1, Sqoop, NoSQL, Impala, Java, Spring, MVC, XML, Spark 1.9, PL/SQL, HDFS, JSON, Hibernate, Bootstrap, JQuery, JavaScript, Ajax

Confidential - Hartford, CT

Data Analyst/Data Engineer

Responsibilities:

  • Worked as a Sr. Data Analyst/Data Engineer to review business requirements and compose source-to-target data mapping documents.
  • Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
  • Designed HBase schemas based on the requirements and handled HBase data migration and validation.
  • Involved in migrating data from existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for data processing.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Connected to Amazon Redshift through Tableau to extract live data for real time analysis.
  • Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs and developed a data pipeline using Kafka and Storm to store data in HDFS.
  • Installed and configured Apache Hadoop on multiple nodes in AWS EC2.
  • Worked on moving all log files generated from various sources to HDFS for further processing.
  • Wrote Hive queries and Scala scripts to analyze data according to business requirements (see the sketch after this list).
  • Generated metadata and created Talend jobs and mappings to load the data warehouse and data lake.
  • Translated business requirements into working logical and physical data models for OLTP & OLAP systems.
  • Optimized the performance of queries with modification in T-SQL queries, established joins and created clustered indexes.
  • Created HBase tables to store data in various formats coming from different sources.
  • Created the system serving as a single source of truth on the Hadoop file system (HDFS), while enabling transparent data movement and access at various layers.
  • Written Hive queries to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
  • Developed SAS macros for data cleaning, reporting, and support of routine processing.
  • Embedded SQL queries in Excel and used Excel functions to calculate parameters such as standard deviation.
  • Performed data analysis and statistical analysis, and generated reports and listings using SAS/SQL, SAS/ACCESS, SAS/EXCEL, pivot tables, and graphs.
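A minimal Scala sketch of the kind of log parsing and Hive-backed analysis described above; the raw log table, regular expression, and output table are hypothetical examples.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object WeblogAnalysis {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WeblogAnalysis")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical raw log table with one log line per record in a "line" column.
    val logs = spark.table("raw_weblogs")

    // Parse each line with a regex, then summarize traffic by day and HTTP status code.
    val pattern = """^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4})[^\]]*\] "(\S+) (\S+)[^"]*" (\d{3})"""
    val parsed = logs.select(
      regexp_extract(col("line"), pattern, 1).as("host"),
      to_date(regexp_extract(col("line"), pattern, 2), "dd/MMM/yyyy").as("day"),
      regexp_extract(col("line"), pattern, 5).cast("int").as("status"))

    parsed.groupBy("day", "status")
      .agg(count(lit(1)).as("requests"))
      .write.mode("overwrite").saveAsTable("weblog_daily_status")

    spark.stop()
  }
}
```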

Environment: Erwin 9.5, SAS, SQL, HBase, Scala, T-SQL, AWS, Oozie, Hive 1.9, HDFS, PL/SQL, Excel.
