Sr. Big Data Developer Resume
Merrimack, NH
SUMMARY
- Over 10 years of experience as a Sr. Big Data Developer with skills in analysis, design, development, testing, and deployment of various software applications.
- Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2, which provide fast and efficient processing for Teradata big data analytics.
- Experience collecting log and JSON data into HDFS using Flume and processing the data with Hive/Pig.
- Extensive experience developing Spark Streaming jobs using RDDs (Resilient Distributed Datasets) and Spark SQL as required.
- Experience developing Java MapReduce jobs for data cleaning and data manipulation as required by the business.
- Strong knowledge of Hadoop ecosystem components including HDFS, Hive, Oozie, HBase, Pig, Sqoop, Zookeeper, etc.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Good knowledge of using Hibernate for mapping Java classes to database tables and using Hibernate Query Language (HQL).
- Proficient in designing and implementing data structures and using common business intelligence tools for data analysis.
- Extensive experience writing Storm topologies that accept events from Kafka producers and emit them into Cassandra.
- Excellent working knowledge of data modeling tools such as Erwin, PowerDesigner, and ER/Studio.
- Proficient working experience with big data tools such as Hadoop, Azure Data Lake, and AWS Redshift.
- Strong experience in Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export.
- Excellent technical and analytical skills with a clear understanding of design goals for OLTP development and dimensional modeling for OLAP.
- Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
- Hands-on experience configuring and working with Flume to load data from multiple sources directly into HDFS.
- Good experience developing MapReduce jobs in Java/J2EE for data cleansing, transformation, pre-processing, and analysis.
- Excellent experience in installing and running various Oozie workflows and automating parallel job executions.
- Experience with Spark, Spark SQL, Spark Streaming, Spark GraphX, and Spark MLlib.
- Extensive development experience in IDEs such as Eclipse, NetBeans, and IntelliJ.
- Extensive experience with advanced J2EE frameworks such as Spring, Struts, JSF, and Hibernate.
- Expertise in JavaScript, JavaScript MVC patterns, Object Oriented JavaScript Design Patterns and AJAX calls.
- Installation, configuration, and administration experience on big data platforms such as Cloudera (Cloudera Manager) and MapR (MCS).
- Strong knowledge of NoSQL column-oriented databases such as HBase and their integration with Hadoop clusters.
- Good knowledge of coding in SQL, SQL*Plus, T-SQL, PL/SQL, and stored procedures/functions.
- Experience with all stages of the SDLC and the Agile development model, from requirement gathering through deployment and production support.
- Experience understanding existing systems and providing maintenance and production support on technologies such as Java, J2EE, and various databases (Oracle, SQL Server).
- Experience working with Hortonworks and Cloudera environments.
- Good knowledge of implementing various data processing techniques using Apache HBase to handle and format data as required.
- Experience developing custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HQL (HiveQL).
- Experience with the Oozie scheduler in setting up workflow jobs composed of MapReduce and Pig jobs.
- Knowledge of the architecture and functionality of NoSQL databases such as HBase, Cassandra, and MongoDB.
- Good understanding of querying datasets, filtering, aggregating, joining disparate datasets, and producing ranked or sorted results using Spark RDDs, Spark DataFrames, Spark SQL, Hive, and Impala (see the sketch after this list).
- Good at writing custom RDDs in Scala and implementing design patterns to improve performance.
- Experience analyzing large volumes of data using Hive Query Language and assisting with performance tuning.
- Experience with middleware architectures built on Sun Java technologies such as J2EE, JSP, and Servlets.
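Illustrative sketch of the Spark DataFrame filtering, joining, aggregation, and ranking work described above; all database, table, and column names are hypothetical and not tied to a specific project:

```scala
// Minimal sketch: join two datasets, aggregate, and rank with Spark SQL.
// Names such as sales_db.orders and /data/warehouse/customers are hypothetical.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window

object RankedSpendReport {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RankedSpendReport")
      .enableHiveSupport()
      .getOrCreate()

    // Read two disparate datasets: one from Hive, one from Parquet on HDFS.
    val orders    = spark.table("sales_db.orders")                   // order_id, customer_id, amount, region
    val customers = spark.read.parquet("/data/warehouse/customers")  // customer_id, name, region

    // Filter, join, and aggregate.
    val spendByCustomer = orders
      .filter(col("amount") > 0)
      .join(customers, Seq("customer_id"))
      .groupBy("customer_id", "name", "region")
      .agg(sum("amount").as("total_spend"))

    // Rank customers within each region by total spend and keep the top 10.
    val ranked = spendByCustomer
      .withColumn("rank", dense_rank().over(Window.partitionBy("region").orderBy(col("total_spend").desc)))
      .filter(col("rank") <= 10)

    ranked.write.mode("overwrite").saveAsTable("sales_db.top_customers_by_region")
    spark.stop()
  }
}
```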
TECHNICAL SKILLS
Big Data Ecosystem: MapReduce, HDFS, Hive 2.3, HBase 1.2, Pig, Sqoop, Flume 1.8, HDP, Oozie, Zookeeper, Spark, Kafka, Storm, Hue
Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks
Cloud Services: Amazon AWS, EC2, Redshift, MS Azure
Relational Databases: Oracle 12c, MySQL, MS SQL Server 2016
NoSQL Databases: HBase, Cassandra, and MongoDB
Version Control: GIT, GitLab, SVN
Java/J2EE Technologies: Servlets, JSP, JDBC, JSTL, EJB, JAXB, JAXP, JMS, JAX-RPC, JAX-WS
Programming Languages: Java, Scala, Python, SQL, PL/SQL, HiveQL, UNIX Shell Scripting.
Software Development & Testing Life cycle: UML, Design Patterns (Core Java and J2EE), Software Development Lifecycle (SDLC), Waterfall Model and Agile, STLC
Web Technologies: JavaScript, CSS, HTML and JSP.
Operating Systems: Windows, UNIX/Linux and Mac OS.
Build Management Tools: Maven, Ant.
IDE & Command line tools: Eclipse, IntelliJ, Toad and NetBeans.
PROFESSIONAL EXPERIENCE
Confidential, Merrimack, NH
Sr. Big Data Developer
Responsibilities:
- As a Sr. Big Data Developer, worked on Hadoop ecosystem components including Hive, MongoDB, Zookeeper, and Spark Streaming with the MapR distribution.
- Developed Big Data solutions focused on pattern matching and predictive modeling.
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning.
- Implemented Security in Web Applications using Azure and deployed Web Applications to Azure.
- Involved in writing Spark applications using Scala to perform various data cleansing, validation, transformation and summarization activities according to the requirement.
- Loaded data into Spark RDDs and performed in-memory computation to generate output as per the requirements.
- Used Kibana, an open-source browser-based analytics and search dashboard for Elasticsearch.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Used the Java Persistence API (JPA) framework for object-relational mapping based on POJO classes.
- Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating Hive with existing applications.
- Designed and developed a flattened view (merged and flattened dataset) de-normalizing several datasets in Hive/HDFS, consisting of key attributes consumed by the business and other downstream systems.
- Worked on NoSQL support for enterprise production, loading data into HBase using Impala and Sqoop.
- Created Hive tables and loaded and analyzed data using Hive queries.
- Wrote Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Developed Hive queries to process the data and generate the data cubes for visualizing.
- Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Used Hadoop YARN to perform analytics on data in Hive.
- Developed and maintained batch data flows using HiveQL and Unix scripting.
- Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data to HDFS using Scala (see the sketch after this list).
- Developed SQL scripts using Spark for handling different data sets and verifying the performance over MapReduce jobs.
- Responsible for fetching real time data using Kafka and processing using Spark and Scala.
- Worked on Kafka to import real time weblogs and ingested the data to Spark Streaming.
- Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations.
- Maintained Hadoop, Hadoop ecosystems, and database with updates/upgrades, performance tuning and monitoring.
- Developed customized Hive UDFs and UDAFs in Java, established JDBC connectivity with Hive, and developed and executed Pig scripts and Pig UDFs.
- Used Oozie and Zookeeper operational services for coordinating cluster and scheduling workflows.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala and Python.
- Developed and executed data pipeline testing processes and validated business rules and policies.
- Built code for real time data ingestion using Java, MapR-Streams (Kafka) and STORM.
- Extensively used jQuery to provide a dynamic user interface and for client-side validations.
- Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it.
- Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and Spark.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Specified the cluster size, resource pool allocation, and Hadoop distribution by writing specification files in JSON format.
- Involved in running Hadoop jobs for processing millions of records of text data.
- Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries.
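A minimal sketch of the Kafka direct-stream ingestion into HDFS with Spark Streaming and Scala described above; broker addresses, the topic name, consumer group, and HDFS path are hypothetical:

```scala
// Minimal sketch: Kafka direct stream into Spark Streaming, persisted to HDFS.
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object WeblogIngest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WeblogIngest")
    val ssc  = new StreamingContext(conf, Seconds(30))   // 30-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092,broker2:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "weblog-ingest",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("weblogs"), kafkaParams))

    // Keep only the message payload and append each batch to HDFS as text files.
    stream.map(record => record.value)
      .foreachRDD { (rdd, time) =>
        if (!rdd.isEmpty())
          rdd.saveAsTextFile(s"hdfs:///data/raw/weblogs/batch_${time.milliseconds}")
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```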
Environment: Agile, Hadoop 3.0, MS Azure, MapReduce, Java, MongoDB 4.0.2, HBase 1.2, JSON, Scala 2.12, Oozie 4.3, Zookeeper 3.4, J2EE, Python 3.7, JQuery, NoSQL, MVC, Struts 2.5.17, Hive 2.3
Confidential - Mt Laurel, NJ
Sr. Big Data Engineer
Responsibilities:
- As a Big Data Engineer, developed Big Data analytic solutions on a Hadoop-based platform and engaged clients in technical discussions.
- Responsible for building scalable distributed data solutions using Big Data technologies like Apache Hadoop, Shell Scripting, Hive.
- Worked with Microsoft Azure cloud services, Storage Accounts, Azure data storage, and Azure Data Factory.
- Used Agile methodology for data warehouse development, using Kanbanize.
- Exported event weblogs to HDFS by creating an HDFS sink which directly deposits the weblogs in HDFS.
- Worked on writing Hadoop jobs for analyzing data using Hive and Pig, accessing text format files, sequence files, and Parquet files.
- Integrated Oozie with Pig, Hive, and Sqoop and developed Oozie workflow for scheduling and orchestrating the ETL process within the Cloudera Hadoop.
- Collaborated with other data modeling team members to ensure design consistency and integrity.
- Worked with Sqoop to import and export data between databases such as MySQL and Oracle and HDFS/Hive.
- Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
- Worked on a POC to perform sentiment analysis of Twitter data using Spark Streaming.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Developed complete end-to-end big data processing in the Hadoop ecosystem.
- Developed customized classes for serialization and deserialization in Hadoop.
- Worked closely with SSIS and SSRS developers to explain complex data transformation logic.
- Worked closely with business analyst for requirement gathering and translating into technical documentation.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Created and worked Sqoop jobs with incremental load to populate Hive External tables.
- Worked in MongoDB and UNIX environments on non-SQL data clean-up and grouping, and created analysis reports.
- Continuously tuned Hive UDFs for faster queries by employing partitioning and bucketing.
- Created external tables pointing to HBase to access table with huge number of columns.
- Extensively used Pig for data cleansing using Pig scripts and Embedded Pig scripts.
- Worked on Cassandra for retrieving data from Cassandra clusters to run queries.
- Extensively used Erwin for developing data model using star schema methodologies.
- Maintained MySQL database creation, set up users, and maintained backups of cluster metadata databases.
- Worked on scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data into HDFS in CSV format (see the sketch after this list).
- Developed, planned, and migrated servers, relational databases (SQL), and websites to Microsoft Azure.
- Used Spark Streaming to receive real-time data from Kafka and store the streamed data to HDFS using Scala and NoSQL databases such as HBase and Cassandra.
- Documented the requirements including the available code which should be implemented using Spark, Hive, HDFS, HBase and Elastic Search.
- Generated multiple ad-hoc Python tools and scripts to facilitate map generation and data manipulation.
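A minimal sketch of the Sqoop-to-Spark flow described above: reading Sqoop-landed CSV files from HDFS, applying basic cleansing, and persisting a partitioned Hive table. Paths, database, and column names are hypothetical:

```scala
// Minimal sketch: process Sqoop-landed CSV files from HDFS with Spark
// and persist the result as a partitioned Hive table.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SqoopedOrdersLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SqoopedOrdersLoad")
      .enableHiveSupport()
      .getOrCreate()

    // CSV files written to HDFS by a Sqoop import job (hypothetical layout).
    val raw = spark.read
      .option("header", "false")
      .option("inferSchema", "true")
      .csv("hdfs:///landing/sqoop/orders")
      .toDF("order_id", "customer_id", "order_ts", "amount")

    // Basic cleansing plus a partition column derived from the timestamp.
    val cleansed = raw
      .filter(col("order_id").isNotNull && col("amount") >= 0)
      .withColumn("order_date", to_date(col("order_ts")))

    cleansed.write
      .mode("append")
      .partitionBy("order_date")
      .format("parquet")
      .saveAsTable("warehouse_db.orders")

    spark.stop()
  }
}
```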
Environment: Big Data, Hadoop 3.0, Agile, Hive 2.3, HDFS, Oracle 12c, HBase 1.2, Flume 1.8, Pig 0.17, Oozie 4.3, SSIS, SSRS, SQL, PL/SQL, Cassandra 3.11, MongoDB, ETL, Sqoop
Confidential, Hillsboro, OR
Sr. Hadoop Developer
Responsibilities:
- Extensively worked on Hadoop ecosystem components including Hive and Spark Streaming with the MapR distribution.
- Implemented J2EE Design Patterns like DAO, Singleton, and Factory.
- Managed connectivity using JDBC for querying/inserting & data management including triggers and stored procedures.
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
- Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating Hive with existing applications.
- Worked on NoSQL support for enterprise production, loading data into HBase using Impala and Sqoop.
- Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
- Handled importing of data from various data sources, performed transformations using Hive, PIG, and loaded data into HDFS.
- Moved data using Sqoop between HDFS and relational database systems, and handled maintenance and troubleshooting.
- Developed the Java/J2EE based multi-threaded application built on top of the Struts framework.
- Used the Spring MVC framework to enable interactions with the JSP/view layer and implemented different design patterns with J2EE and XML technologies.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
- Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
- Implemented the J2EE design patterns Data Access Object (DAO), Session Façade and Business Delegate.
- Developed NiFi flows dealing with various data formats such as XML, JSON, and Avro.
- Implemented MapReduce jobs in HIVE by querying the available data.
- Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Used Cloudera Manager for installation and management of Hadoop Cluster.
- Collaborated with business users/product owners/developers to contribute to the analysis of functional requirements.
- Implemented the application using MVC architecture, integrating the Hibernate and Spring frameworks.
- Utilized various JavaScript and jQuery libraries, Bootstrap, and Ajax for form validation and other interactive features.
- Involved in converting HiveQL queries into Spark transformations using Spark RDDs and Scala programming (see the sketch after this list).
- Integrated Kafka with Spark Streaming for high throughput and reliability.
- Worked on tuning Hive and Pig to improve performance and solved performance issues in both scripts.
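A minimal sketch of converting a HiveQL aggregation into Spark transformations with Scala, as referenced above; the claims table, its columns, and the output path are hypothetical:

```scala
// Minimal sketch: re-expressing a HiveQL aggregation as Spark RDD transformations.
//
// HiveQL being replaced (roughly):
//   SELECT member_id, SUM(claim_amount)
//   FROM claims WHERE status = 'APPROVED'
//   GROUP BY member_id;
import org.apache.spark.sql.SparkSession

object ClaimsAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ClaimsAggregation")
      .enableHiveSupport()
      .getOrCreate()

    // Read the Hive table, then filter / map / reduceByKey on its RDD.
    val claims = spark.table("claims_db.claims")   // member_id, claim_amount, status

    val totalsRdd = claims.rdd
      .filter(row => row.getAs[String]("status") == "APPROVED")
      .map(row => (row.getAs[String]("member_id"), row.getAs[Double]("claim_amount")))
      .reduceByKey(_ + _)

    totalsRdd.saveAsTextFile("hdfs:///output/claims/approved_totals")
    spark.stop()
  }
}
```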
Environment: Hadoop 3.0, Hive 2.1, J2EE, JDBC, Pig 0.16, HBase 1.1, Sqoop, NoSQL, Impala, Java, Spring, MVC, XML, Spark 1.9, PL/SQL, HDFS, JSON, Hibernate, Bootstrap, JQuery, JavaScript, Ajax
Confidential - Hartford, CT
Data Analyst/Data Engineer
Responsibilities:
- Worked as a Sr. Data Analyst/Data Engineer to review business requirements and compose source-to-target data mapping documents.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Designed HBase schemas based on the requirements and handled HBase data migration and validation.
- Involved in migration of data from existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for processing data.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Connected to Amazon Redshift through Tableau to extract live data for real time analysis.
- Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Imported data from different sources like HDFS/HBase into Spark RDDs and developed a data pipeline using Kafka and Storm to store data into HDFS.
- Installed and configured Apache Hadoop across multiple nodes on AWS EC2.
- Worked on moving all log files generated from various sources to HDFS for further processing.
- Wrote Hive and Scala scripts to analyze data according to business requirements (see the sketch after this list).
- Generated metadata and created Talend jobs and mappings to load the data warehouse and data lake.
- Translated business requirements into working logical and physical data models for OLTP & OLAP systems.
- Optimized query performance by modifying T-SQL queries, establishing joins, and creating clustered indexes.
- Created HBase tables to store various data formats of data coming from different sources.
- Created the system for a single source of truth on the Hadoop file system (HDFS), while enabling transparent data movement and access at various layers.
- Wrote Hive queries to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
- Developed SAS macros for data cleaning, reporting, and supporting routine processing.
- Embedded SQL queries in Excel and used Excel functions to calculate parameters such as standard deviation.
- Performed data analysis and statistical analysis, and generated reports and listings using SAS/SQL, SAS/ACCESS, SAS/Excel, pivot tables, and graphs.
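A minimal sketch of the kind of Scala script referenced above, structuring raw weblogs from HDFS into a tabular Hive table for downstream querying; the log layout, paths, and table name are hypothetical and the Spark usage is assumed from the Spark RDD work mentioned in this role:

```scala
// Minimal sketch: structure raw weblogs into a queryable Hive table with Scala/Spark.
import org.apache.spark.sql.SparkSession

object WeblogToHive {
  // Very simplified common-log-style line: ip, timestamp, request, status (hypothetical format)
  private val LogLine = """^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}).*$""".r

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WeblogToHive")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Raw log lines previously moved into HDFS.
    val parsed = spark.read.textFile("hdfs:///logs/web/")
      .flatMap { line =>
        line match {
          case LogLine(ip, ts, request, status) => Seq((ip, ts, request, status.toInt))
          case _                                => Seq.empty   // drop malformed lines
        }
      }
      .toDF("ip", "event_time", "request", "status")

    // Persist in tabular form so downstream Hive queries can run against it.
    parsed.write.mode("overwrite").saveAsTable("analytics_db.weblogs_structured")
    spark.stop()
  }
}
```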
Environment: Erwin 9.5, SAS, SQL, HBase, Scala, T-SQL, AWS, Oozie, Hive 1.9, HDFS, PL/SQL, Excel.