Big Data Engineer/Application Architect Resume
New York, NY
PROFESSIONAL SUMMARY:
- Over 9 years of experience as a Big Data developer, designing and developing applications on Big Data and Python open-source technologies.
- Strong development skills in Hadoop, HDFS, MapReduce, Hive, Sqoop and HBase, with a solid understanding of Hadoop internals.
- Hands-on experience with Hadoop/Big Data technologies for storage, querying, processing and analysis of data.
- Leveraged strong skills in developing applications involving Big Data technologies such as Hadoop, MapReduce, YARN, Flume, Hive, Pig, Sqoop, HBase, Cloudera, MapR, Avro, Spark and Scala.
- Extensively worked on major components of the Hadoop ecosystem such as HDFS, HBase, Hive, Sqoop, Pig and MapReduce.
- Developed various scripts and numerous batch jobs to schedule Hadoop programs.
- Experience in analyzing data using HiveQL and custom MapReduce programs in Java.
- Hands-on experience in importing and exporting data between databases such as Oracle and MySQL and HDFS/Hive using Sqoop.
- Good knowledge of NoSQL databases such as MongoDB, Cassandra and HBase.
- Expertise in writing Hadoop Jobs to analyze data using MapReduce, Hive, Pig and Splunk.
- Experience in programming and developing Java modules for an existing Java-based web portal using JSP, Servlets, JavaScript, HTML and Angular with an MVC architecture.
- Good knowledge of Amazon Web Services (AWS) offerings such as EMR and EC2, which provide fast, efficient processing for Big Data analytics.
- Experienced in collecting log and JSON data into HDFS using Flume and processing the data using Hive/Pig.
- Expertise in developing web-based applications using J2EE technologies such as JSP, Servlets and JDBC.
- Experience working with EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service) and setting up EMR (Elastic MapReduce).
- Worked extensively with Core Java, Struts, JSF, Spring, Hibernate, Servlets and JSP, with hands-on experience in PL/SQL, XML and SOAP.
- In-depth understanding of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode and DataNode.
- Extensively worked on Linux-based CentOS with strong hands-on experience in Linux commands.
- Well versed in working with relational database management systems such as Oracle, MS SQL Server and MySQL.
- Hands-on experience with advanced Big Data technologies such as the Spark ecosystem (Spark SQL, MLlib, SparkR and Spark Streaming), Kafka and predictive analytics.
- Knowledge of the Software Development Life Cycle (SDLC) and Agile and Waterfall methodologies.
- Experience working with Eclipse IDE, NetBeans and Rational Application Developer.
- Experience includes Requirements Gathering, Design, Development, Integration, Documentation, Testing and Build.
TECHNICAL SKILLS:
Hadoop/Big Data Technologies: Hadoop 3.0, HDFS, MapReduce, HBase 1.4, Apache Pig, Hive 2.3, Sqoop 1.4, Apache Impala 2.1, Oozie 4.3, Yarn, Apache Flume 1.8, Kafka 1.1, Zookeeper, Databricks
Cloud Platform: Amazon AWS, EC2, S3, MS Azure, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, Azure Data Lake, Data Factory
Hadoop Distributions: Cloudera, Hortonworks, MapR
Programming Languages: Java, Scala, Python 3.6, SQL, PL/SQL, Shell Scripting, Storm 1.0, JSP, Servlets
Frameworks: Spring 5.0.5, Hibernate 5.2, Struts 1.3, JSF, EJB, JMS
Web Technologies: HTML, CSS, JavaScript, jQuery 3.3, Bootstrap 4.1, XML, JSON, AJAX
Databases: Oracle 12c/11g, SQL
Operating Systems: Linux, Unix, Windows 10/8/7
IDE and Tools: Eclipse 4.7, NetBeans 8.2, IntelliJ, Maven, Visual Studio
NoSQL Databases: HBase 1.4, Cassandra 3.11, MongoDB, Accumulo
Web/Application Server: Apache Tomcat 9.0.7, JBoss, WebLogic, WebSphere
SDLC Methodologies: Agile, Waterfall
Version Control: Git, SVN, CVS, CodeCommit
ADDITIONAL SKILLS:
J2EE, MVC, Java, Hibernate, JSON, jQuery, Eclipse, Spring, JavaScript, Hadoop, Hive, MongoDB, Zookeeper, Spark, MapR, Pig, Sqoop, Agile, Azure, Jenkins, HDFS, NoSQL, HBase, Impala, MapReduce, YARN, Oozie, Oracle, PL/SQL, NiFi, XML, MySQL
WORK EXPERIENCE:
Confidential, New York, NY
Big Data Engineer/Application Architect
Responsibilities:
- Working on the Hadoop ecosystem on the AWS cloud, leveraging services such as EMR, EC2, S3, CloudFormation, Lambda, Athena, Glue, DynamoDB and AWS Cost Explorer.
- Responsible for developing and managing analytical/machine learning capabilities on the AWS cloud across Amex.
- Implemented solutions for ingesting data from various sources and processing data at rest using Big Data technologies such as Hadoop, MapReduce, Hive and Spark RDDs.
- Involved in designing Hadoop architecture on AWS leveraging Service Catalog, CloudFormation, AWS EMR, DynamoDB and event processing using Lambda functions.
- Developed data-governance tools using Python and Spark for securely placing enterprise data on AWS S3.
- Designed and configured the Hadoop cluster using AWS EMR based on user behavior.
- Responsible for designing and deploying EDW application solutions, optimizing processes, and defining and implementing best practices.
- Built an end-to-end automated tool that extracts zip files, loads the data into the respective Hive tables in compressed format using shell scripts and PySpark RDDs, and runs QC checks (a minimal sketch follows this role's environment list).
- Responsible for providing support to users across Amex on their data processing and modeling pipelines.
- Provided performance-tuning and query-optimization guidance to users for their Hive and Spark jobs.
- Worked closely with business users to gather requirements and troubleshoot issues with machine learning algorithms.
- Performed data modeling using gradient boosting and tree-based algorithms such as XGBoost, GBDT and CatBoost.
- Worked closely with business vendors to enhance Big Data and machine learning platforms on the AWS cloud per business needs.
- Performed several POCs on newly onboarded AWS and Big Data services to help enhance the platform.
- Managed and led the development effort with the help of a diverse internal and overseas group.
- Developed a UI application using Angular and NVD3 to display network graphs of all the interlinked customers.
- Participated in scrum and retrospective meetings and worked closely with the scrum master to create features and stories in Jira.
- Extensively worked in Excel, generating pivot tables and performing VLOOKUPs to join records from multiple sheets.
Environment: AWS, EMR, EC2, S3, RDS, Glue, Athena, Service Catalog, CloudFormation, Lambda Functions, Hadoop, Spark, Hive, Python, Pandas, XGBoost, TensorFlow, AngularJS, NVD3, Linux, HDFS, Spark Streaming
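Illustrative sketch (referenced above): a minimal PySpark version of the zip-to-Hive loading flow. The directories, table name and QC rule below are hypothetical stand-ins, and the production tool also wrapped these steps in shell scripts.

    # Hypothetical sketch of a zip -> Hive loader; paths, table names and the
    # QC rule are illustrative only.
    import glob
    import zipfile

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("zip-to-hive-loader")
             .enableHiveSupport()
             .getOrCreate())

    LANDING_DIR = "/data/landing"            # where zip archives arrive (assumed)
    STAGING_DIR = "/data/staging/extracted"  # staging area visible to Spark (assumed)

    # 1. Extract every zip archive in the landing directory.
    for archive in glob.glob(f"{LANDING_DIR}/*.zip"):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(STAGING_DIR)

    # 2. Read the extracted delimited files and write them to a Hive table
    #    in a compressed format (Parquet with Snappy here).
    df = (spark.read
          .option("header", "true")
          .csv(f"file://{STAGING_DIR}/*.csv"))

    (df.write
       .mode("append")
       .format("parquet")
       .option("compression", "snappy")
       .saveAsTable("analytics.customer_events"))  # hypothetical target table

    # 3. Simple QC: compare the rows just read against what the table now holds.
    print("QC: source rows =", df.count(),
          "| table rows (total) =", spark.table("analytics.customer_events").count())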
Confidential, San Antonio, TX
Sr. Big Data Developer
Responsibilities:
- As a Sr. Big Data Developer, worked on Hadoop ecosystem components including Hive, HBase, ZooKeeper and Spark Streaming on the CDH distribution.
- Developed Big Data solutions focused on pattern matching and predictive modeling.
- Worked on analyzing the Hadoop cluster and various big data analytic tools including Pig, the HBase database and Sqoop.
- Involved in Agile methodologies, daily scrum meetings, sprint planning.
- Implemented Security in Web Applications using Azure and deployed Web Applications to Azure.
- Involved in writing Spark applications using Scala to perform various data cleansing, validation, transformation and summarization activities according to the requirement.
- Loaded data into Spark RDDs and performed in-memory computation to generate output per the requirements.
- Used Kibana, an open-source browser-based analytics and search dashboard for Elasticsearch.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Used Java Persistence API (JPA) framework for object relational mapping which is based on POJO Classes.
- Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating Hive with existing applications.
- Designed & Developed a Flattened View (Merge and Flattened dataset) de-normalizing several Datasets in Hive/HDFS which consists of key attributes consumed by Business and other down streams.
- Worked on NoSQL support enterprise production and loading data into HBase using Impala and Sqoop.
- Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Built Hadoop solutions for big data problems using MR1 and MR2 on YARN.
- Handled importing data from various sources, performed transformations using Hive and Pig, and loaded the data into HDFS.
- Worked on MongoDB and HBase, databases which differ from classic relational databases.
- Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala programming (a PySpark illustration follows this role's environment list).
- Integrated Kafka with Spark Streaming for high-throughput, reliable processing.
- Worked on Apache Flume for collecting and aggregating large amounts of log data, storing it on HDFS for further analysis.
- Worked on tuning Hive and Pig scripts to improve performance and resolved performance issues in both.
Environment: Hadoop 3.0, Hive 2.3, CDH4, MongoDB, Python, pandas, Zookeeper, Spark, MapR, Pig 0.17, Sqoop, Agile, Azure, Jenkins, HDFS, NoSQL, HBase, Impala, MapReduce, YARN, Oozie, Oracle 12c, PL/SQL, NiFi, XML, JSON, MySQL, Java
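Illustrative sketch (referenced above): the HiveQL-to-Spark conversions on this project were written in Scala; purely for illustration, a minimal PySpark equivalent of one such rewrite is shown below, with hypothetical table and column names.

    # Hypothetical example of rewriting a HiveQL aggregation as Spark DataFrame
    # transformations; table and column names are illustrative only.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("hiveql-to-spark")
             .enableHiveSupport()
             .getOrCreate())

    # Original HiveQL, for reference:
    #   SELECT account_id, SUM(amount) AS total_amount
    #   FROM transactions
    #   WHERE txn_date >= '2018-01-01'
    #   GROUP BY account_id;

    # Equivalent DataFrame pipeline:
    totals = (spark.table("transactions")
              .filter(F.col("txn_date") >= "2018-01-01")
              .groupBy("account_id")
              .agg(F.sum("amount").alias("total_amount")))

    totals.show(10)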
Confidential, Sunnyvale, CA
Big Data/Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Used Spark DataFrame operations to perform required validations on the data and to perform analytics on the Hive data.
- Developed Apache Spark applications for data processing from various streaming sources.
- Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
- Migrated MapReduce jobs to Spark jobs to achieve better performance.
- Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs and processed the data as DataFrames.
- Worked with Kafka and REST APIs to collect and load data onto the Hadoop file system, and used Sqoop to load data from relational databases.
- Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (a sketch follows this role's environment list).
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Involved in transforming data from legacy tables to HDFS and Hive tables using Sqoop.
- Expertise in implementing Spark using Scala and Spark SQL for faster testing and processing of data, and responsible for managing data from different sources.
- Implemented Apache NiFi flow topologies to perform cleansing operations before moving data into HDFS.
- Involved in migrating MapReduce jobs to RDDs (Resilient Distributed Datasets) and creating Spark jobs for better performance.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Developed batch scripts to fetch data from the ECS cloud and perform the required transformations in Scala using the Spark framework.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation and queries, writing data back into the RDBMS through Sqoop.
- Performed transformations such as event joins, bot-traffic filtering and pre-aggregations using Pig.
- Developed Java code that creates mappings in Elasticsearch before data is indexed into it.
- Configured Oozie workflows to run multiple Hive and Pig jobs that run independently based on time and data availability.
- Imported and exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Created and maintained various Shell and Python scripts for automating processes, and optimized MapReduce code and Pig scripts through performance tuning and analysis.
Environment: Hadoop 3.0, Spark, Python, Hive 2.3, Agile, MapReduce, Kafka, HBase, HDFS, Sqoop, Scala, RDBMS, Oozie, Pig 0.17, Cassandra 3.11, NoSQL, Elasticsearch, Java
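Illustrative sketch (referenced above): a minimal PySpark Structured Streaming consumer for a Kafka topic. The broker address, topic, schema and output paths are hypothetical, and the Parquet sink here stands in for the HBase write, which in production went through a connector.

    # Hypothetical sketch: consume a Kafka topic with Spark Structured Streaming,
    # parse the JSON payload and persist the processed stream. Requires the
    # spark-sql-kafka connector package on the classpath.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType, StructField, StructType, TimestampType

    spark = SparkSession.builder.appName("kafka-stream-consumer").getOrCreate()

    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("user_id", StringType()),
        StructField("event_time", TimestampType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")  # assumed broker
           .option("subscribe", "events")                       # assumed topic
           .load())

    # Kafka delivers key/value as bytes; cast the value and parse the JSON payload.
    parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(F.from_json("json", event_schema).alias("e"))
              .select("e.*"))

    # Stand-in sink: append the parsed events to Parquet with checkpointing.
    query = (parsed.writeStream
             .format("parquet")
             .option("path", "/data/streams/events")
             .option("checkpointLocation", "/data/checkpoints/events")
             .outputMode("append")
             .start())

    query.awaitTermination()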
Confidential
Java/J2EE Developer
Responsibilities:
- Developed and utilized J2EE Services and JMS components for messaging communication in WebSphere Application Server.
- Implemented MVC architecture by separating the business logic from the presentation layer.
- Developed code using Java, J2EE and Spring, and used Hibernate as an ORM tool for object-relational mapping.
- Used JNDI to perform lookup services for the various components of the system.
- Created REST web services using Spring Boot to send data in JSON format to different systems.
- Extensively used jQuery to provide a dynamic user interface and for client-side validations.
- Extensively used Eclipse IDE for developing, debugging, integrating and deploying the application.
- Participated in object-oriented design, development and testing of REST APIs using Java.
- Implemented the Dependency Injection (IoC) feature of the Spring framework to inject dependencies into objects.
- Developed the data access layer by integrating Spring and Hibernate.
- Used the Hibernate framework for data persistence and developed Hibernate objects for persisting data to the database.
- Responsible for developing Hibernate configuration and mapping files for the persistence layer (object-relational mapping).
- Developed object-oriented JavaScript code and was responsible for client-side validations using jQuery.
- Extensively used Spring IoC features for bean injection and transaction management.
- Used the Spring Framework as the middle-tier application framework, with a persistence strategy based on Spring's Hibernate support for database integration.
- Involved in designing the application using the MVC pattern.
- Created a JDBC data source and connection pooling for the application, and Hibernate mapping files when needed.
- Consumed RESTful web services to establish communication between different applications.
- Implemented business services using Core Java and Spring.
- Wrote object-oriented JavaScript for transparent presentation of both client- and server-side validation.
Environment: J2EE, MVC, Java, Hibernate, JSON, jQuery, Eclipse, Spring, JavaScript