
Sr. Big Data Architect Resume


Houston, TX

SUMMARY

  • Over 9 years of experience in design, development, integration, and presentation with Java, along with 4 years of Big Data/Hadoop experience with technologies such as Hive, Pig, Flume, Sqoop, ZooKeeper, HBase, Spark, Kafka, Python, and AWS.
  • Experienced in developing web-based GUIs using JavaScript, JSP, HTML, jQuery, XML, and CSS, and with application servers such as JBoss and Apache Tomcat 6.0/7.0/8.0.
  • Architecting and implementing a Portfolio Recommendation Analytics Engine using Hadoop MapReduce, Oozie, Spark SQL, Spark MLlib, and Cassandra.
  • Technologies used extensively during my software development tenure include Struts, Spring, the CXF REST API, web services, SOAP, XML, JMS, JSP, JNDI, Apache, Tomcat, JDBC, and databases such as Oracle and Microsoft SQL Server.
  • Excellent understanding of Hadoop architecture and the underlying framework, including storage management.
  • Expertise in architecting Big Data solutions covering data ingestion and data storage.
  • Experienced with NoSQL databases (HBase, Cassandra, and MongoDB), including database performance tuning and data modeling.
  • Strong experience in front-end technologies such as JSP, HTML5, jQuery, JavaScript, and CSS3.
  • Experienced in using Spark to improve the performance and optimize existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Extensive knowledge in programming with Resilient Distributed Datasets (RDDs).
  • Experienced with Akka for building high-performance, reliable distributed applications in Java and Scala.
  • Knowledge of and experience with job workflow scheduling and coordination tools such as Oozie and ZooKeeper.
  • Good experience in shell programming.
  • Knowledge of configuring and managing Cloudera's Hadoop platform, including CDH3 and CDH4 clusters.
  • Experience using PL/SQL to write stored procedures, functions, and triggers.
  • Excellent technical and analytical skills with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
  • Experienced in developing enterprise applications with J2EE/MVC architecture on application and web servers.
  • Experience developing Big Data projects using open-source tools such as Hadoop, Hive, HDP, Pig, Flume, Storm, and MapReduce.
  • Architecting, solutioning, and modeling DI (Data Integrity) platforms using Sqoop, Flume, Kafka, Spark Streaming, Spark MLlib, and Cassandra.
  • Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
  • Strong expertise in Amazon AWS services such as EC2, DynamoDB, S3, and Kinesis.
  • Expertise in Big Data architectures such as Hadoop distributed systems (Azure, Hortonworks, Cloudera), MongoDB, and other NoSQL stores.
  • Hands-on experience with Hadoop/Big Data technologies for the storage, querying, processing, and analysis of data.
  • Experienced in application development using Java, J2EE, JDBC, Spring, and JUnit.
  • Experienced in using various Hadoop components such as MapReduce, Hive, Sqoop, and Oozie.
  • Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin, and Airflow.
  • Experienced in testing data in HDFS and Hive for each data transaction.
  • Strong experience working with databases such as Oracle 12c/11g/10g/9i, DB2, SQL Server, and MySQL, and proficient in writing complex SQL queries.
  • Experienced in using database tools such as SQL Navigator and TOAD.

TECHNICAL SKILLS

Java/J2EE Technologies: JSP, Servlets, jQuery, JDBC, JavaScript.

Hadoop/Big Data: HDFS, Hive, Pig, HBase, MapReduce, ZooKeeper, Scala, Akka, Kafka, Storm, MongoDB, Sqoop, Oozie, Flume.

Languages: Java, J2EE, HQL, R, Python, XPath, Spark, PL/SQL, Pig Latin.

Operating Systems: Linux, Windows, UNIX, Ubuntu, Centos, Sun Solaris.

NoSQL Databases: MongoDB, DynamoDB, Cassandra.

Web Technologies: HTML, XML, DHTML, XHTML, CSS, XSLT.

Web/Application Servers: Apache Tomcat 6.0/7.0/8.0, JBoss.

Frameworks: MVC, Struts, Spring, Hibernate.

Databases: Microsoft Access, MS SQL, Oracle 12c/11g/10g/9i.

AWS: EC2, S3, SQS.

PROFESSIONAL EXPERIENCE

Confidential, Houston, TX

Sr. Big Data Architect

Responsibilities:

  • Involved in the design and architecture of Big Data solutions using the Hadoop ecosystem.
  • Unified data lake architecture integrating various data sources on Hadoop.
  • EDW assessment using tools such as Attunity, Cloudera, and Gluent.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
  • Writing Scala code to run Spark jobs on the Hadoop HDFS cluster.
  • Designed and deployed an AWS Hadoop cluster through the full SDLC based on the client's business needs.
  • Experience in BI reporting with AtScale OLAP for Big Data.
  • Implemented a Big Data ecosystem (Hive, Impala, Sqoop, Flume, Spark, Lambda) on a cloud architecture.
  • Implemented solutions for ingesting data from various sources and processing data at rest using Big Data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive.
  • Integrated NoSQL databases such as HBase with MapReduce to move bulk data into HBase.
  • Worked with Apache Hadoop ecosystem components such as HDFS, Hive, Sqoop, Pig, and MapReduce.
  • Worked with AWS to implement client-side encryption, since DynamoDB did not support encryption at rest at the time.
  • Designed and developed a real-time stream processing application using Spark, Kafka, Scala, and Hive to perform streaming ETL and apply machine learning.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Used the NoSQL database HBase to load Hive tables into HBase tables through Hive-HBase integration; the resulting data was consumed by the data science team.
  • Implemented AWS solutions using services such as EC2, S3, RDS, Redshift, and VPC.
  • Worked as a Hadoop consultant on MapReduce, Pig, Hive, and Sqoop.
  • Identified query duplication, complexity, and dependencies to minimize migration effort.
  • Performed data profiling and transformation on the raw data using Pig, Python, and Java.
  • Developing predictive analytics using the Apache Spark Scala APIs.
  • Wrote Hadoop jobs for analyzing data using Hive and Pig, accessing text files, sequence files, and Parquet files.
  • Involved in Big Data analysis using Pig and user-defined functions (UDFs).
  • Created Hive external tables, loaded data into them, and queried the data using HQL.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
  • Experience with different Hadoop distributions, including Cloudera (CDH3 and CDH4), Hortonworks (HDP), and MapR.
  • Experience integrating Oozie logs into a Kibana dashboard.
  • Extracted data from MySQL and AWS Redshift into HDFS using Sqoop.
  • Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
  • Implemented an enterprise-grade platform (MarkLogic) for ETL from the mainframe to NoSQL (Cassandra).
  • Imported millions of structured records from relational databases using Sqoop import, processed them with Spark, and stored the data in HDFS in CSV format.
  • Implemented a proof of concept deploying this product on Amazon Web Services (AWS).
  • Developed a Spark Streaming application to pull data from the cloud into Hive tables.
  • Used Spark SQL to process large volumes of structured data.
  • Assigned names to each of the columns using Scala case classes.
  • Implemented a Spark GraphX application to analyze guest behavior for data science segments.
  • Enhanced a traditional data warehouse based on a star schema, updated data models, and performed data analytics and reporting using Tableau.
  • Responsible for importing log files from various sources into HDFS using Flume.
  • Expert in writing business analytics scripts using Hive SQL.

Environment: Big Data, Spark, YARN, Hive, Pig, Scala, Python, Hadoop, AWS, DynamoDB, Kibana, Cloudera, EMR, JDBC, Redshift, NoSQL, Sqoop, MySQL.

Confidential, NYC, NY

Big Data Engineer

Responsibilities:

  • Worked with Hadoop Ecosystem components like HBase, Sqoop, ZooKeeper, Oozie, Hive and Pig with Cloudera Hadoop distribution.
  • Utilized AWS services with a focus on Big Data architecture, analytics, enterprise data warehouse, and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability, and performance, and to provide meaningful information for better decision-making.
  • Experience in data cleansing and data mining.
  • Designed AWS architecture, cloud migration, AWS EMR, DynamoDB, Redshift, and event processing using Lambda functions.
  • Loaded all data from relational databases into Hive using Sqoop. Received four flat files from different vendors, each in a different format (e.g., text, EDI, and XML).
  • Ingested data into Hadoop/Hive/HDFS from different data sources.
  • Created Hive external tables to stage data and then moved the data from staging to main tables.
  • The objective of this project was to build a cloud-based data lake in AWS using Apache Spark and to provide visualization of the ETL orchestration using the CDAP tool.
  • Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
  • Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Involved in migrating data from existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for processing.
  • Worked with AWS cloud and on-premises environments on infrastructure provisioning and configuration.
  • Wrote Perl scripts covering data feed handling, implementing MarkLogic, and communicating with web services through the SOAP::Lite module and WSDL.
  • Developed code for importing and exporting data into HDFS and Hive using Sqoop.
  • Installed and configured Hadoop; responsible for maintaining the cluster and managing and reviewing Hadoop log files.
  • Developed Shell, Perl, and Python scripts to automate and provide control flow to Pig scripts.
  • Designed the Redshift data model and performed Redshift performance analysis and improvements.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Configured and managed disaster recovery and backup for Cassandra data.
  • Performed file system management and monitoring of Hadoop log files.
  • Utilized Oozie workflows to run Pig and Hive jobs.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Developed custom classes for serialization and deserialization in Hadoop.

Environment: Pig, Sqoop, Kafka, Apache Cassandra, Oozie, Impala, Cloudera, AWS, AWS EMR, Redshift, Flume, Apache Hadoop, HDFS, Hive, MapReduce, ZooKeeper, MySQL, Eclipse, DynamoDB, PL/SQL, and Python.

Confidential, Bronx, NY

JAVA/Big Data Developer

Responsibilities:

  • Worked on Hadoop, Hive, Oozie, and MySQL customization for batch data platform setup.
  • Involved in the coding and integration of several business-critical modules of the CARE application using Java, Spring, Hibernate, and REST web services on the WebSphere application server.
  • Developed web components using JSP, Servlets, and JDBC.
  • Designed and developed Enterprise Eligibility business objects and domain objects with an object-relational mapping framework (Hibernate).
  • Developed the web-based Rich Internet Application (RIA) using Java/J2EE (Spring Framework).
  • Participated in JAD meetings to gather requirements and understand the end users' system.
  • Developed user interfaces using JSP, HTML, XML and JavaScript.
  • Generated XML Schemas and used XMLBeans to parse XML files.
  • Modified the existing JSP pages using JSTL.
  • Developed web pages using JSPs and JSTL to help end users submit rebates online. Also used XMLBeans to map XML data into Java objects.
  • Worked with Big Data and real-time/near-real-time analytics on platforms such as Hadoop and Spark using Scala and Java.
  • Developed and implemented new UIs using AngularJS and HTML.
  • Involved in Big Data Project Implementation and Support.
  • Developed Spring configuration for dependency injection using Spring IoC and Spring controllers.
  • Implemented Spring MVC and IoC patterns.
  • Used JNDI for naming and directory services.
  • Delivered Big Data products, including re-platforming a legacy Global Risk Management System with Big Data technologies such as Hadoop, Hive, and HBase.
  • Developed RESTful web services using JAX-RS with the DELETE, PUT, POST, and GET HTTP methods in a Spring 3.0 and OSGi integrated environment.
  • Used the lightweight container of the Spring Framework to provide architectural flexibility through inversion of control (IoC).
  • Developed optimal strategies for distributing the web log data over the cluster, and imported and exported the stored web log data into HDFS and Hive using Sqoop.
  • Collected and aggregated large amounts of web log data from different sources such as web servers, mobile, and network devices using Apache Flume, and stored the data in HDFS for analysis.
  • Used Java Messaging Services (JMS) for reliable and asynchronous exchange of important information such as payment status report on IBM WebSphere MQ messaging system.
  • Used Spring JDBC DAOs as the data access technology to interact with the database.
  • Developed unit and E2E test cases using Node.js.
  • Developed presentation layer using Java Server Faces (JSF) MVC framework.
  • Worked with the NoSQL database MongoDB and worked extensively with Hive, HBase, and HDFS.

Environment: JSP 2.1, Hadoop 1.x, Hive, Pig, HBase, JSTL 1.2, Java, J2EE, Java SE 6, UML, Servlets 2.5, Spring MVC, Hibernate, JSON, Unix, JUnit, DB2, Oracle, RESTful web services, jQuery, AJAX, AngularJS, JAXB, IRAD, WebSphere Integration Developer, WebSphere 7.0.
