
Sr. Big Data Architect/Lead Resume

Dallas, TX

SUMMARY:

  • Over 9 years of experience in the IT industry, including big data environments and the Hadoop ecosystem, covering the design, development, and maintenance of various applications.
  • Experienced in developing custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HiveQL (a minimal UDF sketch follows this summary).
  • Expertise in core Java and JDBC, proficient with Java APIs for application development; experience includes web-based applications built with Core Java, JDBC, Java Servlets, JSP, the Struts framework, Hibernate, HTML, JavaScript, XML, and Oracle.
  • Good knowledge of and experience with Amazon Web Services (AWS) offerings such as EMR and EC2, which provide fast and efficient processing for Teradata big data analytics.
  • Expertise in big data architectures such as the Hadoop distributed system (Azure, Hortonworks, and Cloudera distributions), MongoDB, and NoSQL databases.
  • Expertise in JavaScript, JavaScript MVC patterns, object-oriented JavaScript design patterns, and AJAX calls, with good working experience in application and web servers such as JBoss and Apache Tomcat.
  • Experienced in installing, configuring, supporting, and monitoring Hadoop clusters using Apache and Cloudera distributions and AWS.
  • Good experience with Tableau for data visualization and analysis of large data sets; leveraged and integrated Google Cloud Storage and BigQuery, connected to Tableau for end-user web-based dashboards and reports.
  • Excellent hands-on experience with Hadoop/big data technologies for storage, querying, processing, and analysis of data.
  • Experienced in developing big data projects using open-source tools such as Hadoop, Hive, HDP, Pig, Flume, Storm, and MapReduce, as well as installing, configuring, supporting, and managing Hadoop clusters.
  • Strong hands-on experience with AWS services, including but not limited to EMR, S3, EC2, Route 53, RDS, ELB, DynamoDB, and CloudFormation.
  • Hands-on experience in the Hadoop ecosystem and related big data technologies, including Spark, Kafka, HBase, Scala, Pig, Impala, Sqoop, Oozie, Flume, and Storm.
  • Worked with Spark SQL, Spark Streaming, and the core Spark API to explore Spark features and build data pipelines.
  • Very good experience and knowledge of AWS services such as EMR and EC2; successfully loaded files into HDFS from Oracle, SQL Server, Teradata, and Netezza using Sqoop.
  • Excellent knowledge of big data infrastructure: distributed file systems (HDFS) and parallel processing (the MapReduce framework).
  • Extensive knowledge of IDE tools such as MyEclipse, RAD, IntelliJ, and NetBeans.
  • Expert in Amazon EMR, Spark, Kinesis, S3, ECS, ElastiCache, DynamoDB, and Redshift.
  • Experience installing, configuring, supporting, and managing the Cloudera Hadoop platform, including CDH4 and CDH5 clusters.
  • Experience in working with different data sources like Flat files, XML files and Databases.
  • Experience in database design, entity relationships, and database analysis in Oracle, including SQL programming and PL/SQL stored procedures, functions, packages, and triggers.
  • Very good knowledge of Splice Machine.
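For illustration, a minimal sketch of the kind of custom Hive UDF mentioned above, assuming a hypothetical string-normalization function (the class name, function name, and use case are invented for the example, not taken from a specific project):

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical example: trims, lower-cases, and collapses whitespace in a
// free-text column so it can be joined or grouped reliably in HiveQL.
@Description(name = "normalize_name",
             value = "_FUNC_(str) - trims, lower-cases and collapses whitespace")
public class NormalizeName extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                       // pass NULLs through untouched
        }
        String cleaned = input.toString()
                              .trim()
                              .toLowerCase()
                              .replaceAll("\\s+", " ");
        return new Text(cleaned);
    }
}
```

Packaged into a JAR, such a function would typically be registered in HiveQL with ADD JAR and CREATE TEMPORARY FUNCTION before being used like any built-in function.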

TECHNICAL SKILLS:

Big Data Ecosystem: MapReduce, HDFS, Hive, Pig, Sqoop, Flume, HDP, Oozie, Zookeeper, Spark, Kafka, Storm, Hue

Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks

Databases: Oracle 12c/11g, MySQL, MS-SQL, Teradata, HBase, MongoDB, Cassandra.

Version Control: GIT, GitLab, SVN

Java/J2EE Technologies: Servlets, JSP, JDBC, JSTL, EJB, JAXB, JAXP, JMS, JAX-RPC, JAX-WS

NoSQL Databases: HBase, Cassandra and MongoDB

Programming Languages: Java, Python, SQL, PL/SQL, AWS, HiveQL, UNIX Shell Scripting, Scala.

Methodologies: Software Development Lifecycle (SDLC), Waterfall Model and Agile, STLC (Software Testing Life cycle) & UML, Design Patterns (Core Java and J2EE)

Web Technologies: JavaScript, CSS, HTML, JSON and JSP.

Operating Systems: Windows, UNIX/Linux and Mac OS.

Build Management Tools: Maven, Ant.

IDE & Command line tools: Eclipse, IntelliJ, Toad and NetBeans.

PROFESSIONAL EXPERIENCE:

Sr. Big Data Architect/Lead

Confidential, Dallas TX

Responsibilities:

  • Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies such as Hadoop, Map Reduce Frameworks, HBase, and Hive.
  • Managed and led the development effort with the help of a diverse internal and overseas team, and designed/architected and implemented complex projects dealing with considerable data sizes (GB to PB) and high complexity.
  • Designed and deployed the full SDLC of an AWS Hadoop cluster based on the client's business needs, and loaded and transformed large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
  • Designed the AWS architecture, covering cloud migration, AWS EMR, DynamoDB, Redshift, and event processing using Lambda functions.
  • Performed data profiling and transformation on the raw data using Pig, Python, and Java, and developed predictive analytics using the Apache Spark Scala APIs.
  • Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra), and was responsible for importing log files from various sources into HDFS using Flume.
  • Analyzed data using HiveQL to generate payer reports and payment summaries for transmission to payers.
  • Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in CSV format.
  • Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
  • Designed and developed a real-time stream processing application using Spark, Kafka, Scala, and Hive to perform streaming ETL and apply machine learning (a minimal sketch of this kind of pipeline follows this list).
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Explored DAGs, their dependencies, and logs using Airflow pipelines for automation, and used Apache Airflow to schedule and run the DAGs.
  • Performed big data analysis using Pig and user-defined functions (UDFs); created Hive external tables, loaded data into them, and queried the data using HiveQL.
  • Implemented Spark GraphX application to analyze guest behavior for data science segments.
  • Enhanced the traditional data warehouse based on a star schema, updated data models, and performed data analytics and reporting using Tableau.
  • Involved in migrating data from existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for processing.
  • Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
  • Designed and developed UI screens using Struts, DOJO, JavaScript, JSP, HTML, DOM, CSS, and AJAX.
  • Developed a prototype for big data analysis using Spark RDDs and DataFrames and the Hadoop ecosystem with CSV, JSON, and Parquet files on HDFS.
  • Developed HiveQL scripts to perform transformation logic and load data from the staging zone into the landing and semantic zones.
  • Maintained and worked with a data pipeline that transfers and processes several terabytes of data using Spark, Scala, Python, Apache Kafka, Pig/Hive, and Impala.
  • Created Oozie workflow and coordinator jobs to kick off Hive jobs on time based on data availability, and worked with the Oozie scheduler to automate the pipeline workflow and orchestrate the Sqoop, Hive, and Pig jobs that extract data on a schedule.
  • Exported the generated results to Tableau for testing by connecting to the corresponding Hive tables using Hive ODBC connector.
  • Scheduled the Airflow workflow engine to run multiple Hive and Pig jobs using Python.
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
  • Responsible for designing and deploying EDW application solutions, optimizing processes, and defining and implementing best practices.
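As referenced above, a minimal sketch of this kind of streaming ETL pipeline, shown here with Spark Structured Streaming's Java API (the broker address, topic, schema, and paths are placeholders, and the original work described above was in Scala):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

// Hypothetical streaming ETL: read JSON events from a Kafka topic, project a
// few typed fields, and append them as Parquet files under an HDFS path.
public class StreamingEtlSketch {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("streaming-etl-sketch")
                .getOrCreate();

        // Assumed event schema; real payloads would differ.
        StructType schema = new StructType()
                .add("event_id", DataTypes.StringType)
                .add("amount", DataTypes.DoubleType)
                .add("event_ts", DataTypes.TimestampType);

        Dataset<Row> events = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker:9092")    // placeholder broker
                .option("subscribe", "payments")                      // placeholder topic
                .load()
                .selectExpr("CAST(value AS STRING) AS json")
                .select(from_json(col("json"), schema).alias("e"))
                .select("e.*");

        StreamingQuery query = events.writeStream()
                .format("parquet")
                .option("path", "/data/landing/payments")             // placeholder path
                .option("checkpointLocation", "/data/checkpoints/payments")
                .outputMode("append")
                .start();

        query.awaitTermination();
    }
}
```

The Parquet output path could then be exposed as a Hive external table for the downstream HiveQL and Tableau reporting described above.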

Environment: Big Data, Spark, YARN, Hive, Pig, JavaScript, JSP, HTML, Ajax, Scala, Python, Hadoop, AWS, DynamoDB, Kibana, Cloudera, ETL, AWS S3, AWS Glue, Oozie, Zookeeper, SQL, EMR, JDBC, Redshift, NoSQL, Sqoop, MySQL.

Sr. Big Data/Hadoop Architect

Confidential, Mentor OH

Responsibilities:

  • Implemented the installation and configuration of a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
  • Developed Sqoop scripts for the extraction of data from various RDBMS databases into HDFS.
  • Developed scripts to automate the workflow of various processes using python and shell scripting.
  • Installed and configured Hadoop ecosystem components such as Hive, Oozie, and Sqoop on a Cloudera Hadoop cluster, helping with performance tuning and monitoring.
  • Collected and aggregated large amounts of log data using Apache Flume, staging the data in HDFS for further analysis.
  • Developed data pipelines using Pig and Hive from Teradata and DB2 data sources; these pipelines used custom UDFs to extend the ETL functionality.
  • Created a data pipeline with processor groups and multiple processors in Apache NiFi for flat-file and RDBMS sources as part of a POC on Amazon EC2.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce approaches in Spark 2.0.0 for data aggregation and queries, writing data back into the RDBMS through Sqoop.
  • Wrote Hive join queries to fetch information from multiple tables, wrote multiple MapReduce jobs to collect output from Hive, and used Hive to analyze partitioned and bucketed data and compute various metrics for dashboard reporting.
  • Developed MapReduce programs in Java and Python to parse raw data and store the refined data in Hive.
  • Used AWS Cloud and On-Premise environments with Infrastructure Provisioning/ Configuration.
  • Worked on writing Perl scripts covering data feed handling, implementing MarkLogic, and communicating with web services through the SOAP::Lite module and WSDL.
  • Used UDFs to implement business logic in Hadoop, using Hive to read, write, and query Hadoop data in HBase.
  • Used the Oozie workflow engine to run multiple Hive and Pig scripts, used Kafka for real-time processing of data sets in HDFS, and loaded log file data directly into HDFS using Flume.
  • Developed an end-to-end workflow to build a real time dashboard using Kibana, Elastic Search, Hive and Flume.
  • Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
  • Developed Python MapReduce programs for log analysis and designed an algorithm for detecting fake reviews using Python.
  • The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark and to provide visualization of the ETL orchestration using the CDAP tool.
  • Extracted data from MySQL and AWS Redshift into HDFS using Sqoop, and worked with AWS to implement client-side encryption, as DynamoDB did not support at-rest encryption at the time.
  • Implemented a proof of concept deploying the product in Amazon Web Services (AWS) cloud and on-premise environments with infrastructure provisioning and configuration.
  • Involved in developing the MapReduce framework, writing and scheduling MapReduce queries, and developing code for importing and exporting data into HDFS and Hive using Sqoop.
  • Used Oozie to design workflows and schedule various jobs in the Hadoop ecosystem.
  • Developed MapReduce programs in Java to apply business rules to the data, optimizing them with various compression formats and combiners (see the sketch after this list).
  • Used Spark SQL to create DataFrames by loading JSON data and analyzing it, and developed Spark code using Scala and Spark SQL for faster testing and data processing.
  • Installed and configured Hadoop; responsible for maintaining the cluster and managing and reviewing Hadoop log files.
  • Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
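As referenced in the MapReduce bullet above, a minimal Java sketch of a combiner-optimized, compressed-output MapReduce job (the record layout, column positions, and job name are assumptions for the example, not details from the project):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical business-rule aggregation: count records per account code,
// using the reducer as a combiner and compressing the output with Snappy.
public class AccountCountJob {

    public static class TokenMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text accountCode = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");     // assumed CSV layout
            if (fields.length > 2) {
                accountCode.set(fields[1].trim());              // assumed account column
                context.write(accountCode, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "account-count");
        job.setJarByClass(AccountCountJob.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);                 // combiner cuts shuffle volume
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        FileOutputFormat.setCompressOutput(job, true);          // compress reducer output
        FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Reusing the reducer as a combiner shrinks the shuffle, and Snappy-compressed output keeps downstream HDFS storage small, mirroring the optimizations the bullet describes.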

Environment: Pig, Sqoop, Kafka, Apache Cassandra, Oozie, Impala, Cloudera, AWS, AWS EMR, Redshift, Flume, Apache Hadoop, HDFS, Hive, Map Reduce, Cassandra, Zookeeper, MySQL, Eclipse, Dynamo DB, PL/SQL and Python.

Sr. Big Data/Hadoop Developer

Confidential - Tampa, FL

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop; designed projects using an MVC architecture that provides multiple views of the same model, improving modularity and scalability.
  • Built custom Talend jobs to ingest and distribute data in the Cloudera Hadoop ecosystem, and improved the performance and optimization of existing Hadoop algorithms using Spark Context, Spark SQL, and Spark on YARN with Scala.
  • Implemented Spark Core in Scala to process data in memory, and performed job functions using Spark APIs in Scala for real-time analysis and fast querying.
  • Used Pig, Hive, and MapReduce to analyze data and extract meaningful information from data sets.
  • Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
  • Handled importing of data from various data sources, performed transformations using MapReduce, Spark and loaded data into HDFS.
  • Developed workflow in Oozie to orchestrate a series of Pig scripts to cleanse data, such as merging many small files into a handful of very large, compressed files using pig pipelines in the data preparation stage.
  • Used Pig for three distinct workloads (pipelines, iterative processing, and research), used Pig UDFs written in Python and Java, and applied sampling to large data sets.
  • Moved log files generated from various sources to HDFS through Flume for further processing.
  • Extensively used Pig to communicate with Hive via HCatalog and with HBase via storage handlers, and created Pig Latin and Sqoop scripts.
  • Transformed data from legacy tables into HDFS and HBase tables using Sqoop, and implemented exception-tracking logic using Pig scripts.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them; followed a Test-Driven Development (TDD) process with extensive use of Agile and Scrum methodology.
  • Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala (an illustrative sketch follows this list), and scheduled MapReduce jobs in the production environment using the Oozie scheduler.
  • Involved in cluster maintenance, monitoring, and troubleshooting; managed and reviewed data backups and log files.
  • Implemented GUI screens for viewing using Servlets, JSP, Tag Libraries, JSTL, JavaBeans, HTML, JavaScript and Struts framework using MVC design pattern.
  • Built, configured, and deployed web components on a WebLogic application server; the application was built on a Java financial platform integrating several technologies, including Struts and Spring Web Flow.
  • Used Spring Framework modules such as the core container, application context, Spring AOP, Spring ORM, and Spring MVC modules.
  • Developed the presentation layer using Model View Architecture implemented by Spring MVC.
  • Performed Unit testing using JUnit and used SVN as version control tools to maintain the code repository.
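As flagged in the RDD-migration bullet above, an illustrative sketch of expressing a MapReduce-style per-key aggregation as Spark RDD transformations; the original POC was written in Scala, and the Java API is used here only for consistency with the other sketches (the paths and CSV layout are assumptions):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

// Hypothetical migration sketch: the same per-key count a MapReduce job would
// compute, expressed as Spark RDD transformations.
public class RddMigrationSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("rdd-migration-sketch");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile(args[0]);      // e.g. an HDFS input path

            JavaPairRDD<String, Long> counts = lines
                    .map(line -> line.split(","))               // assumed CSV layout
                    .filter(fields -> fields.length > 2)
                    .mapToPair(fields -> new Tuple2<>(fields[1].trim(), 1L))
                    .reduceByKey(Long::sum);                    // replaces the MR combiner + reducer

            counts.saveAsTextFile(args[1]);                     // e.g. an HDFS output path
        }
    }
}
```

reduceByKey performs the map-side combine that a MapReduce combiner would, which is typically where such a migration gains its performance.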

Environment: Hadoop, MapReduce, Spark, Shark, Kafka, HDFS, Hive, Pig, Oozie, Core Java, Eclipse, HBase, Flume, Cloudera, Oracle 10g, UNIX Shell Scripting, Scala, MongoDB, Cassandra, Python.

Sr. Java/J2EE Developer

Confidential, Harrisburg, PA

Responsibilities:

  • Involved in a full life cycle Object Oriented application development - Object Modeling, Database Mapping, GUI Design.
  • Developed the J2EE application based on the Service Oriented Architecture and used Design Patterns like Singleton, Factory, Session Facade and DAO.
  • Developed code using newer Java features: annotations, generics, the enhanced for loop, and enums.
  • Developed Use Case diagrams, Class diagrams and Sequence diagrams to express the detail design.
  • Worked with EJB (Session and Entity) to implement the business logic to handle various interactions with the database.
  • Skilled in using collections in Python for manipulating and looping through different user defined objects.
  • Implemented a high-performance, highly modular, load-balancing broker in C with ZeroMQ and Redis.
  • Used Spring and Hibernate to implement IoC, AOP, and ORM for the back-end tiers; created and injected Spring services, Spring controllers, and DAOs to achieve dependency injection and wire the business-class objects (a minimal DAO sketch follows this list).
  • Developed a fully automated continuous integration system using Git, Jenkins, MySQL and custom tools developed in Python and Bash.
  • Part of a team implementing REST APIs in Python using micro-frameworks such as Flask.
  • Used Spring bean inheritance to develop beans from already developed parent beans, and used the DAO pattern with Hibernate to carry out various database operations.
  • Used SOAP Lite module to communicate with different web-services based on given WSDL.
  • Worked on Evaluating, comparing different tools for test data management with Hadoop.
  • Helped and directed the testing team to get up to speed on Hadoop application testing, and used Hibernate transaction management, Hibernate batch transactions, and caching concepts.
  • Modified the Spring controller and service classes to support the introduction of the Spring framework.
  • Created complex SQL Queries, PL/SQL Stored procedures, Functions for back end and developed various generic JavaScript functions used for validations.
  • Developed screens using HTML5, CSS, jQuery, JSP, JavaScript, AJAX and ExtJS and used Aptana Studio and Sublime to develop and debug application code.
  • Used Rational Application Developer (RAD) which is based on Eclipse, to develop and debug application code and created user-friendly GUI interface and Web pages using HTML, AngularJS, JQuery and JavaScript.
  • Used the Log4j utility to generate run-time logs, wrote SAX and DOM XML parsers, and used SOAP to send and receive data from the external interface.
  • Deployed business components into WebSphere Application Server and developed Functional Requirement Document based on users' requirement.
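As noted in the Spring/Hibernate bullet above, a minimal sketch of a Spring-managed Hibernate DAO wired through dependency injection; the Account entity, field names, and HQL are invented for the example rather than taken from the project:

```java
import java.util.List;

import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.SessionFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Repository;
import org.springframework.transaction.annotation.Transactional;

// Minimal placeholder entity so the sketch is self-contained.
@Entity
class Account {
    @Id
    private Long id;
    private String status;
    // getters and setters omitted for brevity
}

// Spring-managed DAO: the SessionFactory is injected by the container,
// and @Transactional lets Spring drive the Hibernate transactions.
@Repository
@Transactional
public class AccountDao {

    private final SessionFactory sessionFactory;

    @Autowired
    public AccountDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    public void save(Account account) {
        sessionFactory.getCurrentSession().saveOrUpdate(account);
    }

    @SuppressWarnings("unchecked")
    public List<Account> findByStatus(String status) {
        return sessionFactory.getCurrentSession()
                .createQuery("from Account a where a.status = :status")
                .setParameter("status", status)
                .list();
    }
}
```

The container would supply the SessionFactory bean and a transaction manager, which is the wiring the bullet refers to as IoC and ORM.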

Environment: Core Java, J2EE, JDK 1.6, Python, Spring 3.0, Hibernate 3.2, Tiles, AJAX, JSP 2.1, Eclipse 3.6, IBM WebSphere 7.0, XML, XSLT, SAX, DOM Parser, HTML, UML, Oracle 10g, PL/SQL, JUnit.

Java Developer

Confidential

Responsibilities:

  • Implemented a Spring MVC architecture and the Spring bean factory using IoC and AOP concepts (a minimal controller sketch follows this list).
  • Gathered the requirements and designed the application flow for the application.
  • Used HTML, JavaScript, JSF 2.0, AJAX, and JSP to create the user interface.
  • Involved in writing the Maven build for building and configuring the application.
  • Developed Struts Action classes for the system and performed both server-side and client-side validations.
  • Developed EJB component to implement business logic using Session and Message Bean.
  • Developed the code using core Java concepts, the Spring Framework, JSP, Hibernate 3.0, JavaScript, XML, and HTML.
  • Used Spring Framework to integrate with Struts web framework, Hibernate.
  • Extensively worked with Hibernate to connect to database for data persistence and integrated Activate Catalog to get parts using JMS.
  • Used Log4j to log both user-interface and domain-level messages.
  • Extensively worked with Struts for middle-tier development, with Hibernate as the ORM and Spring IoC for dependency injection, in an application based on the MVC design paradigm.
  • Created the struts-config.xml file to manage page flow, and developed views with HTML, CSS, and JavaScript.
  • Performed Unit testing for modules using Junit and played an active role in preparing documentation for future reference and upgrades.
  • Implemented the front end using JSP, HTML, CSS and JavaScript, JQuery, AJAX for dynamic web content.
  • Worked in an Agile Environment used Scrum as the methodology wherein I was responsible for delivering potentially shippable product increments at the end of each Sprint.
  • Involved in Scrum meetings that allow clusters of teams to discuss their work, focusing especially on areas of overlap and integration.
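As referenced in the Spring MVC bullet above, a minimal controller sketch in the classic Controller-interface style of the Spring 2.x era; the PartsService contract, view name, and wiring are placeholders invented for the example:

```java
import java.util.List;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.springframework.web.servlet.ModelAndView;
import org.springframework.web.servlet.mvc.Controller;

// Hypothetical service contract used only for this sketch.
interface PartsService {
    List<String> findAllPartNumbers();
}

// Minimal Spring MVC controller: the bean and its URL mapping would be
// declared in the dispatcher-servlet configuration.
public class PartsController implements Controller {

    private PartsService partsService;

    public void setPartsService(PartsService partsService) {   // setter injection (IoC)
        this.partsService = partsService;
    }

    public ModelAndView handleRequest(HttpServletRequest request,
                                      HttpServletResponse response) throws Exception {
        ModelAndView mav = new ModelAndView("parts");           // resolves to a JSP view
        mav.addObject("partNumbers", partsService.findAllPartNumbers());
        return mav;
    }
}
```

In the dispatcher configuration, the controller would be declared as a bean, its partsService property set through IoC, and the URL mapped with a handler mapping such as SimpleUrlHandlerMapping.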

Environment: Java 1.4, JSP, Servlets, JavaScript, HTML 5, AJAX, JDBC, JMS, EJB, Struts 2.0, Spring 2.0, Hibernate 2.0, Eclipse 3.x, WebLogic 9, Oracle 9i, JUnit, Log4j
