Big Data Architect Resume
Bellevue, WA
PROFESSIONAL SUMMARY:
- 15+ years of experience in the IT industry, including 5+ years in Big Data (Spark, HDFS, Hive, Pig, Sqoop, Kafka, Cassandra) and 10+ years in J2EE-based technologies.
- Strong knowledge of the Software Development Life Cycle (SDLC) and expertise in detailed design documentation.
- Strong experience with and knowledge of Hadoop, HDFS, MapReduce, Hadoop ecosystem components such as Hive, Pig and Sqoop, and NoSQL databases.
- Hands-on experience with Amazon Web Services (AWS) and Google Cloud Platform (GCP).
- Strong experience in Java/J2EE and web technologies such as HTML, JavaScript, XML, CSS, JDBC, Servlets, JSP, SOAP, WSDL and Struts.
- Hands-on experience writing MapReduce programs in Java.
- Worked in multi-cluster environments and set up Cloudera and Hortonworks Hadoop ecosystems.
- Implemented workflows in Oozie using Sqoop, MapReduce, Hive and other Java and Shell actions.
- Expertise in developing solutions around SQL and NoSQL databases like Cassandra.
- In-depth understanding of data structures and algorithms.
- Hands-on experience in application development using Java, RDBMS and UNIX shell scripting.
- Experience validating that there is no data loss by comparing Hive table data against RDBMS data.
- Experience in Java and Scala programming, including stream processing.
- Strong experience with Apache Spark components such as Spark Streaming and Spark SQL.
- Experience streaming real-time data using Apache Kafka and Apache Spark.
- Experience in Java, CSS, XML and JUnit.
- Experience designing and deploying Service-Oriented Architecture (SOA).
- Experience working both independently and in cross-functional team environments.
- Ability to work independently to help drive solutions in fast-paced, dynamic work environments.
- Strong team building, conflict management, time management and meeting management skills.
TECHNICAL SKILLS:
Big Data / Hadoop Technologies: Hadoop, HDFS, Pig, Hive, Flume, Sqoop, Oozie
Streaming Technologies: Apache Spark 2.x
Queuing / Messaging Technologies: Kafka 1.x
Java/J2EE Technologies: Java, Servlets, JSP, JDBC, XML, AJAX, SOAP, WSDL
Web Technologies: HTML, XML, JavaScript, CSS, jQuery
IDEs: Eclipse, Visual Studio, JetBrains IntelliJ
Languages: Scala 2.x, Java, .NET
NoSQL: Cassandra
Databases: Oracle, DB2, MS SQL Server, NoSQL, MySQL, MS Access
Operating Systems: Windows, UNIX, Linux, Mac OS, SUSE Linux
PROFESSIONAL EXPERIENCE:
Confidential, Bellevue, WA
Big Data Architect
Responsibilities:
- Served as the project architect, recommending the tools and overall architecture.
- Designed the data pipeline from DB2 to HDFS using Kafka and Spark Streaming.
- Developed Sqoop scripts for the initial data load.
- Developed Kafka consumers and created the Kafka topic for the customer.
- Developed Spark Streaming jobs.
- Developed Spark data transformations.
- Consolidated multiple Teradata joins into a few joins in Spark.
- Documented operational problems following standards and procedures, using JIRA as the reporting tool.
Technologies: HDFS, Hive, Spark 2.3.0, Scala 2.11.8, Shell Scripting, Kafka 1.0, SUSE Linux, DB2, Teradata, Hortonworks ecosystem and data modeling.
Confidential, Foster City, CA
Big Data Engineer
Responsibilities:
- Understood business needs, analyzed functional specifications and mapped them to design documents.
- Designed the system.
- Developed Spark jobs to pull EDI files into HDFS.
- Wrote Hive and Sqoop scripts to pull data from Teradata into HDFS.
- Wrote a Spark job to stream data from Teradata to HDFS.
- Improved the performance of Hive scripts.
- Performed data correctness checks.
- Wrote Pig and Hive jobs to parse logs and structure them in tabular format for effective querying of the log data.
- Worked on data modeling during application software design.
Technologies: HDFS, Hive, Spark, Java, Shell Scripting, Linux, Sqoop, Flume, DB2, Oracle, Hortonworks ecosystem and data modeling.
Confidential, UK
Spark Developer
Responsibilities:
- Designed and developed Hadoop system to analyze the SIEM (Security Information and Event Management) data using MapReduce, HBase, Hive, Sqoop and Flume.
- Worked on a live 30-node Hadoop cluster running CDH 5.4.
- Developed the solution approach for migrating Hadoop workloads to Spark.
- Designed and developed POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Moved data between HDFS and RDBMS in both directions using Sqoop.
- Involved in Design and Development of technical specifications using Hadoop technology
- Worked extensively with Sqoop for importing data from RDBMS into HDFS and Hive.
- Wrote Sqoop scripts to import data into Hive incrementally.
- Created dynamic Hive partitions based on requirements.
- Performed complex Joins on the tables in Hive.
- Used Apache Sentry to control other users' access to Hive data.
- Monitored and validated the data coming from RDBMS into Hive.
- Developed Oozie workflows for daily incremental loads that pull data from RDBMS and import it into Hive tables.
- Wrote Spark programs in Scala according to business requirements.
- Created static and dynamic partitioned tables in Hive.
- Developed the data consumption process using Spark and Scala.
- Worked with RDD operations, i.e., transformations and actions, persistence (caching), custom accumulators, broadcast variables and broadcast optimization, and conversion between RDD types.
- Created pair RDDs and applied pair RDD transformations, including aggregations, grouping, joins and sorting.
- Implemented database, Hive and NoSQL (HBase) connectivity.
- Used Spark SQL in applications: initializing and linking with Spark SQL, working with SchemaRDDs, writing basic queries, implementing Spark SQL and Hive user-defined functions (UDFs), and tuning Spark SQL performance.
Technologies: Hortonworks on RHEL, AWS S3, IBM Change Data Capture (CDC), Apache Kafka, Trident, HBase, Sqoop, Apache Solr, Tableau, PL/SQL, Data masking, Data Modeling, Scala, Talend, Ab Initio
Confidential, Jersey City, NJ
Big Data/Hadoop Developer
Responsibilities:
- Worked on the BI team on Big Data Hadoop cluster implementation and data integration for large-scale system software.
- Assessed existing and available data warehousing technologies and methods to ensure our data warehouse/BI architecture met the needs of the business unit and enterprise and allowed for business growth.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Used Sqoop to capture data from existing databases that provide SQL interfaces.
- Worked extensively with Sqoop to import and export data between HDFS and relational database systems/mainframe, including loading data into HDFS.
- Developed and maintained complex outbound notification applications that run on custom architectures, using diverse technologies including Core Java, J2EE, SOAP, XML, JMS, JBoss and Web Services.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled faster reviews and first-mover advantage by using Oozie to automate data loading into the Hadoop Distributed File System (HDFS) and Pig to preprocess the data.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Shared responsibility for administration of Hadoop, Hive and Pig.
- Developed Hive queries for the analysts.
Technologies: Hadoop, MapReduce, HDFS, Hive, Java, Hadoop distributions (Hortonworks, Cloudera, MapR), DataStax, IBM DataStage (Designer, Director, Administrator), PL/SQL, SQL*Plus, Toad, Windows, UNIX, Shell Scripting.
Confidential, New York, NY
Architect and Program Manager
Responsibilities:
- Designed the end-to-end migration approach and implemented it with the team.
- Effectively interacted with team members and business users for requirements gathering.
- Served as the customer-facing contact and presented status to top management on a weekly and monthly basis.
- Team recruitment and management.
- Designed the 24/7 support model for the customer
- Coded the key migration modules using an MVC framework.
- Used Spring MVC to decouple business logic and view components.
- Integrated Spring to implement dependency injection (DI/IoC).
- Developed the Business Logic using Plain Old Java Objects (POJOs).
- Developed graphical user interfaces using HTML and JSPs for user interaction.
- Created ExtJS pages, used JavaScript for client-side validations, and AJAX to create interactive front-end GUI.
- Used JSON for data exchange between application modules along with XML.
- Created a set of classes using the DAO pattern to decouple the business logic from data access.
- Used various Core Java concepts such as Multi-Threading, Exception Handling, Collection APIs to implement various features and enhancements.
- Consumed Web Services for transferring data between different applications.
- Involved in coding, maintaining and administering Servlets and JSP components deployed on WebSphere Application Server.
- Integrated Documentum with Mainframes and SAP
- Built scripts using Ant to build the J2EE application.
- Developed the application using RAD as the IDE and used its features for editing, debugging, compiling, formatting, build automation and version control (CVS).
- Wrote test cases in JUnit for unit testing of classes and implemented the logging using Log4j.
- Used CVS version control to track and maintain the different versions of the application.
Technologies: Java, J2EE, JSP, JSTL, Servlet, Agile Methodology, Struts, MVC, Tomcat/JBoss, XML, HTML, CSS, DHTML, DOM, Hibernate, SOAP, JavaScript, Multithreading, Oracle 9i, JUnit, Web services, PL/SQL, JDBC, ANT, Rational Rose, Solaris and Windows, D2, Documentum 6.7 Suite of products, SAP, Oracle Apps ERP system, IBM RAD, Oracle 10g, WebSphere Portal Server.
Confidential
Architect
Responsibilities:
- Acquired knowledge of design aspects from the vendor architect.
- Conducted an in-depth study of the system.
- Monitored the change control process and made decisions on change controls.
- Supported the business units with all their Enterprise Content Management needs.
- Designed modules and handled server installation and integration.
- Coordinated onsite and offshore teams and supported business users on FDA submissions.
- Handled resource management, planning and training.
- Designed the project, including high-level design, low-level design and UML modeling using Rational Rose.
- Performed risk studies and prepared the risk mitigation plan.
Technologies: Java, J2EE, Tomcat, Oracle 10g, WebLogic Portal Server, Documentum 5.3 Suite of products, eCTD Server, ISI Publisher, CoreDossier, JSP, VB 6.0, XML, Visual Studio 2005, ITIL Process, FirstDoc
