Big Data Engineer Resume
Foster City, CA
PROFESSIONAL SUMMARY:
- 15+ years of experience in the IT industry, including 5+ years of experience in Big Data (Spark, HDFS, Hive, Pig, Sqoop, Flume, and others) and 10+ years of experience in J2EE-based technologies.
- Strong knowledge of the Software Development Life Cycle (SDLC) and expertise in detailed design documentation.
- Strong experience and knowledge of Hadoop, HDFS, MapReduce, and Hadoop ecosystem components like Hive, Pig, Sqoop, Oozie, and NoSQL stores.
- Hands-on experience in Amazon AWS (EC2, VPC, S3, and EMR).
- Hands-on experience installing and configuring Hadoop clusters, with good knowledge of ecosystem components like MapReduce, HDFS, HBase, ZooKeeper, Oozie, Hive, HDP, Sqoop, Pig, and Flume.
- Strong experience in Java/J2EE/web technologies like HTML, JavaScript, XML, CSS, JDBC, Servlets, JSP, SOAP, WSDL, and Struts.
- Hands-on experience in writing MapReduce programs in Java.
- Worked in multi-cluster environments, setting up the Cloudera and Hortonworks Hadoop ecosystems.
- Implemented workflows in Oozie using Sqoop, MapReduce, Hive and other Java and Shell actions.
- Expertise in developing solutions around SQL and NoSQL databases like HBase.
- Implemented user-defined functions (UDFs) in Pig and Hive.
- Strong experience with Hadoop distributions like Cloudera (CDH4, CDH5) and Hortonworks.
- In-depth understanding of data structures and algorithms.
- Hands-on experience in application development using Java, RDBMS, and UNIX shell scripting.
- Validated that there is no data loss by comparing Hive table data against RDBMS data.
- Experience in Java and Scala programming and streaming.
- Strong experience in Apache Spark components like Spark Streaming and Spark SQL.
- Experience and good knowledge in streaming real-time data using Apache Kafka and Apache Spark.
- Experience in Java, CSS, XML, and JUnit.
- Good knowledge of MapR.
- Hands-on experience with Apache Solr (open-source search platform).
- Good knowledge of Ant and AWS.
- Very good experience in SQL.
- Designed and deployed Service-Oriented Architecture (SOA) through web services.
- Experience working individually and in team environments.
- Ability to work independently to help drive solutions in fast paced/dynamic work environments.
- Strong team building, conflict management, time management and meeting management skills.
TECHNICAL SKILLS:
Big Data / Hadoop Technologies: Hadoop, HDFS, Pig, Hive, Flume, Apache Solr, Sqoop, Oozie, and Mahout
Streaming Technologies: Apache Spark 2.0
Queuing Technologies: Kafka
Java/J2EE Technologies: Java, Servlets, JSP, JDBC, XML, AJAX, SOAP, WSDL
Web Technologies: HTML, XML, JavaScript, CSS, jQuery
IDEs: Eclipse, Visual Studio, IntelliJ IDEA
Languages: Scala 2.x, Java, .NET
NoSQL: HBase, Cassandra
Databases: Oracle, DB2, MS SQL Server, MySQL, MS Access
Operating Systems: Windows, UNIX, Linux, Mac OS
PROFESSIONAL EXPERIENCE:
Confidential, Foster City, CA
Big Data Engineer
Responsibilities:
- Expertise in designing and deploying Hadoop clusters and various Big Data analytics tools, including Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Flume, Kafka, Spark, and Cassandra, on Hortonworks and Cloudera.
- Installed Hadoop, MapReduce, and HDFS on AWS and developed multiple MapReduce, Pig, and Hive jobs for data cleaning and pre-processing.
- Understood business needs, analyzed functional specifications, and mapped them to the design and development of MapReduce programs and algorithms.
- Wrote Pig and Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data; also have hands-on experience with Pig and Hive user-defined functions (UDFs), as sketched after the technology list below.
- Executed Hadoop ecosystem jobs and applications through Apache Hue.
- Optimized Hadoop MapReduce code and Hive/Pig scripts for better scalability, reliability, and performance.
- Developed the Oozie workflows for application execution.
- Performed feasibility analysis for the deliverables, evaluating the feasibility of the requirements against complexity and timelines.
- Performed data migration from legacy RDBMS databases to HDFS using Sqoop.
- Wrote Pig scripts for data processing.
- Implemented Hive tables and HQL queries for the reports; wrote and used complex data types in Hive; stored and retrieved data using HQL; and developed Hive queries to analyze reducer output data.
- Heavily involved in designing the next-generation data architecture for unstructured data.
- Managed a 4-node Hadoop cluster for a client conducting a Hadoop proof of concept. The cluster had 12 cores and 3 TB of installed storage.
- Developed Pig Latin scripts to extract data from the source systems.
- Extracted data from Hive and loaded it into an RDBMS using Sqoop.
- Integrated the Foursquare monitoring and production system with Kafka.
- Designed and documented operational problems, following standards and procedures, using the software reporting tool JIRA.
- Worked on data modeling during application software design.
Technologies: HDFS, MapReduce, Hive, Spark, Oozie, Java, Pig, shell scripting, Kafka, Linux, Hue, Sqoop, Flume, DB2, Oracle, Hortonworks ecosystem, and data modeling.
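To make the Pig and Hive UDF work above concrete, here is a minimal sketch of a Hive UDF in Java; the class name, the normalization logic, and the registration statements are hypothetical illustrations, not code from the engagement.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF that normalizes free-text log fields so that
    // GROUP BY and JOIN keys match reliably. Registered in Hive with:
    //   ADD JAR udfs.jar;
    //   CREATE TEMPORARY FUNCTION normalize_field AS 'NormalizeField';
    public class NormalizeField extends UDF {
        private final Text result = new Text();

        public Text evaluate(Text input) {
            if (input == null) {
                return null; // Hive convention: NULL in, NULL out
            }
            result.set(input.toString().trim().toLowerCase());
            return result;
        }
    }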
Confidential
Spark Developer
Responsibilities:
- Designed and developed Hadoop system to analyze the SIEM (Security Information and Event Management) data using MapReduce, HBase, Hive, Sqoop and Flume.
- Worked on a live 30-node Hadoop cluster running CDH 5.4.
- Developed the approach for the Spark/Hadoop migration solution.
- Designed and developed POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Moved data from HDFS to RDBMS and vice versa using Sqoop.
- Involved in the design and development of technical specifications using Hadoop technology.
- Worked extensively with Sqoop for importing data from RDBMS into HDFS and Hive.
- Wrote Sqoop scripts to import the data into Hive in an incremental fashion.
- Created dynamic Hive partitions based on requirements.
- Performed complex joins on the tables in Hive.
- Used Apache Sentry to control other users' access to Hive data.
- Monitored and validated the data coming from the RDBMS into Hive.
- Developed Oozie workflows for daily incremental loads, which pull data from an RDBMS and import it into Hive tables; wrote Spark programs in Scala per the business requirements.
- Created partitioned tables in Hive, both static and dynamic.
- Developed the consumption process using Spark and Scala.
- Worked with RDD operations, i.e., transformations and actions, persistence (caching), custom accumulators, broadcast variables, broadcast optimization, and converting between RDD types.
- Created pair RDDs and applied transformations on pair RDDs: aggregations, grouping data, joins, and sorting data (see the sketch after the technology list below).
- Implemented database connectivity along with Hive and NoSQL (HBase) connectivity.
- Initialized Spark SQL, linked with Spark SQL SchemaRDDs, used Spark SQL in applications, wrote basic queries and user-defined functions (Spark SQL UDFs and Hive UDFs), and worked on Spark SQL performance tuning.
Technologies: Hortonworks on RHEL, AWS S3, IBM Change Data Capture (CDC), Apache Kafka, Trident, HBase, Sqoop, Apache Solr, Tableau, PL/SQL, data masking, data modeling, Scala, Python, Talend, Ab Initio
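As an illustration of the pair-RDD bullet above, the following minimal sketch counts events per user and sorts the result; it is written against Spark's Java API, and the input path and record layout are hypothetical (the project work itself was in Scala).

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.storage.StorageLevel;
    import scala.Tuple2;

    // Hypothetical job: count events per user from tab-separated logs.
    public class EventCounts {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("EventCounts");
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                // Hypothetical input path; first column is the user id.
                JavaRDD<String> lines = sc.textFile("hdfs:///data/events");

                // Create a pair RDD keyed by user id.
                JavaPairRDD<String, Integer> pairs =
                    lines.mapToPair(line -> new Tuple2<>(line.split("\t")[0], 1));

                // Aggregate with map-side combining, cache the result for reuse,
                // then sort by key and write out.
                JavaPairRDD<String, Integer> counts =
                    pairs.reduceByKey(Integer::sum).persist(StorageLevel.MEMORY_ONLY());
                counts.sortByKey().saveAsTextFile("hdfs:///out/event-counts");
            }
        }
    }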
Confidential, Jersey City, NJ
Big Data/Hadoop Developer
Responsibilities:
- Worked on the BI team on Big Data Hadoop cluster implementation and data integration, developing large-scale system software.
- Assessed existing and available data warehousing technologies and methods to ensure the data warehouse/BI architecture met the needs of the business unit and the enterprise and allowed for business growth.
- Developed MapReduce programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables in the EDW (a sketch of such a job follows the technology list below).
- Captured data from existing databases that provide SQL interfaces using Sqoop.
- Worked extensively with Sqoop for importing and exporting data between HDFS and relational database systems/mainframes, and loaded data into HDFS.
- Developed and maintained complex outbound notification applications that run on custom architectures, using diverse technologies including Core Java, J2EE, SOAP, XML, JMS, JBoss, and web services.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Shared responsibility for administration of Hadoop, Hive and Pig.
- Developed Hive queries for the analysts.
Technologies: Hadoop, MapReduce, HDFS, Hive, Java, Hadoop distributions from Hortonworks, Cloudera, and MapR, DataStax, IBM DataStage (Designer, Director, Administrator), PL/SQL, SQL*Plus, Toad, Windows, UNIX, shell scripting.
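A minimal sketch of the kind of MapReduce parsing job referenced above; the field positions, record format, and class names are hypothetical.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Hypothetical job: parse raw comma-separated records and count rows per region.
    public class RawRecordCounter {

        public static class ParseMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text region = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",");
                if (fields.length > 2) {          // skip malformed rows
                    region.set(fields[2].trim()); // assumed: region code in column 3
                    ctx.write(region, ONE);
                }
            }
        }

        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                ctx.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "raw-record-counter");
            job.setJarByClass(RawRecordCounter.class);
            job.setMapperClass(ParseMapper.class);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }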
Confidential, NYC, New York
Architect and Program Manager
Responsibilities:
- Designed the whole migration approach and implemented it with the help of the team.
- Effectively interacted with team members and business users for requirements gathering.
- Interfaced with the customer and explained project status to top management on a weekly and monthly basis.
- Team recruitment and management.
- Designed the 24/7 support model for the customer.
- Coded the important migration modules using the MVC framework.
- Used Spring MVC to decouple business logic and view components (a sketch follows the technology list below).
- Involved in the integration of Spring for implementing dependency injection (DI/IoC).
- Developed the business logic using Plain Old Java Objects (POJOs).
- Developed graphical user interfaces using HTML and JSPs for user interaction.
- Created ExtJS pages, used JavaScript for client-side validations, and AJAX to create interactive front-end GUI.
- Used JSON for data exchange between application modules along with XML.
- Created a set of classes using the DAO pattern to decouple the business logic from the data layer.
- Used various Core Java concepts such as multithreading, exception handling, and the Collections API to implement various features and enhancements.
- Consumed Web Services for transferring data between different applications.
- Involved in coding, maintaining, and administering Servlets and JSP components to be deployed on a WebSphere application server.
- Integrated Documentum with mainframes and SAP.
- Built scripts using Ant to build the J2EE application.
- Developed the application using RAD as the IDE and used its features for editing, debugging, compiling, formatting, build automation and version control (CVS).
- Wrote test cases in JUnit for unit testing of classes and implemented the logging using Log4j.
- Used CVS version control to track and maintain the different versions of the application.
Technologies: Java, J2EE, JSP, JSTL, Servlets, Agile methodology, Struts, MVC, Tomcat/JBoss, XML, HTML, CSS, DHTML, DOM, Hibernate, SOAP, JavaScript, multithreading, Oracle 9i, JUnit, web services, PL/SQL, JDBC, Ant, Rational Rose, Solaris and Windows, D2, Documentum 6.7 suite of products, SAP, Oracle Apps ERP system, IBM RAD, Tomcat, Oracle 10g, WebSphere Portal Server.
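To make the Spring MVC and dependency-injection bullets above concrete, here is a minimal annotation-driven sketch; the controller, service interface, URL, and view name are hypothetical and do not reflect the project's actual configuration.

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Controller;
    import org.springframework.ui.Model;
    import org.springframework.web.bind.annotation.RequestMapping;

    // Hypothetical service interface; the implementation is a Spring-managed
    // bean supplied at runtime via dependency injection (DI/IoC).
    interface MigrationService {
        String statusFor(String moduleName);
    }

    @Controller
    public class MigrationStatusController {

        private final MigrationService migrationService;

        @Autowired // constructor injection keeps the controller decoupled from the implementation
        public MigrationStatusController(MigrationService migrationService) {
            this.migrationService = migrationService;
        }

        // Maps /migration/status to a JSP view; business logic stays in the service.
        @RequestMapping("/migration/status")
        public String status(Model model) {
            model.addAttribute("status", migrationService.statusFor("orders")); // hypothetical module
            return "migrationStatus"; // resolved to a JSP by the configured view resolver
        }
    }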
Confidential
Architect
Responsibilities:
- Acquired knowledge of design aspects from the vendor architect.
- In-depth study of the system
- Monitored the change control process and made decisions on change controls.
- Supporting the business units for all their Enterprise Content Management needs
- Designed modules; performed server installation and integration.
- Coordinated between onsite and offshore teams; supported the business users on FDA submissions.
- Handled resource management, resource planning, and resource training.
- Designed the project, including high-level design, low-level design, and UML modeling using Rational Rose.
- Conducted risk studies and prepared the risk mitigation plan.
Technologies: Java, J2EE, Tomcat, Oracle 10g, WebLogic Portal Server, Documentum 5.3 suite of products, eCTD Server, ISI Publisher, CoreDossier, JSP, VB 6.0, XML, Visual Studio 2005, ITIL process, FirstDoc
