Sr. Big Data Engineer Resume
Edison, NJ
SUMMARY:
- 9+ years of experience in Analysis, Design, Development, Integration, Testing and Maintenance of applications using Java technologies, along with Big Data/Hadoop experience.
- Excellent knowledge of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode and DataNode, and of the MapReduce programming paradigm.
- Hands-on experience installing, configuring and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper and Flume.
- Acquired working knowledge of Couchbase.
- Extensive experience developing Pig Latin scripts for transformations and using Hive Query Language (HiveQL) for data analytics.
- Hands-on experience with an electronic medical record (EMR) management system.
- In-depth knowledge of Spark concepts and experience using Spark for data transformation and processing.
- Hands-on experience with NoSQL databases including MongoDB, Cassandra and HBase, and their integration with Hadoop clusters.
- Experience in Hadoop administration activities such as installing and configuring clusters using Apache, Cloudera and AWS.
- Good understanding of and experience with software development methodologies such as Agile and Waterfall.
- Strong database background with extensive experience in SQL, PL/SQL and database concepts such as stored procedures and triggers.
- Experience writing shell scripts to dump shared data from MySQL servers to HDFS.
- Experience integrating data from multiple data sources.
- Extensive experience in middle-tier development using J2EE technologies such as JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate and EJB.
- Experience using IDEs such as Eclipse and Visual Studio, and DBMSs such as Oracle and MySQL.
- Worked extensively with application servers such as WebLogic, WebSphere Application Server and JBoss.
- Experience installing, configuring, supporting and managing Hadoop clusters using Apache and Cloudera (CDH3, CDH4 and CDH5) distributions and on Amazon Web Services (AWS).
- Experience developing multiple POCs in Scala and deploying them on a YARN cluster, comparing the performance of Spark with Cassandra and SQL.
- Hands-on experience in the data mining process, implementing complex business logic, optimizing queries with HiveQL, and controlling data distribution through partitioning and bucketing to enhance performance.
- Experience using Talend to work with data in any file format, database or messaging queue, and using Hive and Pig to perform data aggregation and transformation.
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, HBase, Hadoop MapReduce, Hive, Pig, Sqoop, Spark, Flume, Oozie, Cassandra, Storm, Impala, Control-M
Distributions: Apache Hadoop 2.7.3, Cloudera CDH3, CDH4
Languages: C, C++, Java, SQL/PLSQL
SDLC Methodologies: Agile, Waterfall
Database: Couchbase, Oracle 12c, DB2, MySQL, MS SQL Server
Web Tools: HTML, JavaScript, XML, ODBC, JDBC, Hibernate, JSP, Servlets, Java, Struts, Spring, JUnit, JSON and Avro
IDE / Testing Tools: Eclipse, Visual Studio, NetBeans, PuTTY
Operating System: Windows, UNIX, Linux.
Scripts: JavaScript, Shell Scripting.
Version Control: SVN, CVS, TFS.
PROFESSIONAL EXPERIENCE:
Confidential, Edison, NJ
Sr. Big Data Engineer
Responsibilities:
- Optimized, performance-tested and tuned Hive and Pig jobs.
- Deployed Spark applications and Java web services on Pivotal Cloud Foundry.
- Involved in migrating MapReduce jobs to Spark jobs and used Spark SQL and the DataFrames API to load structured data into Spark clusters.
- Used macros, VB programming and pivot tables in Excel for queries and reports.
- Involved in scheduling the Oozie workflow engine to run multiple HiveQL, Sqoop and Pig jobs.
- Maintained monthly dashboards to analyze sales trends using pivot tables, graphs and VLOOKUP.
- Worked on NoSQL databases such as Cassandra and HBase to store structured and unstructured data.
- Good knowledge of Amazon Web Services (AWS) such as EMR, S3 and EC2.
- Used Amazon S3, along with Cassandra, to store the results of executed jobs.
- Created pivot tables and used narrative views and dashboard prompts to build intelligent dynamic dashboards.
- Designed and implemented Apache Spark Streaming applications.
- Used Hadoop HDFS to archive the incoming data and to perform ETL.
- Worked on Amazon EMR.
- Formatted results, filtered requests and displayed results with pivots, charts and views in Presentation Services.
- Used Cassandra as the NoSQL database and gained strong working experience with NoSQL databases.
- Used the Oozie scheduler to automate the pipeline workflow.
- Used MapReduce and Spark for data cleaning and pre-processing, converting text data into suitable file formats, and performing joins and aggregations.
- Optimized existing business logic in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Migrated data from Oracle and IBM ICM into HDFS using Sqoop and imported flat files of various formats into HDFS.
- Developed Apache Pig and Hive scripts to process HDFS data.
- Experience with NoSQL databases such as MongoDB, HBase and Cassandra.
- Monitored the data flow between production servers via the Couchbase environment.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Used a plugin that allows Hadoop MapReduce programs, HBase, Pig and Hive to work unmodified and access files directly.
- Used Cassandra to work with JSON document data.
- Integrated bulk data into the Cassandra file system using MapReduce programs.
- Used the Hive data warehouse tool to analyze data in HDFS and developed Hive queries.
- Implemented AWS and Azure-Omni for the Couchbase load.
- Hands-on experience working with NoSQL databases such as HBase and Cassandra.
- Managed cluster coordination services through ZooKeeper.
- Assisted in designing a hybrid-cloud management stack, primarily on Amazon Web Services (AWS) and AWS GovCloud, used to host client servers and services across tiered branches of the architecture.
- Used Impala to read, write and query Hadoop data in HDFS from Cassandra or HBase, and configured Kafka to read and write messages from external programs.
- Developed a data pipeline using a modern technology stack that includes Kafka, Cassandra and Spark Streaming.
Environment: Hadoop Cluster, MapReduce, HDFS, Hive, Java, HBase, Spark, PIG, Zookeeper, Sqoop, Flume, Cassandra, Couchbase, EMR, Storm, Oozie, Spark clusters, MySQL, Common, Yarn, Kafka, and NoSQL
Confidential, Malvern, PA
Sr. Big Data Engineer
Responsibilities:
- Analyzed the Hadoop cluster using different big data analytic tools including Kafka, Pig, Hive and MapReduce.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala.
- Implemented Spark using Scala and Spark SQL for faster analysis and processing of data.
- Imported and exported data into HDFS and Hive using Sqoop and Kafka.
- Created Hive tables, loaded data and wrote Hive queries, which run internally as MapReduce jobs.
- Designed and developed ETL workflows in Java for processing data in HDFS/HBase using Oozie.
- Imported unstructured data into HDFS using Flume.
- Knowledge of various ETL techniques and frameworks, including Flume.
- Wrote complex Hive queries and UDFs.
- Developed shell scripts to simplify execution of all other scripts (Pig, Hive and MapReduce) and to move data files within and outside of HDFS.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Worked with Amazon Web Services (AWS) cloud infrastructure services and was involved in ETL, data integration and migration.
- Involved in cluster coordination services through ZooKeeper and in adding new nodes to an existing cluster.
- Used Flume extensively to gather and move log data files from application servers to a central location in the Hadoop Distributed File System (HDFS).
- Worked with NoSQL databases such as MongoDB, HBase and Cassandra, creating tables to load large sets of semi-structured data.
- Experience deploying Cassandra clusters in the cloud and on premises, including data storage and disaster recovery.
- Generated Java APIs for retrieval and analysis on NoSQL databases such as MongoDB, HBase and Cassandra.
- Hands-on experience creating a RESTful API gateway on AWS with NoSQL.
- Loaded data from the UNIX file system into HDFS.
- Analyzed large datasets to determine the optimal way to aggregate and report on them.
Environment: Hadoop, HDFS, MapReduce, Hive, Sqoop, HBase, Apache Spark, Oozie Scheduler, Java, UNIX Shell Scripts, Kafka, Git, Maven, PL/SQL, Python, Scala, Cloudera
Confidential, NYC, NY
Sr. Big Data Engineer
Responsibilities:
- Worked on Big Data Hadoop cluster implementation and data integration in developing large-scale system software.
- Installed and configured MapReduce, Hive and HDFS; implemented a CDH3 Hadoop cluster on CentOS. Assisted with performance tuning and monitoring.
- Assessed existing EDW technologies and methods to ensure that the EDW/BI architecture meets the needs of the business and enterprise and allows for business growth.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Used Sqoop to capture data from existing databases that provide SQL interfaces.
- Worked extensively with Sqoop to import and export data between HDFS and relational database systems/mainframe.
- Developed and maintained complex outbound notification applications that run on custom architectures, using diverse technologies including Core Java, J2EE, SOAP, XML, JMS, JBoss and web services.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW tables and historical metrics.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Supported business processes by developing, installing and configuring Hadoop ecosystem components that moved data from individual servers to HDFS.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Supported code/design analysis, strategy development and project planning.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Good knowledge of distributed computing principles and proficient in building stream-processing systems using solutions such as Storm or Spark Streaming.
- Assisted with data capacity planning and node forecasting.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Experience with Kafka, RabbitMQ and various other messaging systems.
- Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.
- Coded complex Oracle stored procedures, functions, packages and cursors for the client-specific applications.
- Provided production rollout support and resolved issues discovered by the client and client services teams.
Environment: Hadoop, MapReduce, HDFS, Hive, Spark-Scala, Kafka, Java (JDK 1.6), Hadoop distributions of Hortonworks, Cloudera, MapR, DataStax, IBM DataStage 8.1 (Designer, Director, Administrator), PL/SQL, SQL*Plus, Toad 9.6, Windows NT, UNIX Shell Scripting
Confidential, Chevy Chase, MD
Sr. Big Data Engineer
Responsibilities:
- Helped the team increase the cluster size from 35 nodes to 113 nodes. The configuration for additional data nodes was managed using Puppet.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Imported and exported data between RDBMS and Hive using Sqoop.
- Experienced in partitioning Hive tables, creating external tables, and understanding the differences between managed and external tables.
- Optimized Hive analytics SQL queries to improve job performance.
- Created and worked with Sqoop jobs with incremental load to populate Hive external tables.
- Developed Pig scripts in areas where extensive coding needed to be reduced.
- Created HBase tables to store variable data formats of data coming from different portfolios.
- Developed the backend (server side) in Scala.
- Designed a technical solution for real-time analytics using Kafka and HBase.
- Solved performance issues in Hive and Pig scripts through an understanding of joins, grouping and aggregation and how they translate into MapReduce jobs.
- Designed a conceptual model with Spark for performance optimization.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Developed MapReduce programs to parse the raw data and store the refined data in tables.
- Created the data model for Cassandra from the current Oracle data model.
- Used CQL to execute queries on the data persisted in the Cassandra cluster.
- Involved in loading data from the UNIX file system into HDFS.
- Used Tableau for visualization and report generation.
- Used Flume to collect, aggregate and store the log data from different web servers.
Environment: Hadoop, MapReduce, Hive, Pig, HBase, Sqoop, Flume, Cassandra, Scala, Spark, Oozie, Kafka, Linux, Java, Tableau, Eclipse, HDFS, Java (JDK), MySQL and Ubuntu
Confidential, NYC, NY
Java / Hadoop Developer
Responsibilities:
- Developed Scala programs with Spark for processing data in the Hadoop ecosystem.
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes.
- Developed MapReduce jobs using Apache Commons components.
- Installed and configured MapReduce, Hive and HDFS.
- Assisted with performance tuning and monitoring.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Solved performance issues in Hive and Pig scripts through an understanding of joins, grouping and aggregation and how they translate into MapReduce jobs.
- Developed UDFs in Java as needed for use in Pig and Hive queries.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Coordinated with various stakeholders such as the end client, DBA teams, testing team and business analysts.
- Designed the graphical user interface (GUI) for various web pages using AJAX, HTML, CSS, JSF, jQuery and JavaScript.
- Implemented Model View Controller (MVC) architecture using the Struts framework at the web tier to isolate each layer of the application, reducing integration complexity and easing maintenance.
- Used jQuery and JSON for designing the front end, and Java Portlets for their portability and customization.
- Involved in Oracle database development by creating Oracle PL/SQL functions, procedures, triggers and packages.
- Developed Servlets and JSPs based on the MVC pattern using the Struts and Spring frameworks.
- Responsible for using XML Schema (XSD), SAX, DOM, XSL, XSLT and XPath for development.
Environment: Java, JSP, Hadoop, Hive, Pig, Cloudera CDH, Apache Hadoop, Spring, Hibernate 3.0, Struts framework, HTML, XML, Log4j, Eclipse, Unix, jQuery, JSON, JSF, Servlets, JDBC, AJAX, Web services, SOAP
Confidential
Java/J2EE Developer
Responsibilities:
- Implemented the Spring BeanFactory using IoC and AOP.
- Developed the application using the TDD (Test-Driven Development) methodology.
- Designed and developed customer registration and login screens using JSP, HTML and JavaScript.
- Developed stored procedures and triggers for efficient interaction with the database.
- Used SpringSource Tool Suite as the IDE for coding and testing, and WebLogic for deploying the web application.
- Consumed SOAP-based web services using the Apache CXF framework.
- Made extensive use of HTML, DHTML, CSS, jQuery, JavaScript and Ajax for client-side development and validation.
- Implemented various complex PL/SQL queries.
- Used JMS to consume messages from a queue.
- Responsible for configuring the Struts web-based application using struts-config.xml and web.xml.
- Modified Struts configuration files per application requirements and developed web services for non-Java clients to obtain user information.
- Configured Spring to manage Action classes, set their dependencies in a Spring context file and integrated the middle tier with Struts.
- Used Servlets to service requests from the UI, manipulate business objects and invoke the respective Action classes to perform database updates.
- Used Hibernate to retrieve data from the database and to perform insert and delete operations.
- Used XML parser APIs such as JAXP and JAXB for the web service's request and response data.
Environment: Core Java, J2EE, Spring IoC, Spring JDBC, Struts, SOAP, WSDL, Apache CXF, JSP, Servlets, HTML, CSS, jQuery, Ajax, JavaScript, Eclipse, UNIX, TOAD, Hibernate, Log4j
