
Bigdata Engineer Resume


San Francisco / Pleasanton, CA

SUMMARY:

  • 7+ years of IT experience in various industries, with 4 years of hands-on experience in developing Big Data and Hadoop applications.
  • Strong technical foundation with in-depth knowledge of Big Data Hadoop, Data Reporting, Data Design, Data Analysis, Data Governance, Data Integration and Data Quality.
  • Experience in administration tasks such as setting up, configuring and monitoring Hadoop clusters on Cloudera and Hortonworks distributions.
  • Deep and extensive knowledge of HDFS, Spark, Kafka, Apache NiFi, MapReduce, Pig, Hive, HBase, Sqoop, Storm, YARN, Flume, Oozie, Zookeeper, Solr, Cassandra, Splunk, Azure, MongoDB, etc.
  • Thorough knowledge of Hadoop architecture and its components such as HDFS, NameNode, DataNode, Application Master, ResourceManager, NodeManager, JobTracker, TaskTracker and the MapReduce programming paradigm.
  • Good understanding of Hadoop MR1 and MR2 (YARN) architecture.
  • Experience in analyzing data using HiveQL, Pig Latin and MapReduce programs in Java.
  • Expertise in writing MapReduce programs and UDFs for both Hive and Pig in Java. Extended Hive and Pig core functionality by using custom UDFs (see the sketch after this list).
  • Experience in developing scalable solutions using NoSQL databases including HBase, Cassandra, MongoDB and CouchDB.
  • Extracted files from NoSQL databases such as CouchDB and HBase through Flume and placed them in HDFS for processing.
  • Efficient in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing strategies, and writing and optimizing HiveQL queries.
  • Good experience working with different Hadoop file formats like Sequence File, RC File, ORC, AVRO, JSON, XML and Parquet.
  • Experience in using modern Big-Data tools like Spark SQL to convert schema-less data into more structured files for further analysis. Experience in Spark Streaming to receive real time data and store the stream data into HDFS.
  • Experience in working with Apache Sqoop to import and export data to and from HDFS and Hive.
  • Good working experience in designing Oozie workflows for cleaning data and storing into Hive tables for quick analysis.
  • Good knowledge of streaming data from multiple sources into HDFS using Flume and Kafka.
  • Knowledge of processing and analyzing real-time data streams/flows using Kafka and HBase.
  • Good knowledge on computer system and network monitoring applications like Icinga and Nagios.
  • Experience with Informatica Power Center Big Data Edition (BDE) for high-speed Data Ingestion and Extraction.
  • Knowledge on real time graphing system Graphite.
  • Experience in storing and analyzing data in Data Lake.
  • Knowledge on ecommerce domain.
  • Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC. Strong experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance. Excellent implementation knowledge of Enterprise/Web/Client Server using Java, J2EE.
  • Proficient in using XML Technologies such as XML, XPath, XSLT, XSD, XQuery, WebSphere.
  • Expertise in RDBMS development including Oracle SQL, PL/SQL database backend programming with stored procedures, Functions and packages.
  • Expertise in using application development tools like IDEs Eclipse, IntelliJ and repositories GitHub, SVN and CVS for version control. Hands on experience using build tools such as Ant and Maven.
  • Expertise in data migration, aggregation and integration using ETL tool Pentaho, Talend.
  • Experienced with Agile SCRUM methodology (TDD, MVP), involved in design discussions and work estimations, takes initiatives, very proactive in solving problems and providing solutions.
  • Highly proficient in understanding new technologies and accomplishing new goals.
  • Good team player and ability to work in fast paced environments.
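
The custom Hive UDF work above can be illustrated with a minimal sketch in Java against the standard org.apache.hadoop.hive.ql.exec.UDF API; the class name and the phone-number normalization logic are hypothetical examples, not code from any specific project.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF: strips non-digit characters so phone numbers compare consistently in HiveQL.
    public class NormalizePhoneUDF extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;                              // pass NULLs through, as Hive expects
            }
            String digitsOnly = input.toString().replaceAll("[^0-9]", "");
            return new Text(digitsOnly);
        }
    }

Packaged into a JAR, a UDF like this would typically be registered with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL (Pig would use a comparable EvalFunc implementation).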

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, Spark, Kafka, NiFi, MapReduce, Pig, Hive, Impala, HBase, Elasticsearch, Cassandra, Sqoop, Oozie, Zookeeper, Flume, Storm, YARN, CAWA, MongoDB, MapR, Ranger, Mahout, Splunk, Neo4j, Falcon, Avro, Tez, AWS (Elasticsearch, S3, EMR).

Java & J2EE Technologies: Core Java, Hibernate, Spring, JSP, Servlets, JavaBeans, JDBC, EJB 3.0, JMS, JMX, RMI.

IDE Tools: Eclipse, IntelliJ.

Programming languages: Java, Python, Scala, C, C++, MATLAB, SQL, PL/SQL.

Web Services & Technologies: XML, HTML, XHTML, JNDI, HTML5, AJAX, jQuery, JSON, CSS, JavaScript, AngularJS, VBScript, WSDL, SOAP and RESTful.

ETL tools: Pentaho, Talend Studio, Informatica (MDM, IDQ, TPT), Teradata.

Databases: Oracle, SQL Server, MySQL, DB2, NoSQL, PostgreSQL.

Application Servers: Apache Tomcat, WebLogic, WebSphere, JBoss.

Tools: Maven, SBT, ANT, JUNIT, log4J.

Repositories & Tracking: GitHub, JIRA, Rally.

Operating Systems: Windows, UNIX, Linux, Mac OS.

SDLC: Agile Methodology, Waterfall.

WORK EXPERIENCE:

Confidential, San Francisco & Pleasanton, CA

BigData Engineer

Responsibilities:

  • Ingested data in different formats (CSV and table) from data sources such as Oracle, EDI and Kafka into Big Data Hive tables using ETL Talend Studio and Spark.
  • Created .dat, Pig and Avro schemas to ingest the data into Hive tables using the Talend framework.
  • Involved in creating Hive tables, loading structured and unstructured data, and writing Hive queries that run internally as MapReduce jobs.
  • Consumed JSON data from Kafka and pushed it into Teradata (EDW) using a Talend flow.
  • Created wrapper/shell scripts to call the Talend flow JAR files.
  • Once the data was ingested into the raw layer, applied transformations using Hive or Pig scripts and pushed it into Hive tables in the transformation layer.
  • Developed consolidation views for reporting in the consolidation layer by joining different tables in Hive, depending on the required metrics.
  • Involved in HBase and Cassandra setup and in storing data into HBase and Cassandra for analysis.
  • Performed data modeling and prepared source-to-target mapping (STM) documents.
  • Streamed data from Kafka into the Big Data raw layer using Spark Streaming, then moved it into transformation-layer tables by applying transformations with Spark SQL, giving one more streaming feed into the platform (see the sketch after this list).
  • Expertise in developing jobs using Spark framework modules such as Spark Core, Spark SQL and Spark Streaming, in Java, Scala and Python.
  • Imported data from sources such as HDFS and HBase into Spark RDDs.
  • Scheduled these jobs using Oozie and the CA Workload Automation (CAWA) job scheduler.
  • Experience in machine learning (ML) using Mahout and Azure.
  • Worked using the Agile life cycle methodology.
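
A minimal sketch in Java of the Kafka-to-raw-layer streaming ingest described above, shown here with Spark Structured Streaming; the broker address, topic name and HDFS paths are placeholder assumptions, not details from the original project.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;

    public class KafkaToRawLayer {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder()
                    .appName("kafka-to-raw-layer")
                    .getOrCreate();

            // Read the Kafka topic as a streaming DataFrame (broker and topic are placeholders).
            Dataset<Row> events = spark.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "broker1:9092")
                    .option("subscribe", "events-topic")
                    .load()
                    .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS json");

            // Land the raw JSON in HDFS; downstream Hive/Spark SQL jobs handle the transformation layer.
            StreamingQuery query = events.writeStream()
                    .format("parquet")
                    .option("path", "hdfs:///data/raw/events")
                    .option("checkpointLocation", "hdfs:///checkpoints/events")
                    .start();

            query.awaitTermination();
        }
    }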

Environment: Hive, Pig, Spark, Spark SQL, ETL Talend Studio, ETL Teradata (EDW), Oracle, MySQL, HBase/Cassandra, EDI, SQL, Java, Scala, Shell Script, Python, Avro, JSON, JIRA, DBeaver, Kafka, Azure, REST/SOAP, AWS (ES, EMR), Spring, Teradata SQL Assistant, Unix/Linux, Eclipse, GitHub, Oozie, Jenkins, MLlib.

Confidential, Philadelphia, PA

BigData Engineer

Responsibilities:

  • Used ETL Pentaho to collect, aggregate, and store data from different sources to HDFS.
  • Built Python and Linux shell scripts to ingest batch data into Kafka and Elasticsearch.
  • Developed a Kafka consumer to stream data from Kafka to HBase (see the sketch after this list).
  • Good experience in writing Avro schemas and in converting CSV files to JSON and XML.
  • Experience in creating NiFi flows to stream data between Kafka, PulsarDB, Elasticsearch and FTP, and in deploying those NiFi flows to both staging and production.
  • Experience in streaming the data between Kafka and databases like HBase and Elasticsearch.
  • Strong working experience with real-time streaming data using Spark and Kafka Connect.
  • Used Kafka for log aggregation: collecting physical log files off servers and putting them in a central place such as HDFS for processing.
  • Developed Spark Programs for Batch and Real time Processing. Developed Spark Streaming applications for Real Time Processing.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Experience using Spark to improve the performance and optimization of existing Hadoop algorithms, using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Experience in converting Hive/SQL queries into Spark transformations using Java.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Java/Scala.
  • Used Neo4j to track and view customers' internet usage by their account numbers.
  • Implemented Spark using Java/Scala and Spark SQL for faster testing and processing of data.
  • Experience in Agile life-cycle development.
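
A minimal sketch in Java of a Kafka consumer writing into HBase, as mentioned above; the topic, table, column family and row-key scheme are hypothetical assumptions for illustration only.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class KafkaToHBaseConsumer {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");   // placeholder broker
            props.put("group.id", "hbase-sink");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
                 Connection hbase = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Table table = hbase.getTable(TableName.valueOf("events"))) {

                consumer.subscribe(Collections.singletonList("events-topic"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // Row key from the Kafka key (fall back to the offset); payload goes to cf:payload.
                        String rowKey = record.key() != null ? record.key() : String.valueOf(record.offset());
                        Put put = new Put(Bytes.toBytes(rowKey));
                        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("payload"),
                                Bytes.toBytes(record.value()));
                        table.put(put);
                    }
                }
            }
        }
    }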

Environment: HDFS, Spark, Spark Streaming, Spark SQL, Kafka, NiFi, ETL Pentaho, HBase, JSON, CSV, Avro, AWS Elasticsearch, EMR, Cloudera Manager (CDH), MongoDB, Teradata, shell scripting, Scala, Java, SOAP, REST, Spring, Talend, GitHub, Rally, JIRA, SQL, UC4, XML, Agile.

Confidential, Franklin Lakes, NJ - St. Louis, MO

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop. Hands-on experience with productionalizing Hadoop applications viz. administration, configuration management, monitoring, debugging and performance tuning.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups and Hadoop log files. Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
  • Worked on Big Data Integration and Analytics based on Hadoop, Spark, Kafka, Storm and No-SQL databases.
  • Analyzed the data by performing Hive queries and running Pig scripts to know user behavior.
  • Loaded streaming data using Kafka and processed it using Storm.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
  • Configured, designed, implemented and monitored Kafka clusters and connectors.
  • Experience in streaming the data between Kafka and other databases like RDBMS and NoSQL.
  • Used Spark Streaming to collect data from Kafka in near real time, perform the necessary transformations and aggregations on the fly to build the common learner data model, and persist the data in Cassandra.
  • Implemented different machine learning techniques in Scala using Spark machine learning library.
  • Experience in using Avro format messages to process messages in Kafka.
  • Developed Spark applications using Scala for easy Hadoop transitions.
  • Used Spark on YARN and compared performance results with MapReduce.
  • Querying of both Managed and External tables created by Hive using Impala.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and extracted data from Teradata into HDFS using Sqoop.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Wrote multiple MapReduce programs in Java for data extraction, transformation and aggregation across multiple file formats, including XML, JSON, CSV and other compressed formats (see the sketch after this list).
  • Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
  • Created HBase tables to store various data formats of PII data coming from different portfolios.
  • Provided cluster coordination services through ZooKeeper.
  • Worked on installing and configuring EC2 instances on Amazon Web Services (AWS) for establishing clusters on cloud.
  • Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
  • Good knowledge in configuring and working with Flume to load data from multiple sources to HDFS.
  • Involved in Automation of clickstream data collection and store into HDFS using Flume.
  • Used different SerDes for converting JSON data into pipe-separated data.
  • Configured Flume sources and sinks and specified memory channel capacity.
  • Monitored YARN applications; troubleshot and resolved cluster-related system problems.
  • Used NumPy array indexing and the pandas Series and DataFrame structures in Python.
  • Compared the Cassandra and HBase NoSQL databases and set benchmarks in different contexts.
  • Experience with Informatica Power Center Big Data Edition (BDE) for high-speed Data Ingestion and Extraction.
  • Helped with the sizing and performance tuning of the Cassandra cluster.
  • Involved in the process of Cassandra data modelling and building efficient data structures.
  • Used Azkaban to schedule workflows, track user actions and handle ticketing.
  • Trained and mentored analyst and test teams on the Hadoop framework, HDFS, MapReduce concepts and the Hadoop ecosystem.
  • Collected and curated real-time data by moving it from source to destination using Apache NiFi.
  • Experience in sequence data pre-processing, extraction, model fitting and validation using ML pipelines.
  • Used Talend Open Studio to load files into Hadoop Hive tables and performed ETL aggregations in Hive.
  • Designed and created ETL jobs through Talend to load huge volumes of data into Cassandra, the Hadoop ecosystem and relational databases.
  • Used Sqoop to import data from SQL Server into the Hadoop ecosystem.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, Storm and Spark on YARN.
  • Integrated Cassandra with Talend and automated jobs.
  • Scheduled jobs and monitored console outputs through Jenkins.
  • Worked in Agile environment, which uses Jira to maintain the story points.
  • Worked on implementing a toolkit that abstracted Solr and Elasticsearch.
  • Performed maintenance and troubleshooting on the Cassandra cluster.
  • Integrated data by using Talend integration tool.
  • Installed and configured Hive and wrote Hive UDFs in Java and Python.
  • Responsible for architecting Hadoop clusters.
  • Experience with Agile TDD in doing unit tests repeatedly on source code.
  • Assist with the addition of Hadoop processing to the IT infrastructure.
  • Perform data analysis using Hive and Pig.
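
A minimal sketch in Java of the style of MapReduce program referenced above: a mapper/reducer pair that counts records per status code in CSV input. The column layout and job wiring are assumptions for illustration, not the original production code.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class StatusCountJob {

        // Mapper: emits (statusCode, 1) for each CSV line; the column index is a hypothetical assumption.
        public static class StatusMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
            private static final LongWritable ONE = new LongWritable(1);
            private final Text status = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",");
                if (fields.length > 2) {
                    status.set(fields[2].trim());
                    context.write(status, ONE);
                }
            }
        }

        // Reducer: sums the counts per status code.
        public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                    throws IOException, InterruptedException {
                long sum = 0;
                for (LongWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new LongWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "status-count");
            job.setJarByClass(StatusCountJob.class);
            job.setMapperClass(StatusMapper.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }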

Environment: Hadoop, MapReduce, HDFS, Hive, Java, Jira, Python, SQL, Cloudera Manager, Spark, Spark Streaming, SparkSQL, Cassandra, Pig, Sqoop, Oozie, ZooKeeper, bash, Storm, Flume, Kafka, shell scripting, SOAP, REST, Parquet, Azkaban, Impala, Tez, JSON, Solr, Talend Open Studio, Teradata, Scala, PL/SQL, MySQL, NoSQL, AWS ElasticSearch, XML, Windows, Hortonworks, HBase.

Confidential, Parsippany, NJ

Hadoop Developer

Responsibilities:

  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Installed the cluster, handled commissioning and decommissioning of DataNodes, NameNode recovery and capacity planning, configured Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning.
  • Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
  • Developed MapReduce programs for data analysis and data cleaning.
  • Worked on optimizing Hive queries to improve performance and processing time.
  • Experience in configuring and administrating the clusters.
  • Performed quick debugging and resolved issues.
  • Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
  • Used Sqoop to import data into HDFS and Hive from multiple data systems.
  • Configured, Designed, implemented and monitored Kafka cluster and connectors.
  • Developed Pig Latin scripts for the analysis and data manipulation.
  • Implemented a proof of concept (PoC) using Kafka, Storm and HBase for processing streaming data.
  • Implemented automation scripts using Python.
  • Developed Python scripts to process the data for visualizing.
  • Experience in streaming the data between Kafka and other databases like RDBMS and NoSQL.
  • Loaded and transformed large sets of structured and semi-structured data.
  • Designed the architecture and developed the complete application to ingest and process high-volume data into Hadoop using Sqoop and Flume.
  • Responsible for developing a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
  • Designed and developed a customized business rule framework to implement business logic for the existing process in an ETL environment using Hive and Pig UDFs.
  • Experienced in working with various kinds of data sources such as Teradata and Oracle and successfully loaded files to HDFS.
  • Experienced in using Zookeeper Operational Services for coordinating the cluster.
  • Experienced in data warehousing using Amazon Redshift when dealing with large sets of data.
  • Used Ant Scripts and Maven in building the application and auto deploying it to the environment.
  • Written Oozie workflows for scheduling jobs.
  • Worked on extracting files from MongoDB and placed in HDFS and processed.
  • Involved in building a REST API using the Jersey API to fetch data from the NoSQL database MongoDB (see the sketch after this list).
  • Developed an application component interacting with MongoDB.
  • Involved in loading data from UNIX file system to HDFS.
  • Exported the analyzed data to the relational databases using Tableau for visualization and to generate reports for the BI team.
  • Created an email notification service that runs on job completion for the team that requested the data.
  • Evaluated the suitability of Hadoop and its ecosystem for the project, implementing and validating various proof-of-concept (PoC) applications to eventually adopt them and benefit from the Big Data Hadoop initiative.
  • Maintained system integrity of all sub-components related to Hadoop.
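
A minimal sketch in Java of a Jersey (JAX-RS) resource exposing MongoDB data of the kind described above; the database, collection and field names, and the use of the newer MongoDB Java driver API, are assumptions for illustration.

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.model.Filters;
    import org.bson.Document;

    @Path("/customers")
    public class CustomerResource {

        // Hypothetical connection string, database and collection names.
        private static final MongoClient CLIENT = MongoClients.create("mongodb://localhost:27017");
        private static final MongoCollection<Document> CUSTOMERS =
                CLIENT.getDatabase("appdb").getCollection("customers");

        @GET
        @Path("/{id}")
        @Produces(MediaType.APPLICATION_JSON)
        public String getCustomer(@PathParam("id") String id) {
            // Look up one document by a hypothetical customerId field and return it as JSON.
            Document doc = CUSTOMERS.find(Filters.eq("customerId", id)).first();
            return doc == null ? "{}" : doc.toJson();
        }
    }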

Environment: Apache Hadoop, HDFS, Java, MapReduce, Hive, Pig, Sqoop, Oozie, Cassandra, Storm, Maven, Flume, HBase, MongoDB, Eclipse.

Confidential

Java/Hadoop Developer

Responsibilities:

  • Performed requirement gathering, design, coding, testing, implementation and deployment.
  • Worked on modeling of Dialog process, Business Processes and coding Business Objects, QueryMapper and JUnit files.
  • Created the Business Objects methods using Java and integrating the activity diagrams.
  • Worked on web services using SOAP and WSDL.
  • Wrote QueryMappers and JUnit test cases, and gained experience with MQ.
  • Developed the UI using XSL and JavaScript.
  • Managed software configuration using ClearCase and SVN.
  • Design, develop and test features and enhancements.
  • Perform error rate analysis of production issues and technical errors. Provide production support. Fix production defects.
  • Analyze user requirement document and develop test plan, which includes test objectives, test strategies, test environment, and test priorities.
  • Perform Functional testing, Performance testing, Integration testing, Regression testing, Smoke testing and User Acceptance Testing (UAT).
  • Converted complex SQL queries running on mainframes into Pig and Hive as part of a migration from mainframes to the Hadoop cluster (see the sketch below).
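
A minimal sketch in Java of running one such converted query over Hive JDBC; the connection URL, table and column names are hypothetical and only illustrate the shape of a mainframe-SQL-to-HiveQL rewrite.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ConvertedQueryRunner {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            String url = "jdbc:hive2://hiveserver:10000/default";   // placeholder HiveServer2 URL

            // A mainframe aggregate query rewritten in HiveQL against a (hypothetical) partitioned table.
            String hiveQl =
                    "SELECT region, SUM(claim_amount) AS total_claims "
                  + "FROM claims WHERE load_date = '2016-01-31' "
                  + "GROUP BY region";

            try (Connection conn = DriverManager.getConnection(url, "etl_user", "");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(hiveQl)) {
                while (rs.next()) {
                    System.out.println(rs.getString("region") + "\t" + rs.getLong("total_claims"));
                }
            }
        }
    }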

Environment: Hadoop, Pig, Hive, MapReduce, Shell Scripting, Java 6, JEE, Spring, Hibernate, WebLogic, Eclipse, Oracle 10g, JavaScript, Servlets, Node.js, JMS, Ant, Log4j, JUnit.

Confidential

Java Developer

Responsibilities:

  • Actively involved in Analysis, Detail Design, Development, System Testing and User Acceptance Testing.
  • Developed an intranet web application using J2EE architecture, with JSP to design the user interfaces, JSP tag libraries to define custom tags and JDBC for database connectivity.
  • Implemented the Struts framework (MVC): developed the ActionServlet and ActionForm beans, configured the struts-config descriptor and implemented the Validator framework.
  • Extensively involved in database designing work with Oracle Database and building the application in J2EE Architecture.
  • Integrated messaging with the MQSeries classes for JMS, which provide an XML message-based interface; the application uses the JMS publish-and-subscribe model.
  • Developed the EJB session bean that acts as a facade and accesses the business entities through their local home interfaces.
  • Evaluated and worked with EJB's container-managed persistence strategy.
  • Used web services (WSDL and SOAP) to get loan information from a third party and used SAX and DOM XML parsers for data retrieval (see the sketch after this list).
  • Experienced in writing the DTD for document-exchange XML, and in generating, parsing and displaying the XML in various formats using XSLT and CSS.
  • Used SVN version controlling system for the source code and project management.
  • Used XPath 1.0 for selecting nodes and XQuery to extract and manipulate data from XML documents.
  • Coded, tested and deployed the web application using RAD 7.0 and WebSphere Application Server 6.0.
  • Used JavaScript for validating client-side data.
  • Wrote unit tests for the implemented bean code using JUnit.
  • Extensively worked on UNIX Environment.
  • Data is exchanged in XML format, which helps in interoperability with other software applications.
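
A minimal sketch in Java of DOM-based parsing of a loan-information XML response of the kind described above; the element names and payload are hypothetical.

    import java.io.StringReader;

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;

    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;
    import org.xml.sax.InputSource;

    public class LoanResponseParser {
        public static void main(String[] args) throws Exception {
            // Hypothetical payload in the shape returned by the third-party loan service.
            String xml = "<loans>"
                       + "<loan><id>1001</id><amount>25000</amount></loan>"
                       + "<loan><id>1002</id><amount>18000</amount></loan>"
                       + "</loans>";

            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // Walk each <loan> element and pull out its child values.
            NodeList loans = doc.getElementsByTagName("loan");
            for (int i = 0; i < loans.getLength(); i++) {
                Element loan = (Element) loans.item(i);
                String id = loan.getElementsByTagName("id").item(0).getTextContent();
                String amount = loan.getElementsByTagName("amount").item(0).getTextContent();
                System.out.println("loan " + id + " amount " + amount);
            }
        }
    }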

Environment: Struts 2, Rational Rose, JMS, EJB, JSP, RAD 7.0, WebSphere Application Server 6.0, XML parsers, XSL, XQuery, XPath 1.0, HTML, CSS, JavaScript, IBM MQ Series, JBoss, ANT, JUnit, SVN, JDBC, Oracle, Unix.

Confidential

Java Consultant

Responsibilities:

  • Developed GUI-related changes using JSP and HTML, and client validations using JavaScript.
  • Implemented client-side validation using JavaScript.
  • Developed user interface using JSP, Struts Tag Libraries to simplify the complexities of the application.
  • Developed business logic using Stateless session beans for calculating asset depreciation on Straight line and written down value approaches.
  • Involved in coding SQL queries, stored procedures and triggers.
  • Created Java classes to communicate with the database using JDBC.
  • Responsible for design and implementation of various modules of the application using Struts-Spring-Hibernate architecture.
  • Developed Business Logic using Session Beans.
  • Implemented Entity Beans for Object Relational mapping.
  • Implemented the Service Locator pattern using local caching.
  • Worked with collections.
  • Implemented the Session Facade pattern using session and entity beans.
  • Developed message-driven beans to listen on JMS destinations.
  • Developed the Web Interface using Servlets, Java Server Pages, HTML and CSS.
  • Used WebLogic to deploy applications on local and development environments of the application.
  • Extensively used JDBC PreparedStatements to embed SQL queries in the Java code (see the sketch after this list).
  • Developed DAO (Data Access Objects) using Spring Framework 3.
  • Developed web applications with rich internet application features using Java applets and Silverlight.
  • Used JavaScript to perform client side validations and Struts-Validator Framework for server-side validation.
  • Designed and developed the application using various design patterns, such as session facade, business delegate and service locator.
  • Involved in designing use-case diagrams, class diagrams and interaction diagrams in UML with Rational Rose.
  • Involved in fine-tuning of application.
  • Thoroughly involved in the testing phase and implemented test cases using JUnit.
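
A minimal sketch in Java of the JDBC PreparedStatement usage mentioned above; the table, columns and connection details are hypothetical placeholders rather than the application's actual schema.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class AssetDao {

        // Placeholder connection details; in the application a pooled/JNDI DataSource would be used.
        private Connection getConnection() throws Exception {
            return DriverManager.getConnection("jdbc:oracle:thin:@dbhost:1521:orcl", "app", "secret");
        }

        // Fetches the depreciation rate for one asset; the bind variable avoids SQL injection
        // and lets the database reuse the parsed statement.
        public double findDepreciationRate(String assetId) throws Exception {
            String sql = "SELECT depreciation_rate FROM assets WHERE asset_id = ?";
            try (Connection conn = getConnection();
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, assetId);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getDouble("depreciation_rate") : 0.0;
                }
            }
        }
    }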

Environment: Java, Servlets, Entity Beans, Session Beans, JSP, CVS, EJB, J2EE, WebLogic, Struts, XML, XSLT, JavaScript, HTML, CSS, Spring 3.2, SQL, PL/SQL, MS Visio, Eclipse, JDBC, Windows XP, Triggers, Stored Procedures.
