
Hadoop/Big Data Analyst Resume


Cypress, CA

PROFESSIONAL SUMMARY:

  • Over 8 years of strong experience working on Big Data/Hadoop, NoSQL and Java/J2EE applications.
  • Over 4 years of experience working with Big Data and the Hadoop ecosystem, with expertise in tools like HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, ZooKeeper, Spark, Kafka, Storm, Cassandra, Impala, Snappy, Greenplum and MongoDB.
  • Experience with web application development and deployment using Java and J2EE tools & technologies like Servlets, JavaScript, JSP, JDBC, Struts, Spring, Hibernate, XML, WebLogic and Apache Tomcat.
  • Experience with distributed systems, large-scale non-relational data stores and big data systems.
  • Proficient in writing build scripts using Ant & Maven.
  • Experience in running TWS jobs for processing millions of records.
  • Generated recommendations using collaborative filtering algorithms from Mahout's machine learning library.
  • Experience with various Big Data analysis tools such as Pig and Hive, and a good understanding of Sqoop and Puppet.
  • Experience in importing streaming logs and aggregating the data into HDFS through Flume.
  • Experience in developing customized UDFs in Java to extend Hive and Pig Latin functionality (a minimal sketch follows this list).
  • Developed a global data strategy and implemented Hadoop and Mahout for predictive analytics.
  • Experience in storing and retrieval of documents in Apache Solr.
  • Used the Oozie scheduler to automate pipeline workflows and orchestrate Hive, Pig and MapReduce jobs that extract the data in a timely manner.
  • Exposure to streaming data processing frameworks such as Spark Streaming and Storm.
  • Experience building data processing pipeline using Kafka and Storm to store data into HDFS.
  • Experience with testing MapReduce programs using MRUnit and EasyMock.
  • Experienced in Hadoop data testing, including HDFS and Hive data testing and validation.
  • Expertise in writing shell scripts, cron automation and regular expressions.
  • Good knowledge of evaluating big data analytics libraries (MLlib) and using Spark SQL for data exploration.
  • Experienced in building visualization models using JavaScript (d3.js) and HTML/CSS.
  • Expertise in developing simple to complex MapReduce streaming jobs in Python, as well as jobs implemented using Hive and Pig.
  • Extensive experience in performance tuning of complex SQL queries, troubleshooting and debugging queries.
  • Extensive experience supporting MapReduce programs running on the cluster; wrote custom MapReduce scripts for data processing in Java.
  • Hands-on experience with Sequence Files, Combiners, Counters, Dynamic Partitions and Bucketing for best practices and performance improvement.
  • Developed an Apache Spark Streaming module for consumption of Avro messages from Kafka to store in HDFS
  • Experience in monitoring and managing Hadoop clusters using the Cloudera Manager tool.
  • Sound knowledge of map-side joins, reduce-side joins, shuffle & sort, Distributed Cache, and compression techniques like LZO, Bzip2 and Snappy.
  • Good Knowledge in using job workflow scheduling and monitoring tools like Oozie and Autosys.
  • Facilitated access/ETL to large data sets utilizing Pig/Hive/HBase/Python on the Hadoop ecosystem.
  • Good understanding of configuring spouts & bolts in Storm to ingest and process data on the fly.
  • Experienced in forwarding generated logs to the Elasticsearch database.
  • Experienced in using Solr to create search indexes for faster search operations.
  • Expertise in data modeling on NoSQL databases like MongoDB, HBase and Cassandra.
  • Experienced in migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Strong technical skills in analyzing data with standard statistical methods/tools, interpreting results and conveying findings in a concise and professional manner.
  • Involved in full-cycle implementation including request analysis, creating/pulling datasets, report creation and implementation, and providing final analysis to the requestor.
  • Good working knowledge of and experience in ETL and data transfer from RDBMS to HDFS (Hadoop environment).
  • Adequate knowledge and working experience in Agile & Waterfall methodologies.
  • Hands on experience with various databases such as Oracle, MySQL and IBM DB2.
  • Extensive experience in middle-tier development using J2EE technologies like JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate and EJB.
  • Excellent OOAD skills with design & development in Java, SOAP and REST Web Services.
  • Experience with web-based UI development using jQuery UI, jQuery, ExtJS, CSS, HTML, HTML5, XHTML and JavaScript.
  • Extensive experience with SQL, PL/SQL and database concepts; developed stored procedures and queries using PL/SQL.
  • Team player with excellent communication, presentation and interpersonal skills.
  • Highly motivated team player with zeal to learn new technologies.
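
A minimal, hypothetical sketch of the kind of Java UDF used to extend Hive mentioned above; the MaskEmail name and masking rule are illustrative assumptions, not from any specific engagement:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public class MaskEmail extends UDF {
        public Text evaluate(Text email) {
            if (email == null) {
                return null;
            }
            // Keep the domain, hide the local part: "alice@example.com" -> "***@example.com"
            return new Text(email.toString().replaceAll("^[^@]+", "***"));
        }
    }

Such a UDF is packaged into a jar, registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION, and then called like a built-in function; a Pig equivalent would extend EvalFunc instead.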

TECHNICAL SKILLS:

Languages/Tools: Java, C, C++, C#, Scala, VB, XML, HTML/XHTML, HDML, DHTML.

Big Data: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, ZooKeeper, Spark, Mahout, Kafka, Storm, Cassandra, Solr, Impala, Greenplum, MongoDB

J2EE Standards: JDBC, JNDI, JMS, JavaMail & XML Deployment Descriptors.

Web/Distributed Technologies: J2EE, Servlets 2.1/2.2, JSP 2.0, Struts 1.1, Hibernate 3.0, JSF, JSTL 1.1, EJB 1.1/2.0, RMI, JNI, XML, JAXP, XSL, XSLT, UML, MVC, Struts, Spring 2.0, CORBA, Java Threads.

Operating System: Windows 95/98/NT/2000/XP, MS-DOS, UNIX, multiple flavors of Linux.

Databases / NoSQL: Oracle 10g, MS SQL Server 2000, DB2, MS Access, MySQL, Teradata, Cassandra, Greenplum and MongoDB

Browser Languages: HTML, XHTML, CSS, XML, XSL, XSD, XSLT.

Browser Scripting: JavaScript, HTML DOM, DHTML, AJAX.

App/Web Servers: IBM WebSphere 5.1.2/5.0/4.0/3.5, BEA WebLogic 5.1/7.0, JDeveloper, Apache Tomcat, JBoss.

GUI Environment: Swing, AWT, Applets.

Messaging & Web Services Technology: SOAP, WSDL, UDDI, XML, SOA, JAX-RPC, IBM WebSphere MQ v5.3, JMS.

Networking Protocols: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP, POP3.

Testing & Case Tools: JUnit, Log4j, Rational ClearCase, CVS, ANT, Maven, JBuilder.

Version Control Systems: Git, SVN, CVS

PROFESSIONAL EXPERIENCE:

Confidential, Cypress, CA

Hadoop/Big Data Analyst

Responsibilities:

  • Understanding business needs, analyzing functional specifications and mapping those to development and design.
  • Involved in creating Hive ORC tables, loading the data into them and writing Hive queries to analyze the data.
  • Developed MapReduce programs to parse the raw data and create intermediate data to be loaded into Hive partitioned tables.
  • Involved in data ingestion into HDFS using Sqoop for full loads and Flume for incremental loads from a variety of sources like web servers, RDBMS and data APIs.
  • Worked on the GoldenGate replication tool to get data from various data sources into HDFS.
  • Developed MapReduce programs and Hive queries to generate reports as per business requirements.
  • Expertise in creating TWS jobs and job streams and automating them as per schedule.
  • Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Created UDFs to calculate the pending payment for the given customer data based on the last day of every month and used them in Hive scripts.
  • Involved in loading data from LINUX file system to HDFS.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Obtained good experience with NoSQL databases.
  • Ingest real-time and near-real time (NRT) streaming data into HDFS using Flume
  • Experience in managing and reviewing Hadoop log files.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying.
  • Involved in running MapReduce jobs for processing millions of records.
  • Involved in running TWS jobs for processing millions of records using ITG.
  • Used different file formats like text files, Sequence Files, Avro and Optimized Row Columnar (ORC).
  • Involved in customizing the MapReduce partitioner in order to route key-value pairs in XML format from mappers to reducers according to requirements (see the sketch after this list).
  • Expertise in writing Shell-Scripts.
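
A minimal, hypothetical sketch of the custom-partitioner pattern referenced above, routing XML key-value pairs to reducers by a key prefix; the class name, key layout and field choices are illustrative assumptions, not the original code:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class XmlRecordPartitioner extends Partitioner<Text, Text> {
        @Override
        public int getPartition(Text key, Text value, int numPartitions) {
            // Assume keys of the form "<region>/<recordId>"; keep each region on one
            // reducer so the XML fragments for a region are written out together.
            String region = key.toString().split("/", 2)[0];
            return (region.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

The job driver would wire it in with job.setPartitionerClass(XmlRecordPartitioner.class).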

Environment: Core Java, J2EE, Hadoop, HDFS, Flume, Hive, MapReduce, Sqoop, LINUX, MapR, Big Data, GoldenGate, UNIX Shell Scripting

Confidential, Cypress, CA

Hadoop/Big Data Analyst

Responsibilities:

  • Involved in creating tables and partitioning and bucketing of tables in Hive.
  • Created Hive tables and worked on them using HiveQL.
  • Worked on NoSQL (HBase) to support enterprise production and loaded data into HBase using Sqoop.
  • Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Handled importing of data from various data sources, performed transformations using Hive and Pig, and loaded data into HDFS.
  • Involved in loading data from UNIX file system to HDFS.
  • Experience in UNIX Shell scripting.
  • Hands on Experience on Linux systems
  • Wrote Hive and Pig scripts as per requirements.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Extensive experience in writing HDFS and Pig Latin commands.
  • Performance tuning using Partitioning, bucketing of Hive tables
  • Load and transform large sets of structured, semi structured and unstructured data
  • Design and implement MapReduce jobs to support distributed processing using java, Hive and Apache Pig.
  • Managed and created tables in HBASE - NoSQL database
  • Maintenance of data importing scripts using Hive and MapReduce jobs.
  • Imported data using Sqoop to load data from MySQL to HDFS on regular basis.
  • Involved in writing shell scripts to run the jobs in parallel and increase the performance
  • Developed Hive queries to process the data for visualizing and reporting.
  • Developed Java MapReduce programs for the analysis of sample log files stored in the cluster (a minimal sketch follows this list).
  • Good experience in using Sqoop for handling data transfer from RDBMS to Hadoop and vice versa, and in analyzing/transforming data with Hive and Pig.
  • Experience working with different data types like flat files, ORC, Avro and JSON.
  • Experience in developing Hive UDFs using the Java programming language.
  • Analyze or transform stored data by writing MapReduce based on business requirements.
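
A minimal, hypothetical sketch of the kind of Java MapReduce log-analysis program mentioned above, counting requests per HTTP status code; the class names and the assumed combined-log field positions are illustrative, not the original code:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Job driver (Job setup, input/output paths) omitted for brevity.
    public class StatusCodeCount {
        public static class LogMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            @Override
            protected void map(LongWritable offset, Text line, Context ctx)
                    throws IOException, InterruptedException {
                String[] fields = line.toString().split(" ");
                if (fields.length > 8) {                  // assume combined log format
                    ctx.write(new Text(fields[8]), ONE);  // fields[8] = status code
                }
            }
        }
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text status, Iterable<IntWritable> counts, Context ctx)
                    throws IOException, InterruptedException {
                int total = 0;
                for (IntWritable c : counts) total += c.get();
                ctx.write(status, new IntWritable(total));
            }
        }
    }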

Environment: Core Java, J2EE, Hadoop, HDFS, Hive, MapReduce, Sqoop, LINUX, MapR, Big Data, UNIX Shell Scripting

Confidential, Akron, OH

Hadoop/Big Data Analyst

Responsibilities:

  • Developed MapReduce programs to parse the raw data and create intermediate data to be loaded into Hive partitioned tables.
  • Understood business needs, analyzed functional specifications and mapped those to the design and development of MapReduce programs and algorithms. Responsible for developing a custom input format in MapReduce to parse custom log files and convert them into key-value pairs for further processing in MapReduce.
  • Involved in data ingestion into HDFS using Sqoop and Flume from a variety of sources like web servers, RDBMS and data APIs.
  • Developed MapReduce programs and Hive queries to analyze sales patterns and customer satisfaction index over the data present in various relational database tables.
  • Optimizing Hadoop MapReduce code and Hive/Pig scripts for better scalability, reliability and performance.
  • Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDFs, Pig, Sqoop, ZooKeeper & Spark. Customized Flume interceptors to encrypt and mask customer-sensitive data as per requirements (see the sketch after this list).
  • Generated recommendations using item-based collaborative filtering in Apache Spark.
  • Made successful predictions on the analyzed trends (i.e., under what parameters the product sells most) using Mahout.
  • Performed importing data from various sources to the Cassandra cluster using Java APIs or Sqoop.
  • Worked with the NoSQL database HBase to create tables and store data.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying.
  • Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Designed and implemented MapReduce-based large-scale parallel relation-learning system.
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Worked on creating and using indexes with Solr on the Hadoop distributed platform.
  • Experience in storing the analyzed results back into the Cassandra cluster.
  • Developed iterative algorithms using Spark Streaming in Scala for near real-time dashboards.
  • Designed and built the Reporting Application, which uses the Spark SQL to fetch and generate reports on HBase table data.
  • Extracted the needed data from the server into HDFS and bulk loaded the cleaned data into HBase.
  • Performed read operations on local files, XML files, Excel files and JSON files in Python using the pandas module.
  • Used Maven for continuous build integration and deployment
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Involved in ETL, Data Integration and Migration.
  • Used different file formats like text files, Sequence Files, Avro, Record Columnar (RC) and ORC.
  • Developed Oozie workflows for scheduling and orchestrating the ETL process.
  • Exported the data from Avro files and indexed the documents in sequence file format.
  • Implemented various performance optimizations like using the distributed cache for small datasets, partitioning and bucketing in Hive, and using compression codecs wherever necessary.
  • Implemented test scripts to support test driven development and continuous integration.
  • Configured Flume for efficiently collecting, aggregating and moving large amounts of log data.
  • Installed and configured Hadoop and Hadoop stack on a 26 node cluster.
  • Installed, configured and operated data integration and analytic tools, i.e. Informatica, Chorus, SQLFire & GemFire XD, for business needs.
  • Worked with cloud services like Amazon Web Services (AWS).
  • Installed and configured Hive and also wrote Hive UDFs that helped spot market trends.
  • Involved in customizing the MapReduce partitioner in order to route key-value pairs in XML format from mappers to reducers according to requirements.
  • Worked with Zookeeper, Oozie, AppWorx and Data Pipeline Operational Services for coordinating the cluster and scheduling workflows.
  • Involved in loading data from UNIX file system to HDFS.
  • Implemented the Fair Scheduler on the JobTracker with appropriate parameters to share the resources of the cluster for the MapReduce jobs submitted by users.
  • Involved in creating Hive tables, loading the data into them and writing Hive queries to analyze the data.
  • Gained very good business knowledge on different category of products and designs within.
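
A minimal, hypothetical sketch of the kind of masking Flume interceptor referenced above; the class name and the masking rule are illustrative assumptions (the encryption side mentioned in the bullet is omitted here):

    import java.nio.charset.StandardCharsets;
    import java.util.List;
    import org.apache.flume.Context;
    import org.apache.flume.Event;
    import org.apache.flume.interceptor.Interceptor;

    public class MaskingInterceptor implements Interceptor {
        @Override public void initialize() { }

        @Override
        public Event intercept(Event event) {
            String body = new String(event.getBody(), StandardCharsets.UTF_8);
            // Crude masking rule for the sketch: hide anything that looks like an account number.
            event.setBody(body.replaceAll("\\d{6,}", "******").getBytes(StandardCharsets.UTF_8));
            return event;
        }

        @Override
        public List<Event> intercept(List<Event> events) {
            for (Event e : events) intercept(e);
            return events;
        }

        @Override public void close() { }

        public static class Builder implements Interceptor.Builder {
            @Override public Interceptor build() { return new MaskingInterceptor(); }
            @Override public void configure(Context context) { }
        }
    }

The interceptor would then be attached to a source in the Flume agent configuration via the source's interceptors properties.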

Environment: HDFS, MapReduce, Cloudera, HBase, Hive, Pig, Solr, Sqoop, Spark, Cassandra, Scala, Mahout, Flume, Oozie, ZooKeeper, Maven, Linux, Python, UNIX Shell Scripting, Struts, JSP, Servlets, WebSphere, HTML, XML, JavaScript.

Confidential, Fremont, CA

Hadoop/Big Data Analyst

Responsibilities:

  • Developed MapReduce programs to parse and filter the raw data and store the refined data in partitioned tables in Greenplum.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with Greenplum reference tables and historical metrics.
  • Responsible for creating Hive tables, loading the structured data resulted from MapReduce jobs into the tables and writing hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Involved in running MapReduce jobs for processing millions of records.
  • Built reusable Hive UDF libraries for business requirements, which enabled users to use these UDFs in Hive querying.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Experienced in migrating HiveQL into Impala to minimize query response time.
  • Responsible for Data Modeling in Cassandra as per our requirement.
  • Shared responsibility for administration of Hadoop, Hive and Pig.
  • Managing and scheduling jobs on a Hadoop cluster using Oozie and cron jobs.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Used Elasticsearch & MongoDB for storing and querying the offers and non-offers data.
  • Created UDFs to calculate the pending payment for the given Residential or Small Business customer, and used them in Pig and Hive scripts.
  • Deployed and built the application using Maven.
  • Maintained Hadoop, Hadoop ecosystems, third-party software and databases with updates/upgrades, performance tuning and monitoring using Ambari.
  • Obtained good experience with the NoSQL database Cassandra.
  • Used Cassandra CQL with Java APIs to retrieve data from Cassandra tables (see the sketch after this list).
  • Experience in managing and reviewing Hadoop log files.
  • Experienced in moving data from Hive tables into Cassandra for real time analytics on Hive tables.
  • Used Python scripting for large scale text processing utilities
  • Handled importing of data from various data sources and performed transformations using Hive (external tables, partitioning).
  • Involved in NoSQL (DataStax Cassandra) database design, integration and implementation.
  • Implemented CRUD operations involving lists, sets and maps in DataStax Cassandra.
  • Responsible for data modeling in MongoDB in order to load data arriving as both structured and unstructured data.
  • Unstructured files like XMLs and JSON files were processed using a custom-built Java API and pushed into MongoDB.
  • Participated in development/implementation of Cloudera Hadoop environment.
  • Created tables, inserted data and executed various Cassandra Query Language (CQL 3) commands on tables from Java code and using the cqlsh command line client.
  • Wrote test cases in MRUnit for unit testing of MapReduce programs.
  • Extensively worked on the user interface for a few modules using JSPs, JavaScript and Ajax.
  • Created business logic using Servlets and Session Beans and deployed them on WebLogic Server.
  • Involved in developing templates and screens in HTML and JavaScript.
  • Developed the XML schema and web services for the data maintenance and structures.
  • Provided technical support for production environments, resolving issues, analyzing defects, and providing and implementing solutions for the defects.
  • Built and deployed Java applications into multiple Unix based environments and produced both unit and functional test results along with release notes
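
A minimal, hypothetical sketch of the CQL-from-Java access pattern described above, in the DataStax Java driver 3.x style; the keyspace, table, column names and the customer id are illustrative assumptions:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class PendingPaymentLookup {
        public static void main(String[] args) {
            // Connect to an assumed local node and "billing" keyspace.
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("billing")) {
                ResultSet rs = session.execute(
                    "SELECT customer_id, pending_amount FROM pending_payments WHERE customer_id = ?",
                    "CUST-1001");
                for (Row row : rs) {
                    System.out.printf("%s owes %.2f%n",
                        row.getString("customer_id"), row.getDecimal("pending_amount"));
                }
            }
        }
    }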

Environment: HDFS, MapReduce, Hive, Pig, Cloudera, Impala, Oozie, Greenplum, MongoDB, Cassandra, Kafka, Storm, Maven, Python, CloudManager, Nagios, Ambari, JDK, J2EE, Struts, JSP, Servlets, Elasticsearch, WebSphere, HTML, XML, JavaScript, MRUnit

Confidential, Pasadena, CA

Hadoop/Big Data Analyst

Responsibilities:

  • Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive, and MapReduce.
  • Worked on debugging, performance tuning of Hive & Pig Jobs.
  • Created HBase tables to store various data formats of PII data coming from different portfolios.
  • Implemented test scripts to support test driven development and continuous integration.
  • Worked on tuning the performance of Pig queries.
  • Involved in loading data from the LINUX file system to HDFS.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Experience working on processing unstructured data using Pig and Hive.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Gained experience in managing and reviewing Hadoop log files.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed Pig Latin scripts to extract data from the web server output files to load into HDFS.
  • Extensively used Pig for data cleansing (a minimal UDF sketch follows this list).
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Implemented SQL, PL/SQL Stored Procedures.
  • Actively involved in code review and bug fixing for improving the performance.
  • Developed screens using JSP, DHTML, CSS, AJAX, JavaScript, Struts, Spring, Java and XML
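
The Pig data cleansing above was typically backed by small Java UDFs (the summary notes customized UDFs in Java extending Pig Latin); a minimal, hypothetical EvalFunc sketch, with the normalization rule as an illustrative assumption rather than the original code:

    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    public class CleanField extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            // Normalize whitespace and case so downstream joins and group-bys line up.
            return input.get(0).toString().trim().replaceAll("\\s+", " ").toLowerCase();
        }
    }

In a Pig script the jar would be added with REGISTER and the function invoked by its class name (or a DEFINE alias) inside a FOREACH ... GENERATE.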

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, LINUX, Cloudera, Big Data, Java APIs, Java collection, SQL, AJAX.

Confidential, Matawan, NJ

Java/J2EE Developer

Responsibilities:

  • Involved in design and development phases of Software Development Life Cycle (SDLC)
  • Involved in designing UML Use case diagrams, Class diagrams, and Sequence diagrams using Rational Rose.
  • Followed agile methodology and SCRUM meetings to track, optimize and tailored features to customer needs.
  • Developed the user interface using JSP, JSP tag libraries, and JavaScript to simplify the complexities of the application.
  • Implemented Model View Controller (MVC) architecture using Jakarta Struts 1.3 frameworks at presentation tier.
  • Developed a Dojo based front end including forms and controls and programmed event handling.
  • Implemented SOA architecture with web services using JAX-RS (REST) and JAX-WS (SOAP)
  • Developed various Enterprise Java Bean components to fulfill the business functionality.
  • Created Action Classes which route submittals to appropriate EJB components and render retrieved information.
  • Validated all forms using Struts validation framework and implemented Tiles framework in the presentation layer.
  • Used Core java and object oriented concepts.
  • Extensively used Hibernate 3.0 in the data access layer to access and update information in the database (see the sketch after this list).
  • Used Spring 2.0 Framework for Dependency injection and integrated it with the Struts Framework and Hibernate.
  • Used JDBC to connect to backend databases, Oracle and SQL Server 2005.
  • Proficient in writing SQL queries, stored procedures for multiple databases, Oracle 10g and SQL Server 2005.
  • Wrote Stored Procedures using PL/SQL. Performed query optimization to achieve faster indexing and making the system more scalable.
  • Deployed the application on Windows using IBM WebSphere Application Server.
  • Used Java Messaging Services (JMS) for reliable and asynchronous exchange of important information such as payment status report.
  • Used Web Services - WSDL and REST for getting credit card information from third party and used SAX and DOM XML parsers for data retrieval.
  • Implemented SOA architecture with web services like JAX-WS.
  • Extensively used IBM RAD 7.0 for writing code.
  • Implemented Persistence layer using Hibernate to interact with Oracle 10g and SQL Server 2005 databases.
  • Used ANT scripts to build the application and deployed it on WebSphere Application Server.
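
A minimal, hypothetical sketch of the Hibernate 3 data-access pattern described above; the PaymentDao/Payment names and the status field are illustrative assumptions, not the original classes:

    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.hibernate.Transaction;

    /** Assumed mapped entity (mapping file / annotations omitted in this sketch). */
    class Payment {
        private Long id;
        private String status;
        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }
        public String getStatus() { return status; }
        public void setStatus(String status) { this.status = status; }
    }

    public class PaymentDao {
        private final SessionFactory sessionFactory;

        public PaymentDao(SessionFactory sessionFactory) {
            this.sessionFactory = sessionFactory;
        }

        public void updateStatus(Long paymentId, String status) {
            Session session = sessionFactory.openSession();
            Transaction tx = session.beginTransaction();
            try {
                Payment payment = (Payment) session.get(Payment.class, paymentId);
                payment.setStatus(status);
                session.saveOrUpdate(payment);   // change is flushed on commit
                tx.commit();
            } catch (RuntimeException e) {
                tx.rollback();
                throw e;
            } finally {
                session.close();
            }
        }
    }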

Environment: Core Java, J2EE, WebLogic 9.2, Oracle 10g, SQL Server, JSP, Struts, JDK, JSF, JAX-RS (REST), JAX-WS (SOAP), JMS, Hibernate, JavaScript, HTML, CSS, IBM RAD 7.0, AJAX, JSTL, ANT 1.7 build tool, JUnit, Spring, Log4j, Web Services.

Confidential

Java Developer

Responsibilities:

  • Extensively involved in the design and development of JSP screens to suit specific modules.
  • Converted the application's console printing of process information to proper logging using log4j (a minimal sketch follows this list).
  • Developed the business components (in core Java) used in the JSP screens.
  • Involved in the implementation of logical and physical database design by creating suitable tables, views and triggers.
  • Developed related procedures and functions used by JDBC calls in the above components.
  • Extensively involved in performance tuning of Oracle queries.
  • Created components to extract application messages stored in XML files.
  • Executed UNIX shell scripts for command-line administrative access to the Oracle database and for scheduling backup jobs.
  • Created war files and deployed in web server.
  • Performed source and version control using VSS.
  • Involved in maintenance support.
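
A minimal, hypothetical sketch of the console-printing-to-log4j conversion mentioned above; the OrderProcessor class name and messages are illustrative assumptions:

    import org.apache.log4j.Logger;

    public class OrderProcessor {
        private static final Logger LOG = Logger.getLogger(OrderProcessor.class);

        public void process(String orderId) {
            // Previously: System.out.println("processing " + orderId);
            LOG.info("Processing order " + orderId);
            try {
                // ... business logic ...
            } catch (RuntimeException e) {
                // Previously printed to the console; now logged with the stack trace.
                LOG.error("Failed to process order " + orderId, e);
                throw e;
            }
        }
    }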

Environment: JDK, HTML, JavaScript, XML, JSP, Servlets, JDBC, Oracle 9i, Eclipse, Toad, UNIX Shell Scripting, MS Visual SourceSafe, Windows 2000.
