Big Data Analyst Resume
Bentonville, AR
SUMMARY
- Over eight (8+) years of overall IT experience across a variety of industries, including 3+ years of hands-on experience in Big Data technologies and 4+ years of extensive experience in Java technologies.
- In-depth understanding of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts, and experience writing MapReduce programs on Apache Hadoop to analyze large data sets efficiently
- Hands-on experience working with ecosystem components such as Hive, Pig, Sqoop, MapReduce, Flume, and Oozie; strong knowledge of Pig and Hive analytical functions and experience extending Hive and Pig core functionality by writing custom UDFs
- Experience importing and exporting terabytes of data between HDFS and relational database systems using Sqoop
- Knowledge of job workflow scheduling and monitoring tools such as Oozie and ZooKeeper, NoSQL databases such as HBase and Cassandra, and administrative tasks such as installing Hadoop and its ecosystem components (Flume, Oozie, Hive, Pig) and commissioning and decommissioning nodes
- Experience in the design, development, and testing of distributed, Internet/Intranet/e-commerce, client/server, and database applications, mainly using Java, EJB, Servlets, JDBC, JSP, Struts, Hibernate, Spring, and JavaScript on WebLogic and Apache Tomcat web/application servers, with Oracle and SQL Server databases on UNIX and Windows NT platforms
- Extensive work experience in Object-Oriented Analysis and Design and Java/J2EE technologies, including HTML, XHTML, DHTML, JavaScript, JSTL, CSS, AJAX, and Oracle, for developing server-side applications and user interfaces.
- Experience in developing Middle-tier components in distributed transaction management system using Java. Good understanding of XML methodologies (XML, XSL, XSD) including Web Services and SOAP
- Extensive experience working with databases such as Oracle, IBM DB2, SQL Server, and MySQL, and writing Stored Procedures, Functions, Joins, and Triggers for different data models.
- Experience working with databases such as Oracle 8i/9i/10g and knowledge of MS Access.
- Sound proficiency in analyzing and creating Use Cases, Use Case Diagrams, Sequence Diagrams, Data Flow Diagrams, and Business Flow Diagrams.
- Actively involved in supporting, maintaining, and troubleshooting software applications and services.
- Well experienced in software unit testing, functional testing, integration testing, regression testing and highly efficient at fact-finding, root cause analysis and bug fixing.
- Experience with Agile Methodology, Scrum Methodology, software version control and release management
- Handled several techno-functional responsibilities including estimates, identifying functional and technical gaps, requirements gathering, designing solutions, development, developing documentation, and production support.
TECHNICAL SKILLS
Database: Teradata, DB2, MySQL, Oracle, MS SQL Server, IMS/DB
Languages: Java, Pig Latin, SQL, HiveQL, Shell Scripting, and XML
APIs/Tools: Mahout, Eclipse, Log4j, Maven
Web Technologies: HTML, XML, JavaScript
Big Data Ecosystem: HDFS, Pig, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, MongoDB, AWS, Solr, Impala, Cassandra, Storm, Spark, Kafka
Operating System: UNIX, Linux, Windows XP, IBM Z/OS
BI Tools: Tableau, Pentaho, Hyperion, OBIEE
PROFESSIONAL EXPERIENCE
Confidential, Bentonville, AR
Big Data Analyst
Responsibilities:
- Analyzed large data sets by running custom MapReduce programs, Hive queries, and Pig scripts
- Developed complex Pig UDFs for business transformations (see the sketch after this list)
- Worked with the Data Science team, the Teradata team, and the business to gather requirements for various data sources such as web scrapes and APIs
- Involved in creating Hive/Impala tables and in loading and analyzing data using Hive queries
- Involved in running Hadoop jobs to process millions of records and in applying compression techniques
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing
- Involved in loading data from the Linux file system to HDFS, and wrote shell scripts to productionize the MAP (Member Analytics Platform) project, automated with the Cronacle scheduler
- Loaded and transformed large sets of structured and semi-structured data
- Loaded the Golden collection into Apache Solr using Morphlines code for the business team
- Assisted in exporting analyzed data to relational databases using Sqoop
- Designed the HBase data model for large transaction sales data
- Built a proof of concept on Storm for streaming data from one of the sources
- Built a proof of concept in Pentaho for Big Data
- Implemented one of the data source transformations in Spark using Scala
- Designed the Cassandra data model and its integration with Spark
- Used the Teradata FastExport and Parallel Transporter utilities along with Sqoop to extract data and load it into Hadoop
- Worked in Agile methodology and used IceScrum for development and project tracking
- Worked with GitHub repositories, including branching and merging
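Below is a minimal sketch of the kind of custom Pig UDF used for the business transformations above, written in Java as Pig UDFs typically are. The class name, the "ST-" prefix rule, and the padding width are illustrative assumptions, not the actual project logic.

```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Illustrative Pig UDF: normalizes a raw store identifier into a canonical
// form. The normalization rule here is a hypothetical example.
public class NormalizeStoreId extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        String raw = input.get(0).toString().trim();
        try {
            // Hypothetical rule: strip an "ST-" prefix and left-pad to 6 digits.
            String digits = raw.replaceFirst("^ST-", "");
            return String.format("%06d", Integer.parseInt(digits));
        } catch (NumberFormatException e) {
            return null; // non-numeric ids are returned as nulls
        }
    }
}
```

In Pig Latin such a UDF would be packaged into a JAR, registered with REGISTER, and then invoked like a built-in function inside a FOREACH ... GENERATE statement.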
Environment: CDH 5.0, HDP, Hadoop, HDFS, Pig, Hive, Impala, Solr, Morphlines, MapReduce, Sqoop, HBase, shell scripting, Pentaho, Spark, Scala, Storm, Teradata FastExport and Parallel Transporter utilities, GitHub, and Big Data
Confidential, Sunnyvale, CA
Big Data Engineer
Responsibilities:
- Worked on analyzing the Hadoop cluster using different big data analytic tools, including Hive and MapReduce
- Worked on debugging, performance tuning of Hive Jobs
- Migrated tables from RCFile to ORC format and worked with data ingestion and other customized file formats.
- Wrote AutoSys jobs to schedule the reports
- Implemented test scripts to support test driven development and continuous integration
- Involved in loading data from LINUX file system to HDFS
- Supported MapReduce programs running on the cluster
- Gained experience in managing and reviewing Hadoop log files
- Involved in scheduling Oozie workflow engine to run multiple Hive jobs
- Worked on Storm and Kafka to consume streams of JSON data (see the sketch after this list)
- Built a proof of concept on Spark for interaction source transformations
- Used Apache Solr to search for specific products each cycle for the business
- Designed and documented project use cases, wrote test cases, led the offshore team, and interacted with the client
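The following is a minimal sketch of the kind of Storm bolt that could sit behind a Kafka spout to parse the JSON stream mentioned above. It assumes the current Apache Storm package layout (org.apache.storm) and the json-simple parser; the bolt name and the "eventId"/"payload" field names are hypothetical.

```java
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;

// Illustrative bolt: parses each JSON message coming off a Kafka spout and
// emits two hypothetical fields for downstream bolts.
public class JsonParseBolt extends BaseBasicBolt {
    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String json = tuple.getString(0);
        try {
            JSONObject obj = (JSONObject) new JSONParser().parse(json);
            collector.emit(new Values(obj.get("eventId"), obj.get("payload")));
        } catch (ParseException e) {
            // In a real topology malformed messages would be logged or routed
            // to an error stream rather than silently dropped.
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("eventId", "payload"));
    }
}
```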
Environment: Hadoop, HDFS, Hive, MapReduce, Oozie, AutoSys, shell scripting, Storm, Kafka, Flume, Spark, and Big Data
Confidential, Melville, NY
Hadoop Developer
Responsibilities:
- Analyzed large data sets by running Hive queries and Pig scripts
- Worked with the Data Science team to gather requirements for various data mining projects
- Involved in creating Hive tables and in loading and analyzing data using Hive queries
- Developed simple to complex MapReduce jobs using Hive and Pig
- Involved in running Hadoop jobs for processing millions of records of text data
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the sketch after this list)
- Involved in loading data from the Linux file system to HDFS
- Responsible for managing data from multiple sources
- Assisted in exporting analyzed data to relational databases (MySQL) using Sqoop
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts
- Generated Tableau reports and built dashboards
- Worked closely with business units to define development estimates according to Agile Methodology
- CDH 4.6 cluster: 48 nodes, each with 3 TB of storage and 32 GB of RAM
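A minimal sketch of the kind of Java MapReduce cleaning step described above: a map-only job that drops malformed rows and trims fields. The tab delimiter and the expected field count are assumptions for illustration.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative data-cleaning mapper: skips records that do not have the
// expected number of tab-separated fields and trims whitespace on the rest.
public class CleanRecordsMapper
        extends Mapper<LongWritable, Text, NullWritable, Text> {

    private static final int EXPECTED_FIELDS = 5; // hypothetical schema width

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t", -1);
        if (fields.length != EXPECTED_FIELDS) {
            context.getCounter("cleaning", "malformed").increment(1);
            return; // drop malformed record
        }
        StringBuilder cleaned = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) cleaned.append('\t');
            cleaned.append(fields[i].trim());
        }
        context.write(NullWritable.get(), new Text(cleaned.toString()));
    }
}
```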
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Cassandra, Linux, Tableau 8.2, shell scripting, and Big Data
Confidential, East Hartford, CT
Hadoop Developer
Responsibilities:
- Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive, and MapReduce on EC2
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis
- Worked on debugging, performance tuning of Hive & Pig Jobs
- Created HBase tables to store various formats of PII data coming from different portfolios (see the sketch after this list)
- Implemented test scripts to support test driven development and continuous integration
- Worked on tuning the performance of Pig queries
- Involved in loading data from the Linux file system to HDFS
- Imported and exported data between MySQL and HDFS/HBase using Sqoop
- Processed semi-structured data using Pig and Hive
- Supported MapReduce programs running on the cluster
- Gained experience in managing and reviewing Hadoop log and JSON files
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts
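A minimal sketch of the HBase table-creation and write pattern referenced above, using the HBase 1.x Java client API. The table name, the single column family, the row-key layout, and the column qualifier are hypothetical; real PII handling would also involve hashing/encryption and access controls not shown here.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative HBase client code: creates a table for portfolio records and
// writes a single row keyed by portfolio and customer id (hypothetical layout).
public class PiiTableSetup {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            TableName name = TableName.valueOf("pii_records");
            if (!admin.tableExists(name)) {
                HTableDescriptor desc = new HTableDescriptor(name);
                desc.addFamily(new HColumnDescriptor("d")); // single data family
                admin.createTable(desc);
            }
            try (Table table = conn.getTable(name)) {
                Put put = new Put(Bytes.toBytes("portfolio1|cust-0001"));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("ssn_hash"),
                        Bytes.toBytes("3f5a9c")); // store a hash, never raw PII
                table.put(put);
            }
        }
    }
}
```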
Environment: Hadoop, HDFS, HBase, Pig, Hive, MapReduce, Sqoop, Oozie, Linux, S3, EC2, AWS, and Big Data
Confidential, St. Louis, MO
Java Developer
Responsibilities:
- Worked as part of the Agile Application Architecture (A3) development team responsible for setting up the architectural components for different layers of the application.
- Responsible for understanding the scope of the project and requirement gathering.
- Involved in analysis, design, construction and testing of the application
- Developed the web tier using JSP to show account details and summary
- Used a web services client to make calls for data; generated client classes using WSDL2Java and used the generated Java API
- Designed and developed the UI using JSP, HTML, CSS and JavaScript
- Utilized JPA for Object/Relational Mapping purposes for transparent persistence onto the SQL Server database.
- Used the Tomcat web server for development purposes.
- Involved in creating test cases for JUnit testing (see the sketch after this list).
- Used Oracle as the database and Toad for query execution; also involved in writing SQL scripts and PL/SQL code for procedures and functions.
- Used CVS for version controlling.
- Developed application using Eclipse and used build and deploy tool as Maven.
- Used Log4j to write logging, debugging, warning, and info messages to the server console
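A minimal sketch of the kind of JUnit test case and Log4j usage described above. AccountFormatter is a hypothetical helper defined inline so the example is self-contained; it does not represent the actual application classes.

```java
import static org.junit.Assert.assertEquals;

import java.util.Locale;
import org.apache.log4j.Logger;
import org.junit.Test;

// Illustrative JUnit 4 test with Log4j logging; the class under test is a
// small hypothetical helper, not real application code.
public class AccountFormatterTest {

    private static final Logger LOG = Logger.getLogger(AccountFormatterTest.class);

    // Hypothetical helper under test.
    static class AccountFormatter {
        String summaryLine(String accountId, double balance) {
            return String.format(Locale.US, "%s: $%.2f", accountId, balance);
        }
    }

    @Test
    public void formatsBalanceWithTwoDecimals() {
        LOG.debug("Running formatsBalanceWithTwoDecimals");
        String line = new AccountFormatter().summaryLine("ACCT-1", 100.0);
        LOG.info("Formatted line: " + line);
        assertEquals("ACCT-1: $100.00", line);
    }
}
```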
Environment: Java, J2EE, Servlets, JSP, JUnit, AJAX, XML, JavaScript, Log4j, CVS, Maven, Eclipse, Apache Tomcat, and Oracle
Confidential
Java Developer
Responsibilities:
- Provided immediate response to production system calls and fixed bugs with immediate effect.
- Set up the Planning application and all its components.
- Responsible for dimension building and outline maintenance using the Planning web interface and reflecting changes in Essbase.
- Loaded metadata through HAL jobs.
- Created web forms according to requirements.
- Created filters, groups, and users.
- Provided access to the applications for business users according to their business roles in Shared Services.
- Checked the drill-through reports on a daily basis.
- Resolved data reconciliation issues and other issues (mostly security) during the UAT phase.
- Developed all the required detailed project documentation of the tasks performed and the issues faced, to be used for future reference.
- Assigned dimension and data form access rights to users and user groups.
- Trained business analysts on retrieving data from Essbase cubes using the Excel spreadsheet Add-in and on Planning features such as alias tables and the Smart View offline process.
- Provided 24/7 technical support to all business units, offered suggestions and proposed solutions to resolve issues.
- Monitored the automated data loads and aggregations using Appworx
Environment: Windows NT/2000/2003/XP/7/8, C, Java, UNIX, SQL, Hyperion Essbase 11.1.1.2/9.3, Hyperion Excel Add-in, Smart View, Hyperion Planning 11.1.1.2/9.3, MDM 9.2.0.10.0, Microsoft Office Suite