Big Data Consultant Resume
Weehawken, NJ
SUMMARY:
- Over 6 years of professional IT experience, including 3 years in the Hadoop ecosystem and continuous hands-on experience in Java
- Cloudera Certified Spark and Hadoop Developer (CCA175)
- Technical experience in the manufacturing, finance, and internet industries
- Proficient in Java, Python, Scala and R
- Working knowledge of NoSQL and big data storage technologies such as Hive, HBase, Cassandra, Redis, MongoDB, and Impala
- Exposed to setting up and maintaining Hadoop clusters on YARN
- Expert in importing and exporting data between HDFS and relational database systems (Oracle/MySQL) using Sqoop
- Moved log files generated from various sources into HDFS via Flume for further processing
- Experienced in building real-time, high-throughput streaming services to transport high-volume data using Kafka
- Strong expertise in building traditional ETL pipelines using Informatica best practices
- Capable of writing custom UDFs in Java to extend Hive and Pig Latin functionality
- Experienced in job, workflow scheduling and monitoring tools like Oozie, Appworx
- Developed data analyses and visualizations using SQL, R, HiveQL, Spark SQL, and Tableau
- Experienced in working with the Apache Spark Streaming API for near-real-time data processing
- Skillful in Data Validation, Cleansing, Verification and identifying data mismatch
- Experienced in writing custom MapReduce programs in Java
- Familiar with Machine Learning and Statistical Analysis using R, Python and Spark
- Knowledge of machine learning frameworks including Scikit-learn, NLTK, and MLlib
- Familiar with algorithms including K-Means, KNN, regression, SVM, and neural networks
- Extensive experience writing complex SQL queries using Oracle analytic functions
- Strong database experience in PL/SQL programming, creating packages, stored procedures, functions, triggers, indexes, materialized views, and cursors
- Clear understanding of ER modeling for OLTP and dimensional modeling for OLAP
- Strong in core Java, data structures, algorithm design, Object-Oriented Design (OOD), and Java components such as the Collections Framework, exception handling, I/O, and multithreading
- Hands on experience in MVC architecture and Java EE frameworks like Struts, Spring MVC, and Hibernate.
- Experienced in interacting with Clients, Business Analysts, IT leads, UAT Users and developers
- Exposed to Agile Development environment, tools and methodologies
- Authorized to work in the US for any employer
TECHNICAL SKILLS:
Hadoop Ecosystem: Hadoop 2.0, Spark 2.0, MapReduce, Pig 0.15+, Hive, Sqoop, Flume, Kafka 1.0+, Zookeeper 3.0+, HBase, Oozie, Storm 1.0
Web Development: Hibernate, HTML, CSS, AJAX, Bootstrap, J2EE, Spring MVC, Node.js, Django
Programming Languages: Java, Scala, Python, JavaScript, PL/SQL
Data Analysis & Visualization: Python, R, SQL, Tableau, D3.js
Cloud Platforms: Amazon Web Services, Heroku
Scripting Languages: UNIX Shell, HTML, XML, CSS, JSP
Operating Systems: Mac OS, Ubuntu, CentOS, Windows
Environment: Agile, Scrum, TDD, JIRA, Confluence, Jenkins
Machine Learning Algorithms: Linear Regression, Logistic Regression, Decision Tree, Neural Network, K-Means, KNN, Support Vector Machine
Machine Learning Frameworks: Spark MLlib, SciPy, Matplotlib, Pandas, Numpy
Databases: MySQL, Oracle 11g, Exadata, PostgreSQL 9.x, SQL Server 2012/2016, MongoDB 3.2, HBase 0.98, Cassandra 3.0, Redis 3.2
Others: Docker, Informatica 9.0, SSIS
PROFESSIONAL EXPERIENCE:
Confidential, Weehawken, NJ
Big Data Consultant
Responsibilities:
- Integrated data from relational databases (Oracle, MySQL, SQL Server) into HDFS using Sqoop
- Configured Flume agents to collect real-time logging data from application servers
- Implemented a reliable, scalable Kafka messaging system for high-throughput data ingestion
- Wrote MapReduce programs in Java for offline batch processing
- Created multiple Hive tables with partitioning and bucketing for efficient data access
- Processed streaming data with Spark Streaming for risk evaluation and product recommendation (see the Kafka-to-Spark sketch after this entry)
- Cached key results from the streaming and batch processing systems in Redis for fast access
- Analyzed time series data using Spark MLlib, SciPy, and Matplotlib
- Evaluated model accuracy and tuned parameters with offline simulation data
- Automated workflows using Oozie and shell scripts
Environment: Hadoop 2.6, Cloudera CDH 5.x, Kafka, Sqoop, Flume, Zookeeper, Spark 2.0, Scala, Redis, Oozie, Shell script, Oracle, MySQL, SQL Server, Python, Java
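Illustrative only: a minimal sketch of the Kafka-to-Spark Streaming ingestion path described in this entry, assuming the spark-streaming-kafka-0-10 integration; the broker address, consumer group, and topic name are placeholders, and the per-batch count stands in for the actual risk-evaluation/recommendation logic.

```java
import java.util.*;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class RiskEventStream {                                  // hypothetical job name
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("RiskEventStream");
        // 10-second micro-batches
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");   // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "risk-eval");                // placeholder consumer group
        kafkaParams.put("auto.offset.reset", "latest");

        Collection<String> topics = Collections.singletonList("transactions"); // placeholder topic

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        // Count records per micro-batch as a stand-in for the real scoring logic
        stream.map(ConsumerRecord::value)
              .count()
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```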
Confidential, Erie, PA
Senior Data Warehouse Consultant
Responsibilities:
- Integrated structured data from various source portfolios into Hive using Sqoop
- Ingested large volumes of semi-structured application data in real time using Flume
- Stored data in a Kafka cluster as a central buffer
- Developed MapReduce jobs for data cleaning, validation and categorization
- Built operational data store in Hive for raw data
- Created fact, dimension, and bridge tables in a star schema following the Kimball approach
- Developed periodic analytic/aggregation queries and saved results in HBase for fast access
- Wrote Hive UDFs for data transformation and aggregation (see the UDF sketch after this entry)
- Used Informatica and SSIS to integrate Oracle E-Business Suite with Exadata and SQL Server
- Built business intelligence dashboards using tools such as Tableau and OBIEE
- Wrote shell scripts for job execution and scheduling with Appworx and Oozie
- Supported the production data lake for data accuracy, consistency, and performance
- Worked following Agile/Scrum methodologies
Environment: Hadoop 2.4, Hive 0.12, Sqoop, Flume, Kafka, HBase, Oozie, Appworx, Tableau, OBIEE, MySQL, SQL Server 2012/2016, Informatica 9.0, Java, SQL, PL/SQL, Shell script, Exadata, Agile
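Illustrative only: a minimal example of the kind of Hive UDF referenced in this entry, using the classic org.apache.hadoop.hive.ql.exec.UDF base class; the function name and normalization rule are hypothetical.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * Hypothetical Hive UDF that trims and upper-cases a string column,
 * e.g. SELECT normalize_code(policy_code) FROM claims;
 */
public class NormalizeCode extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                                   // pass NULLs through unchanged
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```

Such a UDF would typically be packaged as a JAR, made available with ADD JAR, and registered with CREATE TEMPORARY FUNCTION before being used in HiveQL.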
Confidential
Hadoop developer
Responsibilities:
- Wrote MapReduce programs in Java for data cleaning and categorization (see the mapper sketch after this entry)
- Wrote shell script to manipulate files on application servers
- Used Flume to collect, aggregate and store log data from different sources
- Built Informatica workflows to capture changes in the application database
- Captured streaming data with Kafka and performed real-time analysis using Storm
- Stored streaming analysis results in HBase for responsive ad hoc queries
- Created thousands of fact/dimension tables in Hive with partitioning and bucketing for efficient access
- Wrote HiveQL scripts for data analysis and exploration
- Exported data from HDFS to MySQL using Sqoop for the business intelligence team
- Worked with analytics team to prepare and visualize results in Tableau for reporting
- Used Oozie to orchestrate MapReduce jobs and set up automated workflows
Environment: Hadoop, Java, HDFS, Flume, Hive, MapReduce, Sqoop, HQL, Eclipse, MySQL, Tableau, D3, HBase, Kafka, Spark
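Illustrative only: a map-only MapReduce sketch of the data-cleaning step described in this entry; the delimiter, expected field count, and class name are assumptions, and malformed records are simply dropped.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/** Map-only cleaning step: keeps well-formed records and drops the rest. */
public class CleanRecordMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
    private static final int EXPECTED_FIELDS = 12;               // placeholder schema width

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\\|", -1);     // assumed pipe-delimited logs
        if (fields.length == EXPECTED_FIELDS && !fields[0].isEmpty()) {
            context.write(NullWritable.get(), value);            // emit well-formed record unchanged
        }
        // malformed records are silently dropped; a counter could track them instead
    }
}
```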
Confidential
Hadoop developer
Responsibilities:
- Used Sqoop to import data from Oracle and MySQL to Hive
- Wrote HiveQL queries to retrieve and analyze the Hive storage
- Used Flume to stream log data and JSON-format social media data from various sources
- Developed MapReduce programs, Hive SerDes to clean and parse data in HDFS obtained from various data sources.
- Used Oozie to orchestrate MapReduce jobs and set up automated workflows
- Collected high-throughput data using Kafka and analyzed it with Spark
- Applied a logistic regression algorithm to build the model (see the Spark MLlib sketch after this entry)
- Visualized the data with Python Matplotlib, Tableau and D3.js
Environment: Hadoop 2.2, Spark, MapReduce, MySQL, Oracle SQL, Hive, Sqoop, Tableau, Matplotlib, D3.js
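Illustrative only: a minimal logistic regression sketch using the DataFrame-based Spark MLlib API (assumes a Spark 2.x runtime); the input path, split ratio, and hyperparameters are placeholders rather than the values used on the project.

```java
import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.ml.classification.LogisticRegressionModel;
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class LogisticRegressionSketch {                           // hypothetical job name
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("LogisticRegressionSketch").getOrCreate();

        // Assumes a LibSVM-format training file already prepared on HDFS (placeholder path)
        Dataset<Row> data = spark.read().format("libsvm").load("hdfs:///data/training.libsvm");
        Dataset<Row>[] splits = data.randomSplit(new double[]{0.8, 0.2}, 42L);

        LogisticRegression lr = new LogisticRegression()
                .setMaxIter(100)                                   // placeholder hyperparameters
                .setRegParam(0.01);
        LogisticRegressionModel model = lr.fit(splits[0]);

        // Area under ROC on the held-out split
        double auc = new BinaryClassificationEvaluator()
                .setLabelCol("label")
                .evaluate(model.transform(splits[1]));
        System.out.println("AUC = " + auc);

        spark.stop();
    }
}
```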
Confidential
Oracle E-Business-Suite Developer
Responsibilities:
- Developed various Oracle applications using PL/SQL, SQL*Plus, Forms, Reports, Workflow Builder, and Application Object Library
- Used Oracle E-Business Suite applications for accounts payable/receivable, general ledger, and cash management, supporting data extraction, integration, filtering, and validation
- Integrated data from legacy systems into Oracle E-Business Suite 11i
- Maintained and built workflow extensions to support new business processes
- Built customized data quality checking jobs in Informatica to maintain data quality between the data warehouse and source systems in the ETL process
- Improved production performance by identifying bottlenecks and applying fixes such as database partitioning and tuning block size, data cache size, sequence buffer length, target-based commit interval, and SQL overrides
Environment: Linux, Java, Shell script, SQL, PL/SQL, Oracle form/report builder, Oracle E-Business Suite 11i, Informatica
Confidential
Backend Developer
Responsibilities:
- Developed a data parsing system for XML/JSON
- Involved in system design based on the Spring, Struts, and Hibernate frameworks
- Used Spring's HibernateTemplate to access the MySQL database (see the DAO sketch after this entry)
- Performed unit testing of components, created unit test cases, and reviewed unit tests
- Developed online data analysis system by using IBM Cognos
- Developed and fine-tuned the MySQL database
- Optimized existing data models and design diagrams
- Ensured data integrity and detected data errors and misuse
Environment: J2SE/J2EE 5.0, JSP, HTML, JavaScript, JDBC, Eclipse, IBM Cognos, IBM DataStage, MySQL, MySQL Workbench, Toad, Linux, shell script
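Illustrative only: a minimal DAO sketch using Spring's HibernateTemplate as referenced in this entry; the User entity, its Hibernate mapping, and the SessionFactory wiring in the Spring configuration are assumed to exist and are not the project's actual classes.

```java
import java.io.Serializable;
import java.util.List;
import org.springframework.orm.hibernate3.HibernateTemplate;

/** Assumed Hibernate-mapped entity; mapping, getters, and setters omitted for brevity. */
class User {
    Long id;
    String email;
}

/** Minimal DAO over Spring's HibernateTemplate; the SessionFactory is wired in Spring XML. */
public class UserDao {
    private HibernateTemplate hibernateTemplate;                  // injected by the Spring container

    public void setHibernateTemplate(HibernateTemplate hibernateTemplate) {
        this.hibernateTemplate = hibernateTemplate;
    }

    public Serializable save(User user) {
        return hibernateTemplate.save(user);                      // INSERT via the mapped entity
    }

    public User findById(Long id) {
        return (User) hibernateTemplate.get(User.class, id);      // SELECT by primary key
    }

    @SuppressWarnings("unchecked")
    public List<User> findByEmail(String email) {
        return (List<User>) hibernateTemplate.find(
                "from User u where u.email = ?", email);          // HQL with a positional parameter
    }
}
```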