
Big Data Consultant Resume

Austin, TX

SUMMARY

  • 9 years of experience in application development, administration, architecture, and data analytics, with specialization in Java and Big Data technologies.
  • 4 years of extensive experience as a Hadoop Developer with strong expertise in MapReduce and Hive.
  • Strong hands-on experience designing optimized solutions using Hadoop components such as MapReduce, Hive, Sqoop, Pig, HDFS, Flume, and Oozie.
  • Strong understanding of the Cassandra and HBase NoSQL databases.
  • Actively involved in requirements gathering, analysis, design, reviews, coding, and code reviews.
  • Expertise in designing web applications using Java/J2EE under Agile Scrum methodologies. Strong skills in writing UNIX and Linux shell scripts.
  • Experience in writing Python scripts. Expertise in JMS and IBM WebSphere MQ, including writing MDBs to listen to message queues.
  • Extensive experience with RAD 6.0, RSA, WebSphere (WSAD 5.1), Eclipse 3.1.2, MyEclipse, and Oracle 9i.
  • Extensive experience in developing three-tier, N-tier, and distributed applications using J2EE technologies.
  • Strong analytical skills with proficiency in debugging and problem solving. Experience in sizing and scaling distributed databases.
  • Experience in performing database consistency checks using DBCC utilities and the Index Tuning Wizard.
  • Experience in OLAP and ETL/data warehousing, creating different data models, and maintaining data marts.
  • Experience in designing logical and physical database models using ERwin.
  • Implemented Kerberos in the Hadoop cluster environment; Kerberos acts as a security gateway that authenticates every user entering the Hadoop cluster.
  • The Kerberos deployment includes the Key Distribution Center (KDC), user principals, and the HDFS nodes as its components; implemented these components in the system and handled tickets related to the Hadoop security setup (a minimal authentication sketch follows this list).
  • Communicated with the Cloudera team whenever a critical issue could not be resolved easily, since the Hadoop-Kerberos security environment is supported by Cloudera.
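
Illustrative only: a minimal Java sketch of how a client process might authenticate to a Kerberized Hadoop cluster with a keytab before touching HDFS. The principal name and keytab path are hypothetical placeholders, and the sketch assumes the cluster's core-site.xml/hdfs-site.xml are on the classpath.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberizedHdfsClient {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            // Tell the Hadoop client that the cluster expects Kerberos authentication.
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);
            // Obtain a ticket from the KDC using a keytab instead of an interactive password.
            UserGroupInformation.loginUserFromKeytab(
                    "etl-user@EXAMPLE.COM",               // hypothetical principal
                    "/etc/security/keytabs/etl.keytab");  // hypothetical keytab path
            // Subsequent HDFS calls run under the authenticated user.
            FileSystem fs = FileSystem.get(conf);
            System.out.println(fs.exists(new Path("/user/etl")));
        }
    }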

TECHNICAL SKILLS

Operating Systems: Windows 2000 Server, Windows 2000 Advanced Server, Windows Server 2003, Windows NT, Windows 98/XP, CentOS, Debian, Fedora, UNIX, Linux (RHEL)

Databases: MS SQL Server 2000/2005/2008, MS Access, Teradata, Oracle, DB2, Cassandra

Languages: Java, C, C++, Pig Latin, HiveQL; MS BI: SSIS, SSAS, SSRS

Tools/Utilities: MapReduce, Sqoop, Flume, Oozie, SQL Profiler, HBase, Jenkins, Stash, Agile, Git

Reporting Tools: Tableau, Impala, QlikView, Datameer

Web Utilities: HTTP, IIS Administration, Apache

PROFESSIONAL EXPERIENCE

Confidential - Austin, TX

Big data Consultant

Responsibilities:

  • Installed and configured a multi-node, fully distributed Hadoop cluster.
  • Involved in installing Hortonworks Hadoop ecosystem components.
  • Responsible for managing data coming from different sources and for developing Hadoop production clusters.
  • Administered the Hadoop cluster environment, including adding and removing cluster nodes, capacity planning, and performance tuning.
  • Wrote complex MapReduce programs.
  • Involved in the design, installation, and maintenance of Kafka and Ambari.
  • Loaded data into the cluster from dynamically generated files using Flume and from relational databases using Sqoop.
  • Wrote Java APIs for interacting with HBase (see the sketch after this list); built with Maven and worked with JSP, Servlets, Web 2.0, the Struts/Spring frameworks, Hibernate ORM, REST APIs, and AngularJS.
  • Involved in writing Flume and Hive scripts to extract, transform, and load data into the database.
  • Used a data lake as the central data store.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Experienced with Teradata services.
  • Experienced in importing and exporting data between HDFS/Hive and relational stores using Sqoop.
  • Knowledgeable in performance troubleshooting and tuning of Hadoop clusters.
  • Expert in Spark, Scala, Storm, Hue, and Samza.
  • Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access.
  • Experienced in running Amazon EMR.
  • Installed and configured Hive, wrote Hive UDFs, and used JUnit for unit testing of MapReduce code.
  • Experienced in working with various data sources such as Hortonworks, Teradata, and Oracle.
  • Worked on data management and data integration.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data techniques.
  • Project leader of a seven-member team.
  • Served as the technical expert on Hadoop architecture, guiding the team and helping them solve problems.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs that trigger independently based on time and data availability.
  • Expertise in Kerberos and LDAP integration.
  • Very familiar with data visualization.
  • Familiar with parallel-processing databases such as Teradata and Netezza.
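
Illustrative only: a minimal sketch of the kind of Java client code used to interact with HBase, assuming the standard HBase client API; the table, column family, and row key names are hypothetical.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseClientSketch {
        public static void main(String[] args) throws IOException {
            // Picks up hbase-site.xml (ZooKeeper quorum, etc.) from the classpath.
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("events"))) { // hypothetical table
                // Write one cell: row key -> column family "d", qualifier "status".
                Put put = new Put(Bytes.toBytes("row-001"));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("OK"));
                table.put(put);
                // Read the same cell back.
                Result result = table.get(new Get(Bytes.toBytes("row-001")));
                System.out.println(Bytes.toString(
                        result.getValue(Bytes.toBytes("d"), Bytes.toBytes("status"))));
            }
        }
    }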

Environment: Java, Hadoop, Hortonworks, Hive, Pig, Sqoop, Flume, HBase, Oracle 10g, Teradata, Cassandra, Scala, Spark, Netezza, Spring, Kafka, AWS, Amazon EMR, SSIS, SSRS, SSAS, data lake

Confidential - Pittsburgh

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data processing.
  • Experience in metadata management.
  • Worked on Spark Streaming, RDD creation, and graph analytics.
  • Defined workflows using the Oozie framework for automation.
  • Implemented Flume multiplexing to stream data from upstream pipes into HDFS.
  • Responsible for reviewing Hadoop log files.
  • Loaded and transformed large sets of unstructured and semi-structured data.
  • Performed data completeness, correctness, transformation, and data quality testing using SQL.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Worked on platforms such as Kafka clusters.
  • Implemented Hive partitioning (static and dynamic) and bucketing.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Assisted in the creation of ETL processes for the transformation of data from existing RDBMS systems.
  • Developed profiling/logging interceptors for the Struts action classes using the Struts Action Invocation Framework (SAIF).
  • Wrote Apache Pig scripts to process the HDFS data.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Involved in installing Hadoop Ecosystem components.
  • Involved in HDFS maintenance and loading of structured and unstructured data.
  • Installed and configured Hadoop, MapReduce, and HDFS.
  • Used HiveQL to analyze the data and identify correlations.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (a minimal mapper sketch follows this list).
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Wrote MapReduce jobs using Scala and Splunk.
  • Strong understanding of the REST architectural style and its application to well-performing web sites for global usage.
  • Developer on the Big Data team; worked with Hadoop on the AWS cloud and its ecosystem.
  • Worked on Storm and Apache Apex.
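
Illustrative only: a minimal sketch of a map-only Java MapReduce job of the kind used for data cleaning, assuming tab-delimited input; the field count, delimiter, and input/output paths are hypothetical.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CleanRecordsJob {

        // Drops malformed rows and trims whitespace from the fields of the rest.
        public static class CleanMapper extends Mapper<Object, Text, NullWritable, Text> {
            private static final int EXPECTED_FIELDS = 5; // hypothetical schema width

            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\t", -1);
                if (fields.length != EXPECTED_FIELDS || fields[0].isEmpty()) {
                    context.getCounter("clean", "dropped").increment(1);
                    return; // skip malformed record
                }
                StringBuilder out = new StringBuilder();
                for (int i = 0; i < fields.length; i++) {
                    if (i > 0) out.append('\t');
                    out.append(fields[i].trim());
                }
                context.write(NullWritable.get(), new Text(out.toString()));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "clean-records");
            job.setJarByClass(CleanRecordsJob.class);
            job.setMapperClass(CleanMapper.class);
            job.setNumReduceTasks(0); // map-only: cleaned rows go straight back to HDFS
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }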

Environment: Apache Hadoop, HDFS, Cloudera Manager, CentOS, Java, MapReduce, Eclipse, Hive, Pig, Sqoop, Oozie, SQL, Scala, Terraform, CloudFormation, Hadoop on AWS, SSIS, SSRS, SSAS

Confidential - Atlanta, GA

Hadoop Architect

Responsibilities:

  • Resolved user support requests.
  • Administered and supported Hadoop clusters.
  • Loaded data from relational databases into Hadoop using Sqoop.
  • Provided guidance to ETL/data warehousing teams on where to store intermediate and final output files across the various layers in Hadoop.
  • Worked collaboratively to manage build outs of large data clusters.
  • Helped design big data clusters and administered them.
  • Worked both independently and as an integral part of the development team.
  • Communicated all issues and participated in weekly strategy meetings.
  • Administered back end services and databases in the virtual environment.
  • Worked on Spark, Scala, and Storm.
  • Implemented big data systems in cloud environments.
  • Created security and encryption systems for big data.
  • Performed administration, troubleshooting, and maintenance of ETL and ELT processes.
  • Collaborated with multiple teams for design and implementation of big data clusters in cloud environments
  • Developed Pig Latin scripts for the analysis of semi-structured data.
  • Developed industry-specific UDFs (user-defined functions).
  • Created Hive tables, loaded data into them, and wrote Hive UDFs (a minimal UDF sketch follows this list).
  • Used Sqoop to import data into HDFS and Hive from other data systems.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Migrated ETL processes from relational databases to Hive to evaluate easier data manipulation.
  • Developed Hive queries to process the data for visualization.
  • Installed and configured Apache Hadoop to test the maintenance of log files in the Hadoop cluster.
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
  • Developed a custom file system plugin for Hadoop to access files on data platform.
  • The custom file system plugin allows Hadoop Map Reduce programs, HBase, Pig, and Hive to access files directly.
  • Extensive Teradata knowledge and experience.
  • Extracted feeds from social media sites.
  • Imported data using Sqoop to load data from Oracle to HDFS on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Wrote Hive queries for data analysis to meet business requirements.
  • Created Hive tables and worked with them using HiveQL.
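
Illustrative only: a minimal sketch of a Hive UDF written in Java against the classic org.apache.hadoop.hive.ql.exec.UDF API; the function name and normalization rules are hypothetical.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF that normalizes free-text status values before analysis.
    public final class NormalizeStatus extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            String s = input.toString().trim().toUpperCase();
            // Collapse common variants into canonical values (illustrative mapping only).
            if (s.startsWith("OK") || s.equals("SUCCESS")) {
                return new Text("SUCCESS");
            }
            if (s.startsWith("ERR") || s.equals("FAILED")) {
                return new Text("FAILURE");
            }
            return new Text("UNKNOWN");
        }
    }

Once packaged into a jar, such a function would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in HiveQL queries.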

Environment: HDFS, Hive, ETL, Pig, UNIX, Linux, CDH 4 distribution, Tableau, Impala, Teradata, Sqoop, Flume, Oozie

Confidential - Louisville, KY

Hadoop Admin/Architect

Responsibilities:

  • Installation and configuration of the Hadoop cluster.
  • Worked with the Cloudera support team to fine-tune the cluster.
  • Worked closely with the SA team to make sure all hardware and software were properly set up for optimum resource usage.
  • Developed a custom file system plugin for Hadoop so it can access files on the Hitachi Data Platform.
  • The plugin allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly.
  • The plugin also provided data locality for Hadoop across host nodes and virtual machines.
  • Wrote data ingesters and MapReduce programs.
  • Developed MapReduce jobs to analyze data and produce heuristic reports.
  • Performed extensive data validation using Hive and wrote Hive UDFs.
  • Added, decommissioned, and rebalanced nodes.
  • Created a POC to store server log data in Cassandra to identify system alert metrics (see the sketch after this list).
  • Rack awareness configuration.
  • Client machine configuration.
  • Configuration, monitoring, and management tools.
  • HDFS support and maintenance.
  • Cluster HA setup.
  • Applied patches and performed version upgrades.
  • Incident management, problem management, and change management.
  • Performance management and reporting.
  • Recovered from NameNode failures.
  • Scheduled MapReduce jobs using FIFO and Fair schedulers.
  • Installation and configuration of other open source software such as Pig, Hive, HBase, Flume, and Sqoop.
  • Integrated with relational databases using Sqoop and JDBC connectors.
  • Worked with the development team to tune jobs; knowledge of writing Hive jobs.
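
Illustrative only: a minimal sketch of how such a Cassandra POC might write server log entries, assuming a DataStax Java driver (3.x-style) API; the contact point, keyspace, and table definition are hypothetical.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;
    import java.util.Date;
    import java.util.UUID;

    public class ServerLogPoc {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build()) {
                Session session = cluster.connect();
                // One-time schema setup for the POC (hypothetical keyspace and table).
                session.execute("CREATE KEYSPACE IF NOT EXISTS poc WITH replication = "
                        + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
                session.execute("CREATE TABLE IF NOT EXISTS poc.server_logs ("
                        + "host text, ts timestamp, id uuid, level text, message text, "
                        + "PRIMARY KEY ((host), ts, id))");
                // Insert one alert-level log line; a real ingester would stream many.
                PreparedStatement insert = session.prepare(
                        "INSERT INTO poc.server_logs (host, ts, id, level, message) VALUES (?, ?, ?, ?, ?)");
                session.execute(insert.bind(
                        "node01", new Date(), UUID.randomUUID(), "ALERT", "disk usage above 90%"));
            }
        }
    }

Partitioning by host with time-ordered clustering keeps each node's alerts together, which makes per-host alert metrics a simple range query.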

Environment: Windows 2000/2003, UNIX, Linux, Java, Apache Hadoop (HDFS, MapReduce), Pig, Hive, HBase, Flume, Sqoop, Cassandra, NoSQL

Confidential - Auburn Hills, MI

SQL/Hadoop Developer

Responsibilities:

  • Developed against the Hadoop ecosystem: Hadoop, MapReduce, HBase, Sqoop, and Amazon Elastic MapReduce (EMR).
  • Developed a scalable, cost-effective, and fault-tolerant data warehouse system on the Amazon EC2 cloud.
  • Developed MapReduce/EMR jobs to analyze the data and provide heuristics and reports (a minimal aggregation sketch follows this list).
  • The heuristics were used to improve campaign targeting and efficiency.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Responsible for loading unstructured data into the Hadoop Distributed File System (HDFS).
  • Created and scheduled jobs for maintenance
  • Configured Database Mail
  • Monitored file growth.
  • Maintained operators, categories, alerts, notifications, jobs, and schedules.
  • Maintained database response times and proactively generated performance reports.
  • Automated most of the DBA tasks and monitoring statistics.
  • Developed complex stored procedures, views, clustered/non-clustered indexes, triggers (DDL, DML, LOGON), and user-defined functions.
  • Created a mirrored database using Database Mirroring with High Performance Mode
  • Created database snapshots and stored procedures to load data from the snapshot database to the report database
  • Restored development and staging databases from production as required.
  • Involved in resolving deadlock and performance issues.
  • Performed query optimization and performance tuning for long-running queries and created new indexes on tables for faster I/O.
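
Illustrative only: a minimal sketch of the kind of aggregation an EMR MapReduce job could run to feed campaign-targeting heuristics, assuming comma-delimited event logs; the log layout and field positions are hypothetical.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

    public class CampaignClickCounts {

        // Emits (campaignId, 1) for every click event in a comma-delimited log line.
        public static class ClickMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text campaign = new Text();

            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Hypothetical layout: timestamp,campaignId,eventType,...
                String[] f = value.toString().split(",", -1);
                if (f.length >= 3 && "click".equalsIgnoreCase(f[2])) {
                    campaign.set(f[1]);
                    context.write(campaign, ONE);
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "campaign-click-counts");
            job.setJarByClass(CampaignClickCounts.class);
            job.setMapperClass(ClickMapper.class);
            // The built-in summing reducer also serves as a combiner to cut shuffle volume.
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }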

Environment: MS SQL Server 2000/2005, Windows 2000/2003 Server, DTS, WebLogic, Red Hat Enterprise Linux, MS Access, XML, Hadoop, MapReduce, HBase, Sqoop, Amazon Elastic MapReduce, CDH, Cassandra, NoSQL, Teradata

Confidential, IA

SQL/Linux Administrator

Responsibilities:

  • Installed and configured Linux-based systems.
  • Installed, configured, and maintained supported open source Linux operating systems (CentOS, Debian, Fedora).
  • Monitored the health and stability of Linux and Windows system environments.
  • Diagnosed and resolved problems associated with DNS, DHCP, VPN, NFS, and Apache.
  • Scripting expertise including Bash, PHP, Perl, JavaScript, and UNIX shell.
  • Maintained and Monitored Replication by managing the profile parameters
  • Implemented Log Shipping and Database Mirroring
  • Used BCP Utility and Bulk Insert for bulk operations on data
  • Automated and enhanced daily administrative tasks, including disk space management, backup, and recovery.
  • Used DTS and SSIS to Import and Export various forms of data
  • Performed performance tuning, capacity planning, server partitioning, and database security configuration on a regular basis to maintain consistency.
  • Created alerts and notifications to notify system errors
  • Used SQL Server Profiler for troubleshooting, monitoring and optimization of SQL Server
  • Worked with developers in creation of Stored Procedures, triggers and User Defined Functions to handle the complex business rules data and audit analysis
  • Provided 24X7 on call Support
  • Generated daily, weekly, and monthly reports.

Confidential

SQL Server Admin

Responsibilities:

  • Set up SQL Server configuration settings.
  • Exported and imported data from other data sources, such as flat files, using DTS Import/Export.
  • Backed up, packaged, and distributed databases more efficiently using Redgate tools.
  • Automated common tasks and reused functionality in applications using Redgate.
  • Rebuilt indexes at regular intervals for better performance.
  • Designed and implemented a comprehensive backup plan and disaster recovery strategies.
  • Involved in troubleshooting and fine-tuning databases for performance and concurrency.
  • Monitored and improved performance using execution plans and index tuning.
  • Managed the clustered environment.
  • Used log shipping for database synchronization.
  • Implemented SQL logins, roles, and authentication modes as part of security policies for various categories of user support.
  • Monitored SQL Server performance using Profiler to find performance issues and deadlocks.
  • Maintained database consistency with DBCC checks at regular intervals.
