SQL/Hadoop Developer Resume
Auburn Hills, MI
SUMMARY
- 9 years of experience in application development, administration, architecture, and data analytics, with specialization in Java and Big Data technologies.
- 4 years of extensive experience as a Hadoop Developer with strong expertise in MapReduce and Hive.
- Strong hands-on experience designing optimized solutions using Hadoop components such as MapReduce, Hive, Sqoop, Pig, HDFS, Flume, and Oozie.
- Strong understanding of the Cassandra and HBase NoSQL databases.
- Actively involved in requirement gathering, analysis, design, reviews, coding, and code reviews.
- Expertise in designing web applications using Java/J2EE under Agile Scrum methodologies. Strong skills in writing UNIX and Linux shell scripts.
- Experience in writing Python scripts. Expertise in JMS and IBM WebSphere MQ, including writing MDBs to listen to MQ queues.
- Extensive experience with RAD 6.0, RSA, WebSphere (WSAD 5.1), Eclipse 3.1.2, MyEclipse, and Oracle 9i.
- Extensive experience in development of three-tier, N-tier, and distributed applications using J2EE technologies.
- Strong analytical skills with proficiency in debugging and problem solving. Experience in sizing and scaling of distribution databases.
- Experience in performing database consistency checks using DBCC utilities and the Index Tuning Wizard.
- Experience in OLAP and ETL/data warehousing, creating different data models and maintaining data marts.
- Experience in designing logical and physical database models using ERwin.
- Implemented Kerberos in the Hadoop cluster environment; Kerberos serves as the security gateway that authenticates any user entering the Hadoop cluster (a minimal Java login sketch follows this summary section).
- The Kerberos security system comprises the Key Distribution Center (KDC), the users, and the HDFS nodes as its components.
- Implemented these components in the system and handled tickets related to the Hadoop security system.
- Communicated with the Cloudera team on critical issues that could not be resolved easily, since the Hadoop-Kerberos security environment is supported by Cloudera.
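A minimal sketch of the Kerberos login flow described above, using Hadoop's UserGroupInformation API from Java. The principal, keytab path, and NameNode URI are placeholders rather than values from the actual cluster.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

// Authenticate to a Kerberized HDFS cluster using a keytab, then issue a normal HDFS call.
public class KerberizedHdfsLogin {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020"); // placeholder NameNode URI
        conf.set("hadoop.security.authentication", "kerberos");       // enable Kerberos authentication

        UserGroupInformation.setConfiguration(conf);
        // Principal and keytab path are placeholders; real values come from the KDC administrator.
        UserGroupInformation.loginUserFromKeytab(
                "etl_user@EXAMPLE.COM", "/etc/security/keytabs/etl_user.keytab");

        // Once the Kerberos ticket is obtained, HDFS access proceeds as usual.
        try (FileSystem fs = FileSystem.get(conf)) {
            System.out.println("Authenticated home directory: " + fs.getHomeDirectory());
        }
    }
}
```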
TECHNICAL SKILLS
Operating Systems: Windows 2000 Server, Windows 2000 Advanced Server, Windows Server 2003, Windows NT, Windows 98/XP, UNIX, Linux (RHEL, CentOS, Debian, Fedora)
Databases: MS SQL Server 2000/2005/2008, MS Access, Teradata, Oracle, DB2, Cassandra
Languages: Java, C, C++, Pig Latin, HiveQL
MS BI Stack: SSIS, SSAS, SSRS
Tools/Utilities: MapReduce, Sqoop, Flume, Oozie, SQL Profiler, HBase, Jenkins, Stash, Agile, Git
Reporting Tools: Tableau, Impala, QlikView, Datameer
Web Utilities: HTTP, IIS Administration, Apache
PROFESSIONAL EXPERIENCE
Confidential - Austin, TX
Big data Consultant
Responsibilities:
- Installed and configured a multi-node, fully distributed Hadoop cluster
- Involved in installing Hortonworks Hadoop ecosystem components
- Responsible for managing data coming from different sources and developing Hadoop production clusters.
- Set up and administered the Hadoop cluster environment, including adding and removing cluster nodes, cluster capacity planning, and performance tuning
- Wrote complex MapReduce programs
- Involved in the design, installation, and maintenance of Kafka and Ambari.
- Loaded data into the cluster from dynamically generated files using Flume and from RDBMS sources using Sqoop
- Wrote Java APIs for interacting with HBase (see the client sketch following this role's Environment line); built with Maven; worked with JSP, Servlets, Web 2.0, Struts/Spring framework, Hibernate ORM, REST APIs, and AngularJS
- Involved in writing Flume and Hive scripts to extract, transform, and load data into the database
- Used a data lake as the data storage layer.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Experienced in Teradata services.
- Imported and exported data into HDFS and Hive using Sqoop.
- Knowledge in performance troubleshooting and tuning Hadoop clusters.
- Expert in Spark, Scala, Storm, Hue, and Samza.
- Implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access.
- Experienced in running Amazon EMR.
- Installed and configured Hive, wrote Hive UDFs, and used JUnit for unit testing of MapReduce code.
- Experienced in working with various data sources such as Hortonworks, Teradata, and Oracle.
- Worked on Data management and Data Integration.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data tools.
- Project lead for a team of 7 members
- Served as the technical expert in Hadoop architecture, guiding the team and helping them solve problems.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs that trigger independently based on time and data availability
- Expertise in Kerberos and LDAP integration.
- Very familiar with data visualization.
- Familiar with parallel processing databases such as Teradata and Netezza
Environment: Java, Hadoop, Hortonworks, Hive, Pig, Sqoop, Flume, HBase, Oracle 10g, Teradata, Cassandra, Scala, Spark, Netezza, Spring, Kafka, AWS, Amazon EMR, SSIS, SSRS, SSAS, data lake
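As referenced in the HBase bullet above, a minimal sketch of a Java client interacting with HBase through the standard client API. The `events` table, the `d` column family, and the row key are hypothetical names used only for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Writes one row to an HBase table and reads it back; table and column names are illustrative.
public class HBaseClientSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("events"))) { // hypothetical table

            // Insert a row with a single cell in the "d" column family.
            Put put = new Put(Bytes.toBytes("row-001"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("source"), Bytes.toBytes("flume"));
            table.put(put);

            // Read the same cell back.
            Result result = table.get(new Get(Bytes.toBytes("row-001")));
            byte[] value = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("source"));
            System.out.println("source = " + Bytes.toString(value));
        }
    }
}
```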
Confidential - Pittsburgh
Hadoop Developer
RESPONSIBILITIES:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing
- Experience in metadata management.
- Worked on Spark Streaming, RDD creation, and graph analytics.
- Defined workflows using the Oozie framework for automation.
- Implemented Flume multiplexing to stream data from upstream pipes into HDFS.
- Responsible for reviewing Hadoop log files.
- Loaded and transformed large sets of unstructured and semi-structured data.
- Performed data completeness, correctness, data transformation and data quality testing using SQL.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Worked on platforms such as Kafka clusters
- Implemented Hive partitioning (static and dynamic) and bucketing.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
- Assisted in creation of ETL processes for transformation of data sources from existing RDBMS systems.
- Developed profile/log interceptors for the Struts action classes using the Struts Action Invocation Framework (SAIF).
- Wrote Apache Pig scripts to process the HDFS data.
- Wrote Hive queries for data analysis to meet the business requirements.
- Involved in installing Hadoop Ecosystem components.
- Involved in HDFS maintenance and loading of structured and unstructured data.
- Installed and configured Hadoop, MapReduce, and HDFS.
- Used HiveQL to analyze the data and identify different correlations.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (a mapper sketch follows this role's Environment line).
- Installed and configured Pig and wrote Pig Latin scripts.
- Wrote MapReduce jobs using Scala and Splunk.
- Strong understanding of the REST architectural style and its application to high-performing web sites for global usage.
- Developer on the Big Data team; worked with Hadoop on the AWS cloud and its ecosystem.
- Worked on Storm and Apache Apex
Environment: Apache Hadoop, HDFS, Cloudera Manager, CentOS, Java, MapReduce, Eclipse, Hive, Pig, Sqoop, Oozie, SQL, Scala, Terraform, CloudFormation, Hadoop on AWS, SSIS, SSRS, SSAS
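A minimal sketch of the kind of map-only data-cleaning job referenced above. The pipe delimiter and the five-field record layout are assumptions made for illustration, not details of the actual feeds.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-only cleaning step: drops blank or malformed records and re-emits the valid ones.
public class RecordCleaningMapper
        extends Mapper<LongWritable, Text, NullWritable, Text> {

    private static final int EXPECTED_FIELDS = 5; // assumed record width

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString().trim();
        if (line.isEmpty()) {
            return; // skip blank lines
        }
        String[] fields = line.split("\\|", -1); // pipe-delimited feed (assumed)
        if (fields.length != EXPECTED_FIELDS) {
            context.getCounter("cleaning", "malformed").increment(1);
            return; // drop malformed records, but count them for reporting
        }
        context.write(NullWritable.get(), new Text(String.join("|", fields)));
    }
}
```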
Confidential - Atlanta, GA
Hadoop Architect
Responsibilities:
- Resolved user support requests
- Administered and supported Hadoop clusters
- Loaded data from RDBMS to Hadoop using Sqoop
- Provided guidance to ETL/data warehousing teams on where to store intermediate and final output files in the various layers in Hadoop
- Worked collaboratively to manage build outs of large data clusters.
- Helped design big data clusters and administered them.
- Worked both independently and as an integral part of the development team.
- Communicated all issues and participated in weekly strategy meetings.
- Administered back end services and databases in the virtual environment.
- Worked on Spark, Scala, and Storm.
- Implemented big data systems in cloud environments.
- Created security and encryption systems for big data.
- Performed administration, troubleshooting, and maintenance of ETL and ELT processes
- Collaborated with multiple teams for design and implementation of big data clusters in cloud environments
- Developed Pig Latin scripts for the analysis of semi-structured data.
- Developed industry-specific UDFs (user-defined functions).
- Created Hive tables, loaded data, and wrote Hive UDFs (a sample UDF sketch follows this role's Environment line).
- Used Sqoop to import data into HDFS and Hive from other data systems.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Migrated ETL processes from RDBMS to Hive to test ease of data manipulation.
- Developed Hive queries to process the data for visualization.
- Installed and configured Apache Hadoop to test the maintenance of log files in the Hadoop cluster.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
- Developed a custom file system plugin for Hadoop to access files on the data platform.
- The custom file system plugin allows Hadoop MapReduce programs, HBase, Pig, and Hive to access files directly.
- Extensive Teradata knowledge and experience.
- Extracted feeds from social media sites.
- Imported data using Sqoop to load data from Oracle to HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet the business requirements.
- Created Hive tables and worked on them using HiveQL.
Environment: HDFS, Hive, ETL, Pig, UNIX, Linux, CDH 4 distribution, Tableau, Impala, Teradata, Sqoop, Flume, Oozie
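A small illustration of the kind of Hive UDF mentioned above, written against Hive's classic UDF base class. The function's masking rule and its name are illustrative, not taken from the actual project.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Masks all but the last four characters of a value, e.g. an account number.
public class MaskIdUdf extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null; // Hive passes NULLs through
        }
        String s = input.toString();
        if (s.length() <= 4) {
            return new Text(s); // too short to mask
        }
        return new Text("****" + s.substring(s.length() - 4));
    }
}
```

Once packaged into a JAR and added to the Hive session, the class can be registered with `CREATE TEMPORARY FUNCTION mask_id AS 'MaskIdUdf';` and used like any built-in function.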
Confidential - Louisville, KY
Hadoop Admin/Architect
Responsibilities:
- Installation and Configuration of Hadoop Cluster
- Worked with the Cloudera support team to fine-tune the cluster
- Worked closely with the SA team to ensure all hardware and software were properly set up for optimum usage of resources
- Developed a custom file system plugin for Hadoop so it can access files on the Hitachi Data Platform
- The plugin allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly
- The plugin also provided data locality for Hadoop across host nodes and virtual machines
- Wrote data ingesters and MapReduce programs
- Developed MapReduce jobs to analyze data and provide heuristic reports
- Performed extensive data validation using Hive and wrote Hive UDFs
- Added, decommissioned, and rebalanced nodes
- Created a POC to store server log data in Cassandra to identify system alert metrics
- Rack Aware Configuration
- Configuring Client Machines
- Configuring, Monitoring and Management Tools
- HDFS Support and Maintenance
- Cluster HA Setup
- Applying Patches and Perform Version Upgrades
- Incident Management, Problem Management and Change Management
- Performance Management and Reporting
- Recover from Name Node failures
- Scheduled MapReduce jobs using FIFO and Fair share schedulers
- Installation and configuration of other open source software such as Pig, Hive, HBase, Flume, and Sqoop
- Integration with RDBMS using Sqoop and JDBC connectors (see the connectivity sketch following this role's Environment line)
- Worked with the dev team to tune jobs
- Knowledge of writing Hive jobs
Environment: Windows 2000/2003, UNIX, Linux, Java, Apache HDFS, MapReduce, Pig, Hive, HBase, Flume, Sqoop, Cassandra, NoSQL
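A hedged sketch of the JDBC side of the RDBMS integration noted above: a plain JDBC connectivity and row-count check of the kind typically run before wiring a table into Sqoop. The JDBC URL, credentials, and table name are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Verifies RDBMS connectivity and counts the rows that a subsequent Sqoop import would move.
public class RdbmsConnectivityCheck {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:oracle:thin:@//db.example.com:1521/ORCL"; // placeholder JDBC URL
        try (Connection conn = DriverManager.getConnection(url, "etl_user", "secret");
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT COUNT(*) FROM source_schema.orders"); // hypothetical source table
             ResultSet rs = stmt.executeQuery()) {
            if (rs.next()) {
                System.out.println("Rows available for import: " + rs.getLong(1));
            }
        }
    }
}
```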
Confidential - Auburn Hills, MI
SQL/Hadoop Developer
Responsibilities:
- Developed on the Hadoop ecosystem: Hadoop, MapReduce, HBase, Sqoop, Amazon Elastic MapReduce (EMR)
- Developed a scalable, cost-effective, and fault-tolerant data warehouse system on the Amazon EC2 cloud.
- Developed MapReduce/EMR jobs to analyze the data and provide heuristics and reports.
- The heuristics were used to improve campaign targeting and efficiency
- Imported and exported data into HDFS and Hive using Sqoop
- Responsible for loading unstructured data into the Hadoop Distributed File System (HDFS)
- Created and scheduled jobs for maintenance
- Configured Database Mail
- Monitored File Growth
- Maintained Operators, Categories, Alerts, Notifications, Jobs and Schedules
- Maintained database response times, proactively generated performance reports
- Automated most of the DBA tasks and monitoring stats
- Developed complex stored procedures, views, clustered/non-clustered indexes, triggers (DDL, DML, LOGON), and user-defined functions (an illustrative Java call to one such procedure is sketched after this role's Environment line)
- Created a mirrored database using Database Mirroring with High Performance Mode
- Created database snapshots and stored procedures to load data from the snapshot database to the report database
- Restored development and staging databases from production as per requirements
- Involved in resolving deadlock issues and performance issues
- Performed query optimization and performance tuning for long-running queries and created new indexes on tables for faster I/O
Environment: MS SQL Server 2000/2005, Windows 2000/2003 Server, DTS, WebLogic, Red Hat Enterprise Linux, MS Access, XML, Hadoop, MapReduce, HBase, Sqoop, Amazon Elastic MapReduce, CDH, Cassandra, NoSQL, Teradata
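An illustrative Java call to a reporting stored procedure of the kind described above, made through the SQL Server JDBC driver. The procedure name, its parameter, and the connection string are hypothetical.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Invokes a hypothetical snapshot-to-report load procedure via JDBC escape syntax.
public class ReportProcCaller {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:sqlserver://sqlprod.example.com:1433;databaseName=ReportDB"; // placeholder
        try (Connection conn = DriverManager.getConnection(url, "report_user", "secret");
             CallableStatement call = conn.prepareCall("{call dbo.usp_LoadSnapshotToReport(?)}")) {
            call.setInt(1, 101); // hypothetical snapshot key
            call.execute();
            System.out.println("Snapshot load procedure completed.");
        }
    }
}
```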
Confidential, IA
SQL/Linux Administrator
Responsibilities:
- Installed and configured Linux-based systems
- Installed, configured, and maintained supported open source Linux operating systems (CentOS, Debian, Fedora)
- Monitored the health and stability of Linux and Windows system environments
- Diagnosed and resolved problems associated with DNS, DHCP, VPN, NFS, and Apache
- Scripting expertise including Bash, PHP, Perl, JavaScript, and UNIX shell
- Maintained and Monitored Replication by managing the profile parameters
- Implemented Log Shipping and Database Mirroring
- Used BCP Utility and Bulk Insert for bulk operations on data
- Automated and enhanced daily administrative tasks including disk space management Backup and recovery
- Used DTS and SSIS to Import and Export various forms of data
- Performed performance tuning, capacity planning, server partitioning, and database security configuration on a regular basis to maintain consistency
- Created alerts and notifications to notify system errors
- Used SQL Server Profiler for troubleshooting, monitoring and optimization of SQL Server
- Worked with developers in creation of Stored Procedures, triggers and User Defined Functions to handle the complex business rules data and audit analysis
- Provided 24X7 on call Support
- Generated daily, weekly, and monthly reports
Confidential
SQL Server Admin
Responsibilities:
- Set up SQL Server configuration settings.
- Exported and imported data from other data sources such as flat files using DTS Import/Export.
- Backed up, packaged, and distributed databases more efficiently by using Redgate.
- Automated common tasks and used functionality in applications by using Redgate.
- Rebuilt indexes at regular intervals for better performance.
- Designed and implemented comprehensive Backup plan and disaster recovery strategies
- Involved in troubleshooting and fine-tuning databases for performance and concurrency.
- Monitored and modified performance using execution plans and index tuning.
- Managed the clustered environment.
- Used log shipping for database synchronization.
- Implemented SQL logins, roles, and authentication modes as part of security policies for various categories of user support.
- Monitored SQL Server performance using Profiler to find performance issues and deadlocks.
- Maintained database consistency with DBCC checks at regular intervals.