SQL/Hadoop Developer Resume
Louisville, KY
SUMMARY:
- 9 years of experience in application development, administration, architecture, and data analytics, with specialization in Java and Big Data technologies, including 4 years of extensive experience as a Hadoop Developer with strong expertise in MapReduce and Hive.
- Strong hands-on experience in designing optimized solutions using Hadoop components such as MapReduce, Hive, Sqoop, Pig, HDFS, Flume, and Oozie.
- Strong understanding of the Cassandra and HBase NoSQL databases.
- Actively involved in requirements gathering, analysis, design, reviews, coding, and code reviews. Expertise in designing web applications using Java/J2EE under Agile Scrum methodologies. Strong skills in writing UNIX and Linux shell scripts.
- Experience in writing Python scripts. Expertise in JMS and IBM WebSphere MQ, including writing MDBs that listen to MQ queues. Extensive experience with RAD 6.0, RSA, WebSphere (WSAD 5.1), Eclipse 3.1.2, MyEclipse, and Oracle 9i.
- Extensive experience in development of three-tier and N-tier distributed applications using J2EE technologies. Strong analytical skills with proficiency in debugging and problem solving. Experience in sizing and scaling distribution databases. Experience in performing database consistency checks using DBCC utilities and the Index Tuning Wizard.
- Experience in OLAP and ETL/Data warehousing, creating different data models and maintaining Data Marts
- Experience in designing logical and physical database models using ERWIN
- Implemented Kerberos in the Hadoop cluster environment. Kerberos acts as the security gateway that authenticates any user entering the Hadoop cluster.
- The Kerberos deployment comprises the Key Distribution Center (KDC), user principals, and the HDFS nodes; implemented these components in the system and handled tickets related to the Hadoop security setup.
- Communicated with the Cloudera team on critical issues that could not be resolved in-house, since the Hadoop-Kerberos security environment is backed by Cloudera support.
TECHNICAL SKILLS:
Operating Systems: Windows 2000 Server, Windows 2000 Advanced Server, Windows Server 2003, Windows NT, Windows 98/XP, UNIX, Linux (RHEL, CentOS, Debian, Fedora)
Databases: MS SQL Server 2000/2005/2008, MS Access, Teradata, Oracle, Cassandra, DB2
Languages: Java, C, C++, Pig Latin, HiveQL
Tools/Utilities: MapReduce, Sqoop, Flume, Oozie, SQL Profiler, HBase, Jenkins, Stash, Git, SSIS, SSAS, SSRS
Methodologies: Agile/Scrum
Reporting Tools: Tableau, Impala, QlikView, Datameer
Web Utilities: HTTP, IIS Administration, Apache
PROFESSIONAL EXPERIENCE:
Confidential, Austin, TX
Big Data Consultant
Responsibilities:- Installed and configured a multi-node, fully distributed Hadoop cluster
- Involved in installing Hortonworks Hadoop ecosystem components
- Responsible for managing data coming from different sources and developing Hadoop production clusters.
- Set up and administered the Hadoop cluster environment, including adding and removing cluster nodes, cluster capacity planning, and performance tuning
- Wrote complex MapReduce programs
- Involved in the design, installation, and maintenance of Kafka and Ambari.
- Worked on metadata.
- Loaded data into the cluster from dynamically generated files using Flume and from RDBMS sources using Sqoop
- Involved in writing Java APIs for interacting with HBase (a minimal client sketch follows this section), building with Maven, and working with JSP, Servlets, Web 2.0, Struts/Spring, Hibernate ORM, REST APIs, and AngularJS
- Involved in writing Flume and Hive scripts to extract, transform, and load data into the database
- Used a data lake as the data storage layer.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Experienced in Teradata services.
- Experienced in importing and exporting data to and from HDFS and Hive using Sqoop.
- Knowledge of performance troubleshooting and tuning of Hadoop clusters.
- Expert in Spark, Scala, Storm, Hue, and Samza.
- Implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access.
- Experienced in running Amazon EMR.
- Installed and configured Hive, wrote Hive UDFs, and used JUnit for unit testing of MapReduce code.
- Experienced in working with various data sources such as Hortonworks, Teradata, and Oracle.
- Worked on data management and data integration.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data tools.
- Project lead for a team of 7 members
- Served as the technical expert in Hadoop architecture; guided the team and helped them solve problems.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs, triggered independently based on time and data availability
- Expertise in Kerberos and LDAP integration.
- Very familiar with data visualization.
- Familiar with parallel processing databases such as Teradata and Netezza
Environment: Java, Hadoop, Hortonworks, Hive, Pig, Sqoop, Flume, HBase, Oracle 10g, Teradata, Cassandra, Scala, Spark, Netezza, Spring, Kafka, AWS, Amazon EMR, SSIS, SSRS, SSAS, Data Lake
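Illustrative sketch (not project code): a minimal Java HBase client of the kind referenced in the HBase API bullet above. The "customer_profile" table, the "d" column family, and the row-key format are hypothetical placeholders, not the actual project schema.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CustomerProfileDao {

    // Table and column family names are illustrative only.
    private static final TableName TABLE = TableName.valueOf("customer_profile");
    private static final byte[] CF = Bytes.toBytes("d");

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TABLE)) {

            // Write one row keyed by a customer id.
            Put put = new Put(Bytes.toBytes("cust-0001"));
            put.addColumn(CF, Bytes.toBytes("name"), Bytes.toBytes("Jane Doe"));
            put.addColumn(CF, Bytes.toBytes("state"), Bytes.toBytes("KY"));
            table.put(put);

            // Read the row back.
            Result result = table.get(new Get(Bytes.toBytes("cust-0001")));
            String name = Bytes.toString(result.getValue(CF, Bytes.toBytes("name")));
            System.out.println("name = " + name);
        }
    }
}
```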
Confidential
Hadoop Developer
Responsibilities:- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (a minimal mapper sketch follows this section)
- Experience working with metadata.
- Worked on Spark Streaming, creating RDDs, and graph analytics.
- Defined workflows using the Oozie framework for automation.
- Implemented Flume (multiplexing) to stream data from upstream pipes into HDFS.
- Responsible for reviewing Hadoop log files.
- Loaded and transformed large sets of unstructured and semi-structured data.
- Performed data completeness, correctness, transformation, and quality testing using SQL.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Worked on platforms such as Kafka clusters
- Implemented Hive partitioning (static and dynamic) and bucketing.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
- Assisted in creation of ETL processes for transformation of data sources from existing RDBMS systems.
- Involved in Teradata-related work.
- Developed profile/log interceptors for the Struts action classes using the Struts Action Invocation Framework (SAIF).
- Wrote Apache Pig scripts to process the HDFS data.
- Wrote Hive queries for data analysis to meet the business requirements.
- Involved in installing Hadoop Ecosystem components.
- Involved in HDFS maintenance and loading of structured and unstructured data.
- Used HiveQL to analyze the data and identify correlations.
- Installed and configured Pig and wrote Pig Latin scripts.
- Wrote MapReduce jobs using Scala and Splunk.
- Strong understanding of the REST architectural style and its application to well-performing web sites for global usage.
- Developer on the Big Data team; worked with Hadoop on the AWS cloud and its ecosystem.
- Worked on Storm and Apache Apex
Environment: Apache Hadoop, HDFS, Cloudera Manager, CentOS, Java, MapReduce, Eclipse, Hive, Pig, Sqoop, Oozie, SQL, Scala, Terraform, CloudFormation, Hadoop on AWS, SSIS, SSRS, SSAS
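Illustrative sketch (not project code): a minimal Java MapReduce data-cleaning mapper of the kind described in the first bullet above. The pipe delimiter and the expected field count are assumptions for the example, not the actual feed format.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Map-only cleaning step: drops blank or malformed records and emits
 * the surviving lines unchanged. The '|' delimiter and the expected
 * field count (8) are illustrative assumptions.
 */
public class RecordCleaningMapper
        extends Mapper<LongWritable, Text, Text, NullWritable> {

    private static final int EXPECTED_FIELDS = 8;

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString().trim();
        if (line.isEmpty()) {
            context.getCounter("cleaning", "blank").increment(1);
            return;
        }
        String[] fields = line.split("\\|", -1);
        if (fields.length != EXPECTED_FIELDS) {
            context.getCounter("cleaning", "malformed").increment(1);
            return;
        }
        context.getCounter("cleaning", "valid").increment(1);
        context.write(new Text(line), NullWritable.get());
    }
}
```

The counters give a quick completeness/correctness report for each run without a separate pass over the data.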
Confidential, Atlanta, GA
Hadoop Architect
Responsibilities:- Resolved user support requests
- Administered and supported Hadoop clusters
- Loaded data from RDBMS into Hadoop using Sqoop
- Provided guidance to ETL/data warehousing teams on where to store intermediate and final output files across the various layers in Hadoop
- Worked collaboratively to manage build outs of large data clusters.
- Helped design big data clusters and administered them.
- Worked both independently and as an integral part of the development team.
- Communicated all issues and participated in weekly strategy meetings.
- Administered back end services and databases in the virtual environment.
- Worked on Spark, Scala, and Storm.
- Implemented big data systems in cloud environments.
- Created security and encryption systems for big data.
- Performed administration, troubleshooting, and maintenance of ETL and ELT processes
- Collaborated with multiple teams for design and implementation of big data clusters in cloud environments
- Developed Pig Latin scripts for the analysis of semi-structured data.
- Developed industry-specific UDFs (user-defined functions)
- Used Hive, created Hive tables, and was involved in data loading and in writing Hive UDFs (a minimal UDF sketch follows this section).
- Used Sqoop to import data into HDFS and Hive from other data systems.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Migrated ETL processes from RDBMS to Hive to validate easier data manipulation.
- Developed Hive queries to process the data for visualization.
- Installed and configured Apache Hadoop to test the maintenance of log files in the Hadoop cluster.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
- Developed a custom file system plugin for Hadoop to access files on the data platform.
- The custom file system plugin allows Hadoop MapReduce programs, HBase, Pig, and Hive to access files directly.
- Extensive working knowledge of Teradata.
- Extracted feeds from social media sites.
- Used Sqoop to load data from Oracle into HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet business requirements.
- Created Hive tables and worked on them using HiveQL.
Environment: HDFS, Hive, ETL, Pig, UNIX, Linux, CDH 4 distribution, Tableau, Impala, Teradata, Sqoop, Flume, Oozie
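Illustrative sketch (not project code): a minimal Hive UDF in Java of the kind referenced in the UDF bullets above, using the classic org.apache.hadoop.hive.ql.exec.UDF API. The normalization logic, function name, and jar name are hypothetical, not the actual industry-specific function.

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * Example UDF that trims and upper-cases a state code. Registered in Hive with:
 *   ADD JAR my-udfs.jar;
 *   CREATE TEMPORARY FUNCTION normalize_state AS 'NormalizeStateUdf';
 * (jar and function names are illustrative.)
 */
@Description(name = "normalize_state",
             value = "_FUNC_(str) - trims and upper-cases a state code")
public class NormalizeStateUdf extends UDF {

    public Text evaluate(Text input) {
        if (input == null) {
            return null; // preserve NULL semantics
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```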
Confidential, Louisville, KY
Hadoop Admin/Architect
Responsibilities:- Solid understanding of Hadoop HDFS, MapReduce, and other ecosystem projects
- Installation and configuration of the Hadoop cluster
- Worked with the Cloudera support team to fine-tune the cluster
- Worked closely with the SA team to make sure all hardware and software were properly set up for optimum usage of resources
- Developed a custom file system plugin for Hadoop so it can access files on the Hitachi Data Platform
- The plugin allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly
- The plugin also provided data locality for Hadoop across host nodes and virtual machines
- Wrote data ingesters and MapReduce programs (a minimal ingester sketch follows this section)
- Developed MapReduce jobs to analyze data and provide heuristics reports
- Good experience in writing data ingesters and complex MapReduce jobs in Java for data cleaning and preprocessing, and in fine-tuning them per data set
- Performed extensive data validation using Hive and wrote Hive UDFs
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs
- Extensive scripting (Python and shell) to provision and spin up virtualized Hadoop clusters
- Adding, decommissioning, and rebalancing nodes
- Created a POC to store server log data in Cassandra to identify system alert metrics
- Rack Aware Configuration
- Configuring Client Machines
- Configuring monitoring and management tools
- HDFS Support and Maintenance
- Cluster HA Setup
- Applying Patches and Perform Version Upgrades
- Incident Management, Problem Management and Change Management
- Performance Management and Reporting
- Recover from NameNode failures
- Schedule MapReduce jobs using the FIFO and Fair share schedulers
- Installation and configuration of other open source software such as Pig, Hive, HBase, Flume, and Sqoop
- Integration with RDBMS using Sqoop and JDBC connectors
- Working with the dev team to tune jobs
- Knowledge of writing Hive jobs
Environment: Windows 2000/2003, UNIX, Linux, Java, Apache HDFS, MapReduce, Pig, Hive, HBase, Flume, Sqoop, Cassandra, NoSQL
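Illustrative sketch (not project code): a simple Java data ingester of the kind mentioned in the ingester bullet above, copying one local file into HDFS through the standard FileSystem API. Paths come from the command line, and the cluster configuration (core-site.xml/hdfs-site.xml) is assumed to be on the classpath.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

/** Copies one local file into HDFS. Usage: HdfsIngester <localFile> <hdfsPath> */
public class HdfsIngester {

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: HdfsIngester <localFile> <hdfsPath>");
            System.exit(1);
        }
        Configuration conf = new Configuration();       // picks up core-site.xml / hdfs-site.xml
        try (FileSystem fs = FileSystem.get(conf);
             InputStream in = new BufferedInputStream(new FileInputStream(args[0]));
             FSDataOutputStream out = fs.create(new Path(args[1]), true /* overwrite */)) {
            IOUtils.copyBytes(in, out, 4096, false);    // stream local bytes into the HDFS file
        }
    }
}
```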
Confidential, Auburn Hills, MI
SQL/Hadoop Developer
Responsibilities:- Developed on the Hadoop ecosystem: Hadoop, MapReduce, HBase, Sqoop, Amazon Elastic MapReduce (EMR)
- Developed a scalable, cost-effective, and fault-tolerant data warehouse system on the Amazon EC2 cloud.
- Developed MapReduce/EMR jobs to analyze the data and provide heuristics and reports; the heuristics were used to improve campaign targeting and efficiency (a minimal job-driver sketch follows this section)
- Response times for web services built on a typical LAMP (PHP) stack were too slow; developed a high-performance, high-volume, highly scalable platform for real-time bidding
- Worked with the client to understand the extraction process and decide on the load strategy, i.e. whether they want historical data or the current view
- Wrote complex HQLs to generate the data required in the final reports and passed these HQLs to Ruby programs that convert them into MapReduce programs
- Imported and exported data to and from HDFS and Hive using Sqoop
- Responsible for loading unstructured data into Hadoop file system (HDFS)
- Created and scheduled jobs for maintenance
- Configured Database Mail
- Monitored File Growth
- Maintained Operators, Categories, Alerts, Notifications, Jobs and Schedules
- Maintained database response times, proactively generated performance reports
- Automated most of the DBA Tasks and Monitoring stats
- Developed complex stored procedures, views, clustered/non-clustered indexes, triggers (DDL, DML, LOGON) and user defined functions
- Created a mirrored database using Database Mirroring with High Performance Mode
- Created database snapshots and stored procedures to load data from the snapshot database to the report database
- Restored development and staging databases from production as per requirements
- Involved in resolving deadlock and performance issues
- Performed query optimization and performance tuning for long-running queries and created new indexes on tables for faster I/O
Environment: MS SQL Server 2000/2005, Windows 2000/2003 Server, DTS, WebLogic, Red Hat Enterprise Linux, MS Access, XML, Hadoop, MapReduce, HBase, Sqoop, Amazon Elastic MapReduce, CDH, Cassandra, NoSQL, Teradata
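Illustrative sketch (not project code): a self-contained Java MapReduce job of the kind used for the EMR reporting jobs above. The pipe-delimited input format and the campaign-count logic are hypothetical examples; the same jar can run on an on-premise cluster or be submitted as a custom JAR step on EMR.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Counts records per campaign id, assuming pipe-delimited log lines whose
 * first field is the campaign id (an illustrative format, not the actual feed).
 * Usage: CampaignCountJob <in> <out>
 */
public class CampaignCountJob {

    public static class CampaignMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|", -1);
            if (fields.length > 0 && !fields[0].isEmpty()) {
                context.write(new Text(fields[0]), ONE);   // campaignId -> 1
            }
        }
    }

    public static class CampaignReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));      // campaignId -> record count
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "campaign-count");
        job.setJarByClass(CampaignCountJob.class);
        job.setMapperClass(CampaignMapper.class);
        job.setCombinerClass(CampaignReducer.class);
        job.setReducerClass(CampaignReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```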
Confidential, IA
SQL/Linux Administrator
Responsibilities:- Installed and configured Linux-based systems
- Installed, configured, maintained, and supported open source Linux operating systems (CentOS, Debian, Fedora)
- Monitored the health and stability of Linux and Windows system environments
- Diagnosed and resolved problems associated with DNS, DHCP, VPN, NFS, and Apache
- Scripting expertise including Bash, PHP, Perl, JavaScript, and UNIX shell
- Maintained and Monitored Replication by managing the profile parameters
- Implemented Log Shipping and Database Mirroring
- Used BCP Utility and Bulk Insert for bulk operations on data
- Automated and enhanced daily administrative tasks, including disk space management, backup, and recovery
- Used DTS and SSIS to import and export various forms of data
- Performed performance tuning, capacity planning, server partitioning, and database security configuration on a regular basis to maintain consistency
- Created alerts and notifications to notify system errors
- Used SQL Server Profiler for troubleshooting, monitoring and optimization of SQL Server
- Worked with developers in creation of Stored Procedures, triggers and User Defined Functions to handle the complex business rules data and audit analysis
- Provided 24X7 on call Support
- Generated daily, weekly, and monthly reports
Confidential
SQL Server Admin
Responsibilities:- Set up SQL Server configuration settings.
- Exported and imported data from other data sources such as flat files using DTS Import/Export.
- Backed up, packaged, and distributed databases more efficiently using Redgate
- Automated common tasks and used functionality in applications by using Redgate
- Rebuilt indexes at regular intervals for better performance
- Designed and implemented comprehensive Backup plan and disaster recovery strategies
- Involved in troubleshooting and fine-tuning databases for performance and concurrency.
- Expertise and interests include administration, database design, performance analysis, and production support for large (VLDB) and complex databases
- Monitored and modified performance using execution plans and index tuning.
- Managed the clustered environment.
- Used log shipping for database synchronization.
- Implemented SQL logins, roles, and authentication modes as part of security policies for various categories of user support.
- Monitored SQL Server performance using Profiler to find performance issues and deadlocks.
- Maintained database consistency with DBCC checks at regular intervals