Big Data Lead Resume
Minneapolis, MN
SUMMARY
- 7+ years of experience commissioning, decommissioning, balancing, and managing nodes and tuning servers for optimal cluster performance.
- Around 5 years of professional experience, including extensive Hadoop and Linux experience.
- Experienced in installing, configuring, supporting, and monitoring 100+ node Hadoop clusters using Cloudera Manager and Hortonworks distributions.
- Experience performing various major and minor Hadoop upgrades in large environments.
- As an administrator, involved in cluster maintenance, troubleshooting, and monitoring, and followed proper backup and recovery strategies.
- Experience in HDFS data storage and support for running MapReduce jobs.
- Involved in infrastructure setup and installation of the HDP stack on the Amazon cloud.
- Experience ingesting data from RDBMS sources such as Oracle, SQL Server, and Teradata into HDFS using Sqoop.
- Experience in big data technologies: Hadoop HDFS, MapReduce, Pig, Hive, Oozie, Sqoop, ZooKeeper, and NoSQL.
- Integrated Hadoop into traditional ETL, accelerating the extraction, transformation, and loading of massive structured and unstructured data.
- Loaded the aggregate data into a relational database for reporting, dashboarding, and ad hoc analyses, which revealed ways to lower operating costs and offset the rising cost of programming.
- Used innovation to improve operational processes and performance, ensuring data is of the highest quality; built and unit-tested integration components.
- Formulated highly detailed, practically implementable DW solutions using the Informatica toolset.
- Developed and coded Informatica mappings, sessions, and workflows for different stages of ETL.
- Experience in benchmarking and in performing backup and disaster recovery of NameNode metadata and important sensitive data residing on the cluster.
- Experience designing and implementing HDFS access controls, directory and file permissions, and user authorization to facilitate stable, secure access for multiple users in a large multi-tenant cluster.
- Experience using Ambari for installation and management of Hadoop clusters; experience in Ansible and related tools for configuration management.
- Experience working in large environments and leading infrastructure support and operations.
- Migrated applications from existing systems such as MySQL, Oracle, DB2, and Teradata to Hadoop.
- Expertise with Hadoop, MapReduce, Pig, Sqoop, Oozie, and Hive.
- Benchmarked Hadoop clusters to validate the hardware before and after installation and tweaked configurations to obtain better performance.
- Experience administering Linux systems to deploy and monitor Hadoop clusters.
- Great team player and quick learner with effective communication, motivation, and organizational skills, combined with attention to detail and business improvements.
- Adequate knowledge of and working experience in Agile and Waterfall methodologies.
TECHNICAL SKILLS
Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Cassandra, PowerPivot, Puppet, Oozie, ZooKeeper, Kafka, Spark, Unix
Big Data Analytics: Datameer 2.0.5
Frameworks: MVC, Struts, Hibernate, Spring
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server
Web Servers: WebLogic, WebSphere, Apache Tomcat
Web Technologies: HTML, XML, JavaScript, AJAX, SOAP, WSDL
PROFESSIONAL EXPERIENCE
Confidential
Big Data Lead
Responsibilities:
- Responsible for running the day-to-day operations of the technology platform.
- Work activities specific to Production Services roles include Problem/Incident Management, Deployment, Operational Readiness, Capacity/Availability Management, Application Monitoring, Reporting, Production Governance, Triage, Associate Support, Change/Configuration Management.
- Responsible for identifying possible production failure scenarios, creating incident tickets in the ticket tracking system, and communicating effectively with development and internal business operations teams.
- Identifies vulnerabilities and opportunities for improvement, and maintains metrics to help develop analyses that drive improvement in all areas of Production Services.
- Creates and enhances administrative, operational, and technical policies and procedures, adopting best-practice guidelines, standards, and procedures.
- Takes ownership of escalations and performs troubleshooting, analysis, research, and resolution using advanced query and programming skills.
- Performs analytical, technical, and administrative work in planning, installing, designing and supporting new and existing equipment and software under moderate supervision.
- Resolves complex issues as they arise.
- Consults with end users to determine optimal configuration of equipment and applications.
- Works on problems of minimal to moderate scope where analysis of situations or data requires a review of identifiable factors.
- Exercises judgment within defined procedures and practices to determine appropriate action and documents it as needed for future reference.
- Increased awareness of and exposure to basic technical principles, concepts, and techniques.
- Coaches and mentors newly onboarded employees.
- Initiates and provides leadership, strategic/tactical direction, and planning input on all information technology and client/business area issues and in the development of a technology environment that meets current and anticipated business requirements and objectives.
- Participates with management in the development of technology products, service standards and development efforts that impact the client/business area.
- Serves as an escalation point between the client/business area and internal management for the resolution of moderately complex unresolved problems, complaints and service requests.
- Provides the client areas with technology products and service alternatives that improve the production services environment.
Environment: HDFS, MapReduce, Spark, Kafka, Hive, Pig, Unix, Sqoop, Ranger, HBase, Jenkins.
Confidential - Minneapolis, MN
Hadoop / Data Platform Architect
Responsibilities:
- Designed and implemented an end-to-end big data platform solution on Teradata Appliance and the AWS cloud.
- Managed Hadoop clusters in production, development, and disaster recovery environments.
- Implemented Teradata Aster, a data science tool, and integrated it with Hadoop.
- Developed Spark code using Scala and Spark SQL for faster processing and testing.
- Handled data exchange between HDFS and RDBMS; wrote Spark applications in Scala to interact with a MySQL database using Spark SQL (illustrated in the sketch after this role's environment line).
- Experienced in working with the Spark ecosystem, using Scala and Hive queries on different data formats such as text files and Parquet.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Involved with the team fetching live streaming data from DB2 into HDFS tables using Spark Streaming.
- Integrated Informatica BDM and Informatica Cloud with Hadoop.
- Worked on the conversion of existing MapReduce batch applications to Spark for better performance.
- Implemented Confidential Guardium to perform enterprise-level monitoring.
- Integrated Splunk with Hadoop for log aggregation and monitoring dashboards.
- Provisioned, installed, configured, monitored, and maintained HDFS, YARN, HBase, Flume, Sqoop, Oozie, Pig, Hive, Ranger, Ranger KMS, Falcon, SmartSense, Storm, and Kafka.
- Recovered from node failures and troubleshot common Hadoop cluster issues.
- Scripted Hadoop package installation and configuration to support fully automated deployments.
- Automated Hadoop deployment using Ambari blueprints and the Ambari REST APIs.
- Responsible for building a cluster on HDP 2.5.
- Performed major Hadoop upgrades, including upgrading from HDP 2.5.3 to HDP 2.6.4.
- Worked closely with developers to investigate problems and make changes to the Hadoop environment and associated applications.
- Troubleshot many cloud-related issues such as DataNodes going down, network failures, login issues, and missing data blocks.
- Proven results-oriented person with a focus on delivery.
- Imported and exported data into HDFS and Hive using Sqoop.
- Performed HDFS cluster support and maintenance tasks such as adding and removing nodes without any effect on running jobs and data.
- Used the Python programming language to develop a working and efficient network within the company.
- Utilized Python in handling all hits on Django, Redis, and other applications.
- Performed research regarding Python programming and its uses and efficiency.
- Developed object-oriented programs to enhance company product management.
Environment: HDFS, MapReduce, Spark, Kafka, Hive, Pig, Unix, Sqoop, Ranger, Ranger KMS, Falcon, SmartSense, Storm.
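Illustrative sketch (not the actual project code): a minimal Scala/Spark program showing the pattern described above, reading a MySQL table through Spark SQL over JDBC, caching it for in-memory computation, and writing an aggregate to HDFS. The connection URL, credentials, table, columns, and output path are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession

    object MySqlToHdfsSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("mysql-to-hdfs-sketch")
          .getOrCreate()

        // Load a MySQL table into a DataFrame over JDBC (placeholder connection details).
        val orders = spark.read
          .format("jdbc")
          .option("url", "jdbc:mysql://db-host:3306/sales")   // hypothetical host/schema
          .option("dbtable", "orders")                         // hypothetical table
          .option("user", "etl_user")                          // hypothetical credentials
          .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
          .load()

        // Cache the data in memory, aggregate with Spark SQL, and write the result to HDFS.
        orders.cache()
        orders.createOrReplaceTempView("orders")
        val daily = spark.sql(
          "SELECT order_date, COUNT(*) AS order_cnt, SUM(amount) AS total_amount " +
          "FROM orders GROUP BY order_date")
        daily.write.mode("overwrite").parquet("hdfs:///data/aggregates/daily_orders")

        spark.stop()
      }
    }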
Confidential - Sunnyvale, CA
Hadoop Architect
Responsibilities:
- Involved in analysis, design, implementation, and bug-fixing activities.
- Involved in reviewing functional and technical specification documents.
- Created and configured domains in production, development and testing environments using configuration wizard.
- Developed Spark applications using Scala with the Spark SQL/Streaming APIs for faster testing and processing of data.
- Worked on the Spark SQL and Spark Streaming modules of Spark and used Scala to write code for all Spark use cases.
- Worked on converting PL/SQL code into Scala code and converted PL/SQL queries into Hive queries.
- Involved in creating and configuring the clusters in production environment and deploying the applications on clusters.
- Deployed and tested the application using Tomcat web server.
- Analyzed the specifications provided by the clients.
- Involved in the design of the application.
- Ability to understand functional requirements and design documents.
- Developed use case diagrams, class diagrams, sequence diagrams, and data flow diagrams.
- Coordinated with other functional consultants.
- Web-related development with JSP, AJAX, HTML, XML, XSLT, and CSS.
- Created and enhanced stored procedures, PL/SQL, and SQL for the Oracle 9i RDBMS.
- Designed and implemented a generic parser framework using a SAX parser to parse XML documents that store SQL.
- Created Hive tables and worked on them using Hive QL (see the sketch after this role's environment line).
- Wrote Hive queries for data analysis to meet the business requirements.
- Experienced in defining job flows.
- Gained good experience with NoSQL databases such as HBase.
- Identified the required data to be pulled into Hadoop and created the required Sqoop scripts, which were scheduled periodically to migrate data to the Hadoop environment.
- Provided further maintenance and support, working with the client to solve their problems, including major bug fixes.
Environment: Java 1.4, WebLogic Server 9.0, Kafka, Oracle 10g, web services monitoring, Web Drive, UNIX/Linux, Hadoop, Hive, JavaScript, HTML, CSS, XML.
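Illustrative sketch (assumed, not the actual application code): creating a Hive table and running a Hive QL query from Scala through Spark's Hive support, in the spirit of the Hive work listed above. The table name, columns, and query are hypothetical.

    import org.apache.spark.sql.SparkSession

    object HiveQlSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-ql-sketch")
          .enableHiveSupport()   // route SQL through the Hive metastore
          .getOrCreate()

        // Create a partitioned Hive table (hypothetical schema).
        spark.sql(
          """CREATE TABLE IF NOT EXISTS web_logs (
            |  user_id STRING,
            |  url     STRING,
            |  ts      TIMESTAMP)
            |PARTITIONED BY (dt STRING)
            |STORED AS PARQUET""".stripMargin)

        // A typical analysis query expressed in Hive QL and executed through Spark.
        spark.sql(
          """SELECT url, COUNT(DISTINCT user_id) AS visitors
            |FROM web_logs
            |WHERE dt = '2017-01-01'
            |GROUP BY url
            |ORDER BY visitors DESC
            |LIMIT 20""".stripMargin).show()

        spark.stop()
      }
    }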
Confidential
Hadoop Architect
Responsibilities:
- Responsible for coding MapReduce programs and Hive queries, and for testing and debugging the MapReduce programs.
- Developed Pig Latin scripts in areas where extensive coding needed to be reduced to analyze large data sets.
- Migrated data from an Elasticsearch 1.4.3 cluster to Elasticsearch 5.6.4 using Logstash and Kafka for all environments.
- Designed the infrastructure for the ELK clusters.
- Developed Spark code using Scala and Spark SQL for faster testing and data processing.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (illustrated in the sketch after this role's environment line).
- Worked on the Spark SQL and Spark Streaming modules of Spark and used Scala to write code for all Spark use cases.
- Elasticsearch and Logstash performance and configuration tuning.
- Identified and remedied any indexing issues, crawl errors, SEO penalties, etc.
- Provided design recommendations and thought leadership, improved review processes, and resolved technical problems.
- Benchmarked Elasticsearch 5.6.4 for the required scenarios.
- Used X-Pack for monitoring and security on the Elasticsearch 5.6.4 cluster.
- Created a POC on Hortonworks and suggested best practices in terms of the HDP and HDF platforms.
- Provided global search with Elasticsearch.
- Implemented a Hadoop cluster on Hortonworks HDP 2.4 and assisted with performance tuning, monitoring, and troubleshooting.
- Used Sqoop tool to extract data from a relational database into Hadoop.
- Worked closely with data warehouse architect and business intelligence analyst to develop solutions.
- Responsible for performing peer code reviews, troubleshooting issues and maintaining status report.
- Involved in creating Hive tables, loading them with data, and writing Hive queries, which invoke and run MapReduce jobs in the backend.
- Installed and configured Hadoop cluster in DEV, QA and Production environments.
- Performed upgrade to the existing Hadoop clusters.
- Enabled Kerberos for Hadoop cluster authentication and integrated it with Active Directory for managing users and application groups.
- Implemented commissioning and decommissioning of new nodes in the existing cluster.
- Worked with systems engineering team for planning new Hadoop environment deployments, expansion of existing Hadoop clusters.
- Responsible for data ingestion using Talend.
- Designed and presented a plan for a POC on Impala.
- Experienced in migrating Hive QL to Impala to minimize query response time.
- Monitoring workload, job performance and capacity planning using Cloudera Manager.
- Worked with application teams to install OS level updates, patches and version upgrades required for Hadoop cluster environments.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
Environment: Hadoop, HDFS, MapReduce, Hive, Flume, Sqoop, Hortonworks, Cloudera CDH4, HBase, Oozie, Pig, AWS EC2 cloud, Eclipse, Talend, ELK.
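Illustrative sketch (assumptions: a tab-delimited input file with hypothetical path and column positions): a Hive-style GROUP BY aggregation re-expressed as Spark RDD transformations in Scala, as referenced above.

    import org.apache.spark.sql.SparkSession

    object HiveToRddSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("hive-to-rdd-sketch").getOrCreate()
        val sc = spark.sparkContext

        // RDD equivalent of: SELECT product, SUM(amount) FROM sales GROUP BY product
        // over a tab-delimited text file (hypothetical path and column positions).
        val totals = sc.textFile("hdfs:///data/raw/sales.tsv")
          .map(_.split("\t"))
          .filter(_.length >= 3)                      // drop malformed rows
          .map(cols => (cols(1), cols(2).toDouble))   // (product, amount)
          .reduceByKey(_ + _)                         // aggregate per product

        totals.saveAsTextFile("hdfs:///data/out/product_totals")
        spark.stop()
      }
    }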