Big Data Engineer Resume
Indianapolis, IN
SUMMARY
- 9+ years of IT experience, including 3 years in Big Data/Hadoop working with the Apache Hadoop ecosystem (HDFS, MapReduce, Hive 2.1.1, Sqoop 1.99.7, Oozie 3.1.3, HBase 1.3.0, Spark 2.1.0, Scala 2.12.0, NiFi 1.1.2, Storm 1.0.2) and Big Data analytics, and 3 years of experience in development and support of MS SQL Server, Core Java, JSP, Servlets, JavaScript, XML and jQuery.
- Excellent understanding of Hadoop Architecture and underlying Hadoop framework including Storage Management.
- Hands-on experience in installing, configuring and using Hadoop components such as MapReduce, HDFS, HBase 1.3.0, Hive 2.1.1, Sqoop 1.99.7 and Flume 1.7.0.
- Managed data coming from various sources and was involved in HDFS maintenance and loading of structured and unstructured data.
- Involved in creating POCs to ingest and process streaming data using Spark 2.1.0 and HDFS.
- Experience in analyzing data using HiveQL (Hive 2.1.1) and custom MapReduce programs in Java.
- Worked on the back end using Scala 2.12.0 and Spark 2.1.0 to implement aggregation logic.
- Experienced in working with Spark DataFrames and optimizing jobs to meet SLAs.
- Experience in importing and exporting data with Sqoop 1.99.7 between HDFS and RDBMS.
- Experience working with NoSQL databases such as HBase 1.3.0 and Cassandra 3.10, as well as Impala 2.7.0 and Hive 2.1.1.
- Experience with the Netezza 7.2.x data warehouse and extensive work with PL/SQL 11g.
- Hands-on experience in Linux shell scripting; worked with the Cloudera Big Data distribution.
- Good understanding of machine learning and statistical analysis with MATLAB R2016a.
- Expert in writing complex SQL queries and analyzing databases for performance.
- Experience in web application development using Java EE 7, HTML5, CSS3, Bootstrap, JavaScript, JSON, jQuery, AJAX, Spring MVC 4.x and Apache CXF 3.1.10.
- Very good understanding and working knowledge of object-oriented programming (OOP), multithreading in Core Java, Java EE 7, web services (REST, SOAP), JDBC, JavaScript and jQuery.
- Good understanding of service-oriented architecture (SOA) and web service technologies such as XML and SOAP.
- Experience in object-oriented analysis and design (OOAD) using the Unified Modeling Language (UML) and design patterns.
- Experience in the complete project life cycle (design, development, testing and implementation) of client-server and web applications.
- Expert in building, deploying and maintaining applications.
- Experienced in preparing and executing unit test plans and unit test cases after software development.
- Excellent analytical, interpersonal and communication skills; fast learner, hardworking and a good team player.
TECHNICAL SKILLS
Databases: Oracle 12c, 11g, 10g, 9i
Oracle Products: GoldenGate, Grid Control, OEM (Oracle Enterprise Manager), RMAN, SQL*Loader, Data Pump, ASM, RAC.
Programming Languages: SQL, PL/SQL, C, C++
Operating systems: OEL, RHEL, AIX, Solaris.
Other tools: TOAD, PuTTY
PROFESSIONAL EXPERIENCE
Confidential, Indianapolis, IN
Big Data Engineer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop components.
- Solid understanding of HDFS, MapReduce and other Hadoop ecosystem projects.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala (illustrated in the sketch after this section).
- Knowledge of the architecture and functionality of NoSQL databases such as HBase.
- Used S3 for data storage and was responsible for handling massive amounts of data.
- Used EMR clusters running on EC2 instances for data pre-analysis.
- Used Kafka to obtain near-real-time data.
- Experience writing data ingestion jobs with tools such as Sqoop.
- Experience in designing and developing Spark applications in Scala to compare the performance of Spark with Hive and SQL/Confidential.
- Implemented Spark jobs in Scala using the Spark SQL API, DataFrames, Datasets and pair RDDs for faster data processing.
- Performed batch processing with Spark jobs implemented in Scala.
- Performed extensive data validation using Hive and wrote Hive UDFs.
- Scheduled Oozie workflows to run multiple Hive and Pig jobs.
- Used the Cloudera data platform to deploy Hadoop for some modules.
- Created Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs; wrote extensive Python and shell scripts to provision and spin up virtualized Hadoop clusters.
- Developed and configured Kafka brokers to pipeline server log data into Spark Streaming.
- Created external tables pointing to HBase to access tables with a very large number of columns.
- Loaded generated HFiles into HBase for faster access to a large customer base without taking a performance hit.
- Configured the Talend ETL tool for data filtering.
- Processed data in HBase using Apache Crunch pipelines, a MapReduce programming model that is efficient for processing Avro data formats.
- Collected, aggregated and moved data from servers to HDFS using Apache Flume.
- Extensive experience writing UNIX shell scripts and automating ETL processes with shell scripting.
Environment: UNIX, Linux, Java, Apache HDFS, MapReduce, Spark, Pig, Hive, HBase, Kafka, Sqoop, NoSQL, AWS (S3 buckets), EMR cluster, Solr.
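The following is a minimal Scala sketch of the Hive-to-Spark DataFrame conversion pattern referenced above; the database, table, column names and output path are hypothetical placeholders rather than actual project code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session so existing metastore tables are visible to Spark
    val spark = SparkSession.builder()
      .appName("hive-to-spark-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Equivalent of a HiveQL aggregation expressed with the DataFrame API
    // (raw_db.server_events and its columns are placeholder names)
    val events = spark.table("raw_db.server_events")
    val dailyErrorCounts = events
      .filter(col("status") === "ERROR")
      .groupBy(col("event_date"), col("host"))
      .agg(count("*").as("error_count"))

    // Persist the result to HDFS as Parquet for downstream jobs
    dailyErrorCounts.write.mode("overwrite").parquet("/data/curated/error_counts")

    spark.stop()
  }
}
```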
Confidential, New York, NY
Big Data Engineer
Responsibilities:
- Collaborated with internal and client BAs to understand requirements and architect a data flow system.
- Developed complete end-to-end Big Data processing in the Hadoop ecosystem.
- Used the Spark 2.0.0 API over Cloudera to perform analytics on data in Impala 2.7.0.
- Implemented extensive Impala 2.7.0 queries and created views for ad hoc and business processing.
- Optimized Hive 2.0.0 scripts to use HDFS efficiently by using various compression mechanisms.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark 2.0.0 for data aggregation and queries, writing data back into the RDBMS through Sqoop (see the UDF sketch after this section).
- Created Hive schemas using performance techniques such as partitioning and bucketing.
- Developed Oozie 3.1.0 workflow jobs to execute Hive 2.0.0, Sqoop 1.4.6 and MapReduce actions.
- Used SFTP to transfer and receive the files from various upstream and downstream systems.
- Wrote extensive Hive queries to transform data used by downstream models.
- Exported data from Hive 2.0.0 tables into the Netezza 7.2.x database.
- Involved in the complete end-to-end code deployment process in production.
- Worked on CDH upgrades to 5.8.3 and performed regression testing.
- Involved in gathering product business and functional requirements, updating user comments in JIRA 6.4 and maintaining documentation in Confluence.
Environment: CDH 5.8.3, HDFS 2.7.3, Spark 2.0.0, Hive 2.0.0, Impala 2.7.0, Sqoop 1.4.6, MapReduce, Oozie 3.1.0, PuTTY, Netezza 7.2.x, YARN, Agile Methodology, JIRA 6.4
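A minimal Scala sketch of the DataFrame/UDF style of aggregation described above; the UDF logic, table names and target table are hypothetical placeholders, and the Sqoop export into the RDBMS would be a separate step.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object UdfAggregationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("udf-aggregation-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Simple UDF that normalizes a free-text region code (placeholder logic)
    val normalizeRegion = udf((raw: String) =>
      Option(raw).map(_.trim.toUpperCase).getOrElse("UNKNOWN"))

    val orders = spark.table("staging_db.orders")
    val byRegion = orders
      .withColumn("region", normalizeRegion(orders("region_code")))
      .groupBy("region")
      .count()

    // Stage the aggregate as a Hive table; a separate Sqoop export can then
    // move it into the target RDBMS
    byRegion.write.mode("overwrite").saveAsTable("curated_db.orders_by_region")

    spark.stop()
  }
}
```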
Confidential, Memphis, TN
Big Data Analyst
Responsibilities:
- Bulk-imported data from various data sources into Hadoop 2.5.2 and transformed it in flexible ways using Apache NiFi 0.2.1, Kafka 2.0.x, Flume 1.6.0 and Storm 0.9.x.
- Developed MapReduce programs to extract and transform the data sets; the resulting data sets were loaded into Cassandra, and back out, using Kafka 2.0.x.
- Used the Spark 1.4.x API over Cloudera Hadoop YARN 2.5.2 to perform analytics on data in Hive.
- Explored Spark 1.4.x, improving the performance and optimization of existing algorithms in Hadoop 2.5.2 using SparkContext, Spark SQL and DataFrames.
- Implemented Batch processing of data sources using Apache Spark 1.4.x.
- Developed analytical components using Spark 1.4.x, Scala 2.10.x and Spark Streaming.
- Imported data from sources such as HDFS and HBase 0.94.27 into Spark RDDs (see the RDD sketch after this section).
- Developed Spark scripts using the Scala shell as required.
- Developed numerous MapReduce jobs in Scala 2.10.x for data cleansing and analyzed data in Impala 2.1.0.
- Loaded and extracted data using Sqoop 1.4.6 between Confidential 12.1.0.1 and HDFS.
- Analyzed large data sets on Hadoop by using Impala 2.1.0 to study issues of delivery trucks.
- Worked on implementation and maintenance of Cloudera Hadoop cluster.
- Produced quality technical documentation for operating Hadoop clusters, complex configuration management and architecture changes, and for maintaining the clusters.
- Used JIRA 6.4 for project tracking, bug tracking and project management.
- Involved in Scrum calls, grooming sessions and demo meetings.
Environment: Hadoop 2.5.2, HDFS, Spark 1.4.x, MapReduce, Impala 2.1.0, Sqoop 1.4.6, NiFi 0.2.1, Kafka 2.0.x, Flume 1.6.0, Storm 0.9.x, HBase 0.94.27, Scala 2.10.x, Cloudera CDH 4.7.1, Confidential 12.1.0.1, Scrum, JIRA 6.4.
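A minimal Scala sketch (RDD API, in the style of Spark 1.4.x) of the kind of data cleansing over Sqoop-landed files described above; the HDFS paths, file layout and "late delivery" rule are hypothetical placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddCleansingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rdd-cleansing-sketch")
    val sc = new SparkContext(conf)

    // Raw delivery records landed in HDFS by Sqoop (path and layout are placeholders)
    val raw = sc.textFile("hdfs:///data/raw/deliveries/*.csv")

    // Drop malformed rows and blank truck IDs, then count late deliveries per truck
    val lateByTruck = raw
      .map(_.split(","))
      .filter(fields => fields.length >= 4 && fields(0).nonEmpty)
      .filter(fields => fields(3).trim == "LATE")
      .map(fields => (fields(0), 1))
      .reduceByKey(_ + _)

    lateByTruck.saveAsTextFile("hdfs:///data/curated/late_deliveries_by_truck")

    sc.stop()
  }
}
```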
Confidential, Irving, TX
Big Data Engineer
Responsibilities:
- Installed and configured the Apache Hadoop, Hive and Pig environment on the prototype server.
- Configured SQL database to store Hive metadata.
- Loaded unstructured data into the Hadoop Distributed File System (HDFS).
- Created ETL jobs to load Twitter JSON data and server data into MongoDB and to move the MongoDB data into the data warehouse.
- Created reports and dashboards using structured and unstructured data.
- Used SQL Tuning Advisor for tuning SQL queries and database systems.
- Created user accounts and roles, granting required access permissions and privileges to users.
- Partitioned and reorganized tables and indexes.
- Responsible for configuring and backing up databases using RMAN, hot backups, cold backups and logical backups.
- Wrote documentation on database backups and cloning for future reference.
- Migrated databases from HP-UX to IBM AIX platforms.
- Continuously monitored database performance during batch jobs running on the database using AWR reports and OEM.
- Used Data Pump to take logical backups of databases.
- Worked on ASM instances for 11g databases.
- Installed and configured 12c software and database.
- Worked with the 12c container and pluggable database model.
- Performed other DBA activities including space management and performance monitoring.
Environment: HP-UX, Big Data, Database Version 10g/11g, RMAN, Data Guard, RAC, ASM, Confidential Enterprise Manager Grid Control 12c, OPatch, Shell Scripting.
Confidential, California
Oracle DBA/Apps DBA
Responsibilities:
- Installed 11g software and created a database on Red Hat Linux for R12 testing.
- Performed Confidential database upgrades from 10.2.0.2 to 10.2.0.3 in an AIX RAC environment.
- Upgraded databases from 10.2.0.2 to 10.2.0.5 and from 10.2.0.5 to 11.1.0.6.
- Applied different patches, including one-off patches and CPU/PSU patches (11.2.0.1.1, 11.2.0.1.2).
- Performed pre-upgrade and post-upgrade tasks for upgrade process of databases.
- Performed RMAN/Offline /Online backups and restored the backups to refresh the environment.
- Set up and configured a 2-node Confidential 10g RAC with ASM for high availability of the database at the instance level.
- Used 10g OEM (Confidential Enterprise Manager) to manage and monitor databases for maintenance tasks.
- Cloned schemas, objects and data onto a new server using exports from the 9i database and imported them into 10g using Confidential Data Pump.
- Set up and configured a Confidential physical standby database (Data Guard) for disaster recovery as well as to offload the running of large reports.
- Worked with operating system command-line utilities such as sar, top and vmstat.
- Responsible for setting up and managing user accounts, granting required privileges and roles to users.
- Managed applications on Confidential 11g/10g RAC databases using the ASM file system, with the server control utilities SRVCTL and CRSCTL for high availability.
- Configured database links so that remotely located users could access data locally.
- Managed schemas, objects, tables and indexes.
- Created partitions to improve query performance and manageability.
- Refreshed/cloned databases and applications for development, test and production boxes on demand from application teams, using export/import and Data Pump at the table, schema and database level.
- Planned and implemented high-availability solutions using Oracle 10g RAC and standby databases with Oracle Data Guard.
- Applied CPU security patches and PSUs in both RAC and standalone systems in ASM environments.
- Tuned SQL query performance using tools such as EXPLAIN PLAN, SQL Trace, TKPROF and SQL Profiler.
- Used shell scripts for daily administration.
- Closely monitored and administered log transport and log apply services on physical/logical standby databases.
- Daily database administration tasks including backup and recovery through RMAN, Import/Export.
- Performed data transfers using Export/Import from the command line and in interactive mode, as well as through Data Pump.
- Installed and configured Confidential 10g and 11g RAC using Clusterware and ASM.
- Implemented and documented Confidential 11g Data Guard on both physical and logical standby databases in high-availability mode.
- Automated the nightly RMAN backups for the Production and Test Databases in Crontab.
- Performed database restore and recovery from RMAN backups.
- Worked on database security: user management, privileges, roles, auditing, profiling and authentication.
- Applied security patches on both the server side and the client side and performed testing.
- Provided documentation for database design, analysis, modifications, schema creation, etc.
Environment: Red Hat Linux, HP-UX, Confidential Database Version 10g/11g, RMAN, Data Guard, RAC, ASM, Confidential Enterprise Manager (10g), OPatch, Shell Scripting, TOAD, TKPROF, Data Pump.
Confidential, Gilbert, AZ
Oracle DBA
Responsibilities:
- Monitored Smiths Medical client production and development databases; checked blocking sessions and cleared inactive blocking sessions so that other waiting sessions could continue.
- Cloned and refreshed the databases as per requirements of the development team.
- Created and altered tables, views and PL/SQL objects as per development team requirements.
- Used 10g OEM (Confidential Enterprise Manager) to manage and monitor databases.
- Applied one-off and security patches to fix bugs.
- Collected database statistics and tuned resource-consuming SQL statements.
- Applied patches on databases and 11i Confidential EBS applications.
- Created custom tops, custom responsibilities in Confidential applications.
- Used exp/imp to refresh schemas and performed RMAN/cold/hot backups.
- Automated and implemented day-to-day Confidential DBA functions using cron jobs and shell scripts.
- Used SQL*Loader to move data from flat files into a Confidential database.
- Used traditional export import to take backup of schemas whenever required as per the request.
- Gathered database statistics and rebuilt indexes as required for better SQL query execution.
- Created Users, Groups, Roles, Profiles and assigned users to groups and granted privileges and permissions to appropriate groups as per the requests.
- Performed daily DBA support activities for the Confidential production Systems.
- Performed and tested Confidential backup and recovery policies and procedures.
- Participated directly in testing and planning to prepare for database 10g upgrades.
- Also worked on Confidential Apps 11i/R12 E-Business Suite systems, covering:
- Installation, configuration and upgrade.
- Cloning and Autoconfig.
- Patching (ADPATCH).
- Concurrent managers.
- User management.
- Provided 24x7 production database administration, support and monitoring to ensure proactive problem recognition and resolution of database issues.
- Installation of Confidential Database and Confidential products.
- Creating users and maintaining database security.
- Developed/customized UNIX shell scripts to automate routine DBA tasks.
- Monitored various database activities such as backups, error logs, space, objects, performance, users and sessions to ensure proper capacity and availability.
- Created and implemented Confidential database backup and recovery strategies.
- Performed database and SQL tuning using tools such as STATSPACK, TKPROF, EXPLAIN PLAN, optimizer hints and SQL Tuning Advisor.
- Installed Confidential 10g, applied patches in various environments and upgraded databases from 9i to 10g.
- Set up, configured and maintained Confidential data guard (physical/logical standby databases) to ensure disaster recovery, high availability and data protection.
- Created procedures, functions, packages, and triggers using PL/SQL.
- Utilized SQL*Loader and Data Pump import to load data into tables.
- Audited database for statements, privileges, and schema objects.
- Database installation and instance creation.
Environment: Confidential 9i, HP-UX, Solaris, RHEL, AIX, Database Tools/Utilities, Shell Scripting, export/Import.