Sr. Big Data Engineer/Developer Resume
Chicago, IL
SUMMARY:
- Around 9 years of strong professional IT experience, including 3 years as a Hadoop and Spark developer.
- Highly experienced in data management, data modeling, data warehousing, and administration of database systems.
- Involved in analysis, design, and development using Hadoop ecosystem components, performing data ingestion, data modeling, data profiling, querying, processing, and data integration.
- Expertise in Hadoop ecosystem components HDFS, MapReduce, YARN, HBase, Sqoop, Flume, Hive, and Spark for scalability, distributed computing, and high-performance computing.
- Good working experience with the Spark framework and optimization/performance tuning of Spark jobs and Hive queries, and good knowledge of job orchestration with scheduler tools.
- Developed Spark jobs to automate data transfer from RDBMS platforms.
- Good knowledge of data architecture, including data ingestion pipeline design, Hadoop architecture, data modeling, data mining, and advanced data processing.
- Excellent understanding of the Spark architecture and framework: SparkSession, PySpark, RDDs, Spark SQL, DataFrames, and Streaming, as well as integration with other data sources.
- Knowledgeable in developing and implementing Spark programs in Python on Hadoop to work with structured and semi-structured data.
- Utilized Spark SQL for reporting, data processing, and integration with Hive and RDBMS.
- Adequate understanding of Hadoop Gen 1/Gen 2 architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, Secondary NameNode, and DataNode, the YARN architecture and its daemons (NodeManager, ResourceManager, and ApplicationMaster), and the Spark programming paradigm.
- Hands on experience in using the Hue browser for interacting with Hadoop components.
- Good understanding and Experience with Agile and Waterfall methodologies of Software Development Life Cycle (SDLC).
- Developed end-to-end Spark applications using Scala/Python to perform various data cleansing, validation, transformation, and summarization activities as per requirements.
- Experienced in developing UDFs for Hive and Spark using Scala/Python based on requirements and use cases (a minimal sketch follows this list).
- Broad experience working with structured data using Spark SQL, DataFrames, and HiveQL, and optimizing queries.
- Experience working with Text, SequenceFile, JSON, ORC, and Parquet file formats.
- Hands on experience in planning project activities, estimations, tracking, change management, financial management, resource management.
- Highly motivated, self-learner with a positive attitude, willingness to learn new concepts and accepts challenges.
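To illustrate the Hive/Spark UDF work referenced above, here is a minimal PySpark sketch. The function, column names, and sample data are hypothetical illustrations, not details from the projects below.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-example").getOrCreate()

def normalize_phone(raw):
    # Hypothetical cleansing rule: keep digits only; return None for
    # empty or missing values.
    if raw is None:
        return None
    digits = "".join(ch for ch in raw if ch.isdigit())
    return digits or None

# Wrap the Python function as a DataFrame UDF...
normalize_phone_udf = udf(normalize_phone, StringType())
# ...and register it so it can also be called from Spark SQL / HiveQL.
spark.udf.register("normalize_phone", normalize_phone, StringType())

df = spark.createDataFrame([("(312) 555-0100",), (None,)], ["phone"])
df.select(normalize_phone_udf("phone").alias("phone_clean")).show()
```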
TECHNICAL SKILLS:
Big Data Ecosystem: Hadoop, HDFS, Hive, Apache Spark, Spark SQL, Spark Streaming, MapReduce, Sqoop, HBase, ZooKeeper, Apache Kafka
Databases: MySQL, HBase, Oracle, SQL Server
AWS Services: S3, EC2, EMR, Redshift, RDS, Lambda, SNS, SES, Snowball, CloudWatch, CloudTrail
Programming Languages: Python, Scala, SQL, HiveQL, Confidential: SQL
Tools: IntelliJ, Eclipse, PyCharm, Putty, Tableau 8.0
Operating Systems: Linux, Windows
Scheduling Tools: Autosys
ETL tools: MS SSIS
PROFESSIONAL EXPERIENCE:
SR. BIG DATA ENGINEER/DEVELOPER
Confidential, Chicago, IL
Responsibilities:
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop programs; involved in gathering and analyzing system requirements and played a key role in the high-level design for the implementation of this application.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop, using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Managed and reviewed Hadoop log files to identify issues when jobs fail, and used Hue for UI-based Pig script execution and Oozie scheduling.
- Involved in creating a data lake by extracting customer data from various sources into HDFS, including data from Excel, databases, and server logs; analyzed data coming from various sources and created meta-files and control files to ingest the data into the data lake.
- Developed Sqoop scripts to ingest data from RDBMS sources and implemented data cleansing and processing using PySpark and Scala (a minimal sketch follows this list).
- Developed Spark code to mimic the transformations performed in the on-premises environment; analyzed the SQL scripts and designed solutions to implement them using PySpark.
- Automated workflows using shell scripts to pull data from various databases into Hadoop, and developed scripts to automate the process and generate reports.
- Developed UNIX scripts for creating batch loads that bring large volumes of data from relational databases to the big data platform.
- Used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process web server logs stored in an Amazon S3 bucket, as part of an AWS POC.
- Leveraged Hive queries to create ORC tables and developed Hive scripts to meet analysts' requirements for analysis.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig, and translated high-level design specs into simple ETL coding and mapping standards.
- Used the Spark API for machine learning; translated a predictive model from SAS code to Spark and used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Uploaded and processed terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop.
- Implemented the business rules in Spark/Scala to put the business logic in place to run the rating engine, and developed code from scratch in Spark using Scala according to the technical requirements.
- Used the Spark UI to observe a submitted Spark job at the node level, and used Spark to do property-bag parsing of the data to extract the required fields.
- Extensively used ETL methodology for supporting data extraction, transformation, and loading, using Hadoop.
- Used spark-submit to execute Spark jobs and manage the data processing pipeline on Hadoop.
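As a rough illustration of the ingest-and-cleanse flow described in this list, here is a minimal PySpark sketch; the JDBC URL, credentials, and table and column names are placeholders rather than actual project details, and the JDBC driver is assumed to be on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, trim

spark = (SparkSession.builder
         .appName("rdbms-to-datalake")
         .enableHiveSupport()
         .getOrCreate())

# Pull a source table from an RDBMS over JDBC. (Sqoop handled the bulk
# ingest in practice; this shows an equivalent Spark-side read feeding
# the cleansing step.)
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")  # placeholder
          .option("dbtable", "SALES.ORDERS")                      # placeholder
          .option("user", "etl_user")                             # placeholder
          .option("password", "********")
          .load())

# Basic cleansing/validation: trim strings, drop rows missing the key.
cleaned = (orders
           .withColumn("customer_id", trim(col("CUSTOMER_ID")))
           .filter(col("ORDER_ID").isNotNull()))

# Persist as an ORC-backed Hive table for analyst queries.
cleaned.write.mode("overwrite").format("orc").saveAsTable("datalake.orders_clean")
```

A job like this would be launched on the cluster with spark-submit, e.g. `spark-submit --master yarn ingest_orders.py`.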
Environment: Hadoop, Hive, HDFS, Sqoop, Python, SparkSQL, AWS, AWS S3, AWS EMR, Oozie, ETL, Tableau, Spark, Cloudera Distribution, Java, Impala, Agile-Scrum.
Confidential, Dallas, TX
Big Data/Hadoop Developer
Responsibilities:
- Used Spark for report queries, processing of batch data, and integration with a popular NoSQL database for huge volumes of data.
- Performance-tuned Spark jobs by changing configuration properties and using broadcast variables and accumulators (see the broadcast-join sketch after this list).
- Worked on numerous file formats, including Text, SequenceFile, Avro, Parquet, ORC, JSON, XML, and flat files, using Spark programs.
- Expanded the daily process to do incremental imports from Teradata into Hive tables using Sqoop.
- Resolved performance issues in Hive and Spark scripts by analyzing joins, grouping, and aggregation and how they translate to MapReduce jobs.
- Loaded the data into Spark RDDs and performed in-memory computation to generate output exactly matching the requirements.
- Involved in scripting Spark applications using Python to perform various data cleansing, validation, transformation, and summarization activities per the requirements.
- Developed data pipelines using Spark, Hive and Sqoop to ingest, transform and analyze operational data.
- Developed Spark jobs, Hive jobs to encapsulate and transform data.
- Fine-tuned Spark applications to improve performance.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Extensively worked with partitions, dynamic partitioning, and bucketed tables in Hive; designed both managed and external tables and worked on optimization of Hive queries.
- Involved in collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
- Assisted analytics team by writing Spark and Hive scripts to perform further detailed analysis of the data.
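To illustrate the broadcast-variable tuning mentioned in this list, here is a minimal PySpark sketch of a broadcast join; the table and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = (SparkSession.builder
         .appName("broadcast-tuning")
         .enableHiveSupport()
         .getOrCreate())

events = spark.table("ops.events")    # large fact table (placeholder name)
regions = spark.table("ops.regions")  # small lookup table (placeholder name)

# Broadcasting the small side ships it to every executor, so the large
# table is joined without a shuffle.
joined = events.join(broadcast(regions), on="region_id", how="left")

joined.groupBy("region_name").count().show()
```

Avoiding the shuffle of the large table is one of the cheapest wins when tuning Spark joins, which is the kind of configuration-level tuning referred to above.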
Environment: Cloudera, Scala 2.10.5, Apache Spark, Oracle, CDH 5.8.2, Hive, HDFS, YARN, Spark, Sqoop 1.4.3, Flume, Unix Shell Scripting, Python 2.6, Apache Kafka
Confidential, Dallas, TX
Consultant
Responsibilities:
- Oracle Fusion Middleware and database administration experience in a variety of environments, including Windows (NT/2005 Server) and different flavors of Unix (IBM AIX, Linux).
- Install, upgrade, and maintain Oracle 10g, 11g, and 12c on Linux, Solaris, and Windows.
- Write and modify UNIX shell scripts to manage Oracle environment
- Administered Oracle SOA Suite in WebLogic clusters, including OSB and BPEL administration and high availability/disaster recovery.
- Worked on server virtualization including Oracle VM & VMWARE.
- Configured JMS components including servers, modules, topics, queues, and sub-deployments.
- Worked exclusively on 11.1.0.7/11gR2 RAC environments.
- Responsible for Installation and Configuration of Oracle Fusion Middleware 11g products on Red Hat Enterprise Linux 5 Update 3, 64-bit platform.
- Developed silent installation scripts for installation of fusion middleware products as per client requirement.
- Installed Weblogic Server 10.3.4 application server container.
- Installed and configured domains for Oracle SOA, Oracle AIA, Oracle Service Bus, Oracle WebCenter, Oracle Enterprise Repository, Oracle Service Registry, Oracle Data Integrator, Oracle Enterprise Content Management, and Oracle Complex Event Processing, on both standalone and clustered environments.
- Installed and configured Web Tier 11.1.1.4 with Oracle HTTP server instances for load balancing.
- Troubleshot various issues that arose during different stages of installation and configuration of the domains, and rectified them using installation log files and coordination with the Solution Architect team.
Confidential, Salt Lake City, UT
Consultant
Responsibilities:
- Implemented Performance tuning for different Oracle middleware products including SOA Suite & WebLogic.
- Worked on WebLogic application servers including installation, Creation of WebLogic Domains and configurations.
- Responsible for testing/validating the setup of software/hardware
- Performed troubleshooting and performance tuning of Oracle EM 12c, FMW, and Fusion Apps production deployments.
- Installed and configured Oracle WebCenter Suite 11g on Linux environments.
- Work involving Oracle FMW products (SOA Suite, OSB, ODI, Web Center, and WebLogic).
- Upgraded Oracle SOA Suite from 11.1.1.5 to 11.1.1.7 for all environments.
- Performance Tuning of different Oracle products including Fusion middleware (SOA, WebCenter Portal & Content).
- Proficient in deployments on WebLogic, OSB, SOA Suite, and OAG using out-of-the-box functionality or scripts.
- Monitoring the environment using OEM and setting up alerts as and when required.
- Implemented Oracle OIM 11g performance tuning parameters.
- Installed WebLogic/SOA on Linux environment.
- Configured and migrated operating systems, middleware applications, and databases; installed and configured Oracle SOA 11g.
- Worked on Cluster configuration of Oracle SOA 11g and OSB 11g for extending the domains
Confidential, Franklin, TN
Consultant
Responsibilities:
- Design Complex Infrastructure solutions for multiple clients using Oracle Product Stack (Linux/Solaris, Database, and Fusion Middleware).
- Experience with WebLogic Integration administration and configuration using application integration platforms.
- Work involving Oracle FMW products (SOA Suite, OSB, ODI, Web Center, and WebLogic).
- Installed and configured different Oracle Fusion Middleware products.
- Implemented backup procedures for different clients using Oracle fusion middleware.
- Implemented Performance tuning for different Oracle middleware products including SOA Suite & WebLogic
- Worked on upgrading the environment from WebLogic 11g to 12c.
- Installed and configured Oracle WebCenter Suite 11g on Linux environments.
- Upgraded Oracle SOA Suite from 11.1.1.5 to 11.1.1.7 for all environments.
- Configured and migrated operating systems, middleware applications, and databases; installed and configured Oracle SOA 11g.
- Tuning and configuring Oracle SOA Suite 11g environment for high availability and load balance.
- Managing and monitoring service engines and updating the state of SOA composite applications.
- Configured software components for managed WebLogic services on Server Farms and Cloud Services.
Confidential, Grand Rapids, MI
Oracle DBA
Responsibilities:
- Supporting multiple databases for production, development, test and staging purposes on Sun Solaris and Windows environments.
- Applying upgrade, maintenance, and interim (OPatch) patches on all the databases.
- Refreshing Dev and Test instances with data from Production on a regular basis.
- Checking backup and restore validity periodically on databases of about 12 TB in size; performing data refreshes from production to non-production environments and creating duplicate databases using RMAN backups.
- Worked on user management and space management, granting required privileges to users.
- Upgraded and migrated databases from 10gR2 to 11gR1 and applied patches as required.
- Worked on refreshing tables and schemas using exp/imp as well as expdp/impdp.
- Refreshed the databases from PROD to UAT, QA, and DEV using RMAN.
- Performed migration of databases from 9i and 10g to 11g on Linux.
- Optimized and tuned SQL queries using trace files, TKPROF, and EXPLAIN PLAN.
- Worked with the ADDM, AWR, and ASSM features of 10g/11gR2.
- Worked on resolving the gap between primary and standby databases.
- Creating OEM alerts that provide notifications through email.
- Monitoring rollback segment and temporary tablespace usage.
- Performing housekeeping tasks like checking log and trace files.
Confidential, Plano, TX
Network Administrator
Responsibilities:
- Part of the team which was responsible for the overall management of the LAN/WAN.
- Installed, configured and administered network devices such as routers, switches and Hubs and performed LAN/WAN routine maintenance and troubleshooting.
- Overall responsible for reliable and consistent LAN/WAN network performance and Services.
- Implemented and administered Virtual Local Area Networks.
- Administered Remote Access Server and defined its network security policies and User accounts.
- Backup and Recovery as per the defined backup Strategies.
- Managed the planning and implementation of information systems security, anti-virus and data protection software.
- Designed desktop account security parameters on one workstation, which served as a template for all terminals.
- Ensured business continuity with full responsibility for 100% uptime, full data backup and recovery, and system redundancy.
- Monitored system performance and took corrective action to keep systems running smoothly.
- Installation and configuration of Windows NT 4.0 and MS-Proxy Server.
- Responsible for troubleshooting network performance issues.
- Responsible for the creation and maintenance of a disaster recovery plan.
- Install and perform minor repairs to hardware, software, or peripheral equipment, following design or installation specifications.
- Responsible for maintaining an accurate inventory of technology hardware, software, and resources.
- Modify existing software to correct errors, to adapt it to new hardware, or to upgrade interfaces and improve performance.