AWS Big Data Engineer Resume
Bethpage, NY
SUMMARY:
- Certified Big Data Engineer with 3+ years of IT experience spanning Amazon Web Services (AWS), big data engineering and implementation, and application development; extensive experience designing, deploying, and operating highly available, scalable, and fault-tolerant systems on AWS and big data technologies to solve large-scale data processing requirements efficiently.
- Demonstrable knowledge of and experience with big data and distributed systems, including Hadoop, ETL pipelines, and streaming.
- Developed CloudFormation templates to provision EC2 instances on demand.
- Strong experience with file formats common in the Hadoop and big data ecosystem, such as CSV, JSON, and Parquet.
- Capable of extracting data from existing databases, web sources, and APIs.
- Experience designing and implementing fast and efficient data acquisition using Big Data processing techniques and tools.
- Experience running queries on Athena against data in S3 buckets (see the Athena sketch below).
- Experience with database technologies such as SQL, PL/SQL, and MySQL, as well as NoSQL databases.
- Experience in Hadoop, MapReduce and Apache Spark
- Strong experience using AWS Glue for ETL processing (see the Glue sketch below).
- Strong hands-on experience in scripting with Python, shell, and Bash.
- Experience in data engineering with Redshift and Kinesis Data Streams.
- Knowledge of automating the creation, deployment, and maintenance of security rules with AWS WAF.
- Strong exposure to the AWS and Google Cloud platforms.
- Strong experience with SQL, T-SQL, and Python scripting, data formats such as JSON and XML, and basic Java.
- Maintained project documentation and recorded application-related issues and bugs on an internal wiki.
- Good experience in Python software development (NumPy, SciPy, Matplotlib, pandas DataFrames, NetworkX, and MySQLdb for database connectivity) with IDEs including Jupyter Notebook and PyCharm.
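Illustrative Athena sketch: a minimal example of running a query against data in S3 with boto3, as referenced above. The region, database, query, and output bucket are hypothetical placeholders, not values from any actual engagement.

```python
import time

import boto3

# Hypothetical names for illustration; substitute real catalog/bucket values.
DATABASE = "sales_db"
OUTPUT = "s3://example-athena-results/queries/"

athena = boto3.client("athena", region_name="us-east-1")  # region is an assumption

# Start a query against data already catalogued over S3.
resp = athena.start_query_execution(
    QueryString="SELECT order_id, total FROM orders LIMIT 10",
    QueryExecutionContext={"Database": DATABASE},
    ResultConfiguration={"OutputLocation": OUTPUT},
)
qid = resp["QueryExecutionId"]

# Poll until the query finishes, then fetch the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```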
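Illustrative Glue sketch: a minimal Glue ETL job of the kind referenced above. It runs only inside AWS Glue (the awsglue library is supplied by the service), and the catalog database, table, mappings, and target path are hypothetical.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the source table from the Glue Data Catalog (hypothetical names).
source = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="orders_csv"
)

# Rename/cast columns during the transform step.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[("order_id", "string", "order_id", "string"),
              ("total", "string", "total", "double")],
)

# Write the result back to S3 as Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-curated/orders/"},
    format="parquet",
)
job.commit()
```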
TECHNICAL SKILLS:
Cloud Platforms: AWS, Azure, GCP
AWS Services: VPC, Identity and Access Management (IAM), EC2 Container Service, Elastic Beanstalk, Lambda, S3, CloudFront, Glacier, RDS, DynamoDB, ElastiCache, Redshift, Direct Connect, Route 53, CloudWatch, CloudFormation, CloudTrail, Athena, Amazon Elastic MapReduce (EMR), AWS Glue, SQS.
Operating Systems: Ubuntu, Windows, macOS, Windows Server.
Database: Oracle, SQL Server, MySQL, Redshift, Athena
Reporting & Visualization: Cognos, Tableau, Excel, Informatica, Power BI
Programming Languages: SQL, PL/SQL (basic), Python, C, Java (basic).
Scripting & Other Tools: UNIX shell scripts (Bash), Git Bash, PuTTY.
PROFESSIONAL EXPERIENCE:
Confidential, Bethpage, NY
AWS Big Data Engineer
Responsibilities:
- Deployed Lambda functions and their dependencies into AWS to automate EMR cluster spin-up for data lake jobs (see the EMR sketch below).
- Set up continuous integration/deployment of Spark jobs to EMR clusters.
- Scheduled Spark applications as steps on AWS EMR clusters.
- Installed and configured Apache Hadoop, Hive, and Pig environments on the prototype server.
- Configured a database to store Hive metadata.
- Loaded unstructured data into the Hadoop Distributed File System (HDFS).
- Created ETL jobs to load application and server data into S3 buckets and to move S3 data into the data warehouse.
- Created reports and dashboards using structured and unstructured data.
- Joined various tables using Spark and Scala and ran analytics on top of them in EMR.
- Applied Spark Streaming for real-time data transformation.
- Created multiple dashboards in Tableau for multiple business needs.
- Implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access (see the Hive sketch below).
- Implemented test scripts to support test-driven development and continuous integration.
- Developed and executed a migration strategy to move Data Warehouse from SAP to AWS Redshift.
- Designed and built a multi-terabyte, end-to-end data warehouse infrastructure from the ground up on Redshift for Confidential, handling millions of records every day.
- Implemented and managed ETL solutions and automated operational processes.
- Designed and developed ETL jobs to extract data from a Salesforce replica and load it into a data mart in Redshift.
- Published interactive data visualization dashboards, reports, and workbooks on Tableau and SAS Visual Analytics.
Environment: Data Engineering, Databases, EMR, Redshift, Hadoop, Spark, ETL, Tableau.
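Illustrative EMR sketch: a minimal Lambda handler that spins up a transient EMR cluster for a data lake job, as described above. The cluster name, instance types, and script path are assumptions, and the roles are simply the EMR defaults.

```python
import boto3

emr = boto3.client("emr")

def lambda_handler(event, context):
    """Launch a transient EMR cluster that runs one Spark step, then terminates."""
    resp = emr.run_job_flow(
        Name="datalake-nightly",                  # hypothetical cluster name
        ReleaseLabel="emr-5.30.0",
        Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
        Instances={
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": False,  # terminate after the step finishes
        },
        Steps=[{
            "Name": "spark-etl",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--deploy-mode", "cluster",
                         "s3://example-code/etl_job.py"],  # hypothetical script path
            },
        }],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    return {"JobFlowId": resp["JobFlowId"]}
```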
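Illustrative Hive sketch: a minimal example of the partitioning and bucketing patterns above, issued through Spark SQL. Table, column, and bucket-count choices are hypothetical, and the staging table is assumed to exist.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioning-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Partitioned table: queries filtering on order_date only scan matching partitions.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_part (
        order_id STRING,
        customer_id STRING,
        total DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS ORC
""")

# Dynamic partitioning: Hive derives the partition value from the last SELECT column.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE sales_part PARTITION (order_date)
    SELECT order_id, customer_id, total, order_date
    FROM staging_sales          -- hypothetical staging table
""")

# Bucketed table DDL: bucketing by customer_id spreads rows evenly and speeds joins;
# bucketed loads are typically run from Hive itself rather than Spark.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_bucketed (
        order_id STRING,
        customer_id STRING,
        total DOUBLE
    )
    CLUSTERED BY (customer_id) INTO 16 BUCKETS
    STORED AS ORC
""")
```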
Confidential
Data Analyst
Responsibilities:
- Created SSIS packages to extract data from OLTP to OLAP systems, and scheduled jobs to call the packages.
- Applied various data transformations such as Slowly Changing Dimension (SCD), Aggregate, Sort, Multicast, Conditional Split, and Derived Column.
- Created complex T-SQL stored procedures, functions, triggers, and database objects as needed to transform data according to business requirements.
- Performed data modeling, database design, development, testing, and implementation with RDBMSs.
- Designed, developed, and maintained customized reports and dashboards using Excel and Tableau.
- Wrote Python scripts to extract data from SQL Server databases for further analysis (see the extraction sketch below).
- Automated weekly reports with Python scripting.
- Performed data migration and data validation/testing between MS Access and MS Excel through VBA modules.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine-learning algorithms.
- Hands-on experience and good knowledge of AWS services: EC2, EBS, S3, RDS, VPC, VPN, Route 53, ELB, Auto Scaling, SQS, SNS, IAM, DynamoDB, CloudFront, CloudFormation, ECS, CloudWatch, CloudTrail, Storage Gateway, Internet Gateway, NAT Gateway.
- Automated importing of data into Access and SQL Server from flat files and Excel spreadsheets.
- Wrote Python scripts to transform and cleanse raw data and load it into tables.
- Developed PySpark jobs to load data into distributed Hive layers for Tableau dashboards.
- Developed Spark and MapReduce jobs to parse the JSON and XML data.
- Implemented schema extraction for Parquet and Avro file formats in Hive (see the schema sketch below).
- Created scalable, reusable, and secure data pipelines using Informatica and SQL embedded in Bash scripts.
Environment: Data Analysis, ETL, data warehousing, Python, MSSQL, AWS, Excel
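Illustrative extraction sketch: a minimal Python script pulling SQL Server data with pyodbc and pandas and producing a weekly report, as described above. The driver string, server, database, and table names are hypothetical.

```python
import pandas as pd
import pyodbc

# Hypothetical connection details for illustration.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=example-sql01;DATABASE=reporting;Trusted_Connection=yes;"
)

# Pull the weekly slice into a DataFrame for downstream analysis.
query = """
    SELECT order_id, customer_id, total, order_date
    FROM dbo.orders
    WHERE order_date >= DATEADD(day, -7, GETDATE())
"""
df = pd.read_sql(query, conn)
conn.close()

# Simple weekly report: totals per customer, written to Excel (requires openpyxl).
report = df.groupby("customer_id", as_index=False)["total"].sum()
report.to_excel("weekly_totals.xlsx", index=False)
```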
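Illustrative schema sketch: a minimal example of surfacing Parquet and Avro schemas in Hive via PySpark, matching the schema-extraction bullet above. Paths and table names are hypothetical, and the Avro read assumes the spark-avro package is on the classpath.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-schema-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Parquet files are self-describing: read one to inspect its embedded schema.
parquet_df = spark.read.parquet("s3://example-curated/orders/")
parquet_df.printSchema()

# Register the files as an external Hive table over the same location.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS orders_parquet (
        order_id STRING,
        total DOUBLE
    )
    STORED AS PARQUET
    LOCATION 's3://example-curated/orders/'
""")

# Avro likewise embeds its schema in each file (needs the spark-avro package).
avro_df = spark.read.format("avro").load("s3://example-raw/events/")
avro_df.printSchema()
```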